How to optimise WIEN2k with math-libraries ?
Why should you use math-libraries and a good compiler ?
The effect of optimised libraries on WIEN2k
How to obtain and use libraries with WIEN2k
Compiling and linking with the libraries (for pc’s)
WIEN2k is written in a programming language that humans can understand: fortran90. The processor of a computer doesn’t understand fortran90, however. Therefore, the code has to be translated into a (long) series of instructions that the processor understands (machine language). This translation is done by the compiler, and the translated files are executables such as lapw1, lapw2, … Examples of fortran90 compilers are IFC and pgf90.
Different types of processors speak different languages, so one cannot use the executables made for e.g. an IBM SP4 on an Intel Pentium IV. Also, the compiler can translate a piece of fortran90 code into machine language for one specific processor in many different ways, some of which execute the programmed task much faster than others. A good compiler is one that delivers an executable that needs a minimal amount of time to perform the programmed task. (Note that this refers to the execution time, i.e. the time the processor needs to run the compiled executable. This is different from the compilation time, i.e. the time the compiler needs to translate the fortran90 code into an executable. It might require a long compilation time to produce an executable with the smallest execution time.)
Some tasks are used in many different codes, for example the multiplication of two matrices. Such tasks are programmed once in general-purpose library routines, such as those in LAPACK and BLAS. A programmer no longer needs to spend personal time on programming a matrix multiplication, but can immediately use the relevant subroutine available in such libraries. This saves programming time and, if the subroutines use efficient algorithms, execution time as well.
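To make concrete what such a library routine replaces, here is a minimal sketch of a matrix multiplication written by hand. It is in Python rather than fortran90 and is purely illustrative: the function name matmul is made up for this example and is not part of WIEN2k, LAPACK or BLAS. In BLAS, this task is covered for double precision matrices by the routine dgemm.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows (naive triple loop)."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i][p] * b[p][j]
            c[i][j] = total
    return c


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(matmul(a, b))  # prints [[19.0, 22.0], [43.0, 50.0]]
```

The three nested loops perform about 2·n³ floating point operations for n×n matrices, which is why the quality of this one routine matters so much for the total run time.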
Still, a compiler can translate such an efficient subroutine into executable code in many different ways, which can be more or less efficient. As a code often spends a large share of its time in the execution of such library routines, it can be highly beneficial to use an efficient translation of these routines into machine language. Therefore, all major vendors of processors provide libraries that are already translated into machine language in such a way that execution on that particular processor needs (almost) the minimal amount of time. Examples of such processor-dependent versions of e.g. BLAS are essl (for IBM SP4) and mkl (for Intel Pentium IV).
An alternative to these sometimes expensive vendor-supplied compiled libraries is ATLAS. It is a software package that examines the details of your processor, and then compiles a BLAS library that is rather fast for your particular processor, whichever it is.
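The idea of tuning by measurement can be illustrated with a small sketch: try several variants of the same routine on the machine at hand, time each of them, and keep the fastest. The code below does this for the block size of a blocked matrix multiplication. It is not how ATLAS is implemented (its search is far more elaborate and works on compiled code); the function names blocked_matmul and tune_block_size and the candidate block sizes are assumptions made for this example only. In pure Python the timing differences mostly reflect interpreter overhead rather than cache effects.

```python
import time


def blocked_matmul(a, b, block):
    """Multiply two square matrices (lists of rows) block by block."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c


def tune_block_size(n=48, candidates=(4, 8, 16, 48)):
    """Time every candidate block size and return the fastest one."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        start = time.perf_counter()
        blocked_matmul(a, b, block)
        timings[block] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings


best, timings = tune_block_size()
print("fastest block size on this machine:", best)
```

The result depends on the machine and on the moment the script is run, which is exactly the point: the best variant is found by trying, not by assuming.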
Here you see some test results (h:mm:ss) for lapw1 on two rather time-consuming cases, once for a Pentium IV (2.4 GHz) and once for an AMD Athlon 2200+. In all cases IFC 7.1 was used as compiler.
CPU | compiler | library | Case 1 | Case 2
Pentium IV | IFC 7.1 | (precompiled) | 1:03:57 | 1:06:01
Pentium IV | IFC 7.1 | ATLAS 3.4.1 | 0:29:17 | 0:31:31
Pentium IV | IFC 7.1 | mkl 6.0 | 0:22:41 | 0:24:05
Athlon | IFC 7.1 | (precompiled) | 1:59:24 | 2:07:43
Athlon | IFC 7.1 | ATLAS 3.4.1 | 0:44:01 | 0:45:01
Athlon | IFC 7.1 | mkl 6.0 | 0:43:53 | 0:46:43
You see that on both processors there is a gain of a factor of 2 or more (roughly 2.1 to 2.8) between using the precompiled executables (downloaded directly from www.wien2k.at), and compiling yourself while linking to either ATLAS or mkl (the precompiled executables were made with a now rather old version of the pgf90 compiler, and linked with an ATLAS library from around 2001). There is no meaningful difference between ATLAS and mkl on an Athlon, while mkl is definitely faster than ATLAS on a Pentium (which is normal, as mkl is designed for Pentiums). This table shows that it is highly recommended to compile WIEN2k yourself, and to link it to mkl or ATLAS. The alternative would be to buy a pc that is twice as fast, which is of course much more expensive.
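The gain can be checked directly from the timings in the table. The short Python sketch below converts the h:mm:ss values to seconds and divides the time of the precompiled executable by the time of each self-compiled one; the helper names to_seconds and speedup are made up for this example.

```python
def to_seconds(hms):
    """Convert a 'h:mm:ss' string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


# (CPU, library) -> (Case 1, Case 2), copied from the table above
timings = {
    ("Pentium IV", "precompiled"): ("1:03:57", "1:06:01"),
    ("Pentium IV", "ATLAS 3.4.1"): ("0:29:17", "0:31:31"),
    ("Pentium IV", "mkl 6.0"): ("0:22:41", "0:24:05"),
    ("Athlon", "precompiled"): ("1:59:24", "2:07:43"),
    ("Athlon", "ATLAS 3.4.1"): ("0:44:01", "0:45:01"),
    ("Athlon", "mkl 6.0"): ("0:43:53", "0:46:43"),
}


def speedup(cpu, library, case):
    """Time of the precompiled run divided by the time of the given run."""
    base = to_seconds(timings[(cpu, "precompiled")][case])
    return base / to_seconds(timings[(cpu, library)][case])


for cpu in ("Pentium IV", "Athlon"):
    for library in ("ATLAS 3.4.1", "mkl 6.0"):
        factors = [speedup(cpu, library, case) for case in (0, 1)]
        print(cpu, "|", library, "|", " | ".join(f"{f:.2f}" for f in factors))
```

All eight factors lie between 2 and 3.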
mkl is free for academic use, and can be obtained from www.intel.com (search for mkl or Math Kernel Library). Follow the (easy) instructions of the install script to install it on your system (Pentium 4 or maybe Athlon).
ATLAS is free for everybody, and can be obtained from math-atlas.sourceforge.net. Either install a precompiled package for your system (if available), or install the source and compile it yourself (you need a fortran90 compiler for this). The latter is advised for optimal performance and, thanks to a highly user-friendly and informative install script, is not difficult to do. (Warning: at least on some Athlons the precompiled ATLAS 3.4.1 yields errors when trying to link it with WIEN2k, while these errors disappear if you compile ATLAS yourself.)
Intel offers a good fortran90 compiler that is free for academic use (possibly as the only vendor): IFC. It can be obtained from www.intel.com (search for IFC).
The siteconfig script from WIEN2k guides you through the installation for supported compilers and libraries. Depending on the version of the mkl libraries, or if you use ATLAS, some small modifications may be necessary. First, locate the library that you want to link with (more precisely, the location of the lib*.a files). For mkl, that will most likely be something like
/opt/intel/mkl60/lib/libmkl_ia32.a
/opt/intel/mkl60/lib/libmkl_lapack.a
(the name of the file changes from version to version, hence it could be libmkl_p4.a or something different as well).
For ATLAS, it will be something like
/path-to-ATLAS/lib/libatlas.a
/path-to-ATLAS/lib/libf77blas.a
(you were free to put these anywhere during the installation of ATLAS).
In both cases, you now have to add the path to these files after the -L option in siteconfig (siteconfig, O, L) :
-L../SRC_lib -L/opt/intel/mkl60/lib
or
-L../SRC_lib -L/path-to-ATLAS/lib
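The step from the location of the lib*.a files to the -L options can be sketched as follows. This small Python helper is only an illustration and is not part of siteconfig; the name link_dir_options is made up, and it assumes that ../SRC_lib always comes first, as in the two lines above.

```python
import posixpath


def link_dir_options(library_files, wien_lib_dir="../SRC_lib"):
    """Build the -L options from the full paths of the lib*.a files."""
    dirs = [wien_lib_dir]
    for path in library_files:
        directory = posixpath.dirname(path)
        if directory not in dirs:  # list every directory only once
            dirs.append(directory)
    return " ".join("-L" + d for d in dirs)


mkl_files = [
    "/opt/intel/mkl60/lib/libmkl_ia32.a",
    "/opt/intel/mkl60/lib/libmkl_lapack.a",
]
print(link_dir_options(mkl_files))
# prints: -L../SRC_lib -L/opt/intel/mkl60/lib
```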
Next, you have to indicate which of the lib*.a files you actually want to use, by giving after -l the part of the name that is in the position of the ‘*’ (siteconfig, O, R) :
For mkl 6.0 :
-lmkl_lapack -lmkl_ia32 -lguide -lpthread
(mkl_lapack and mkl_ia32 are the libraries from mkl)
For ATLAS :
-llapack_lapw -lf77blas -latlas -lguide -lpthread
(f77blas and atlas are the libraries from ATLAS)
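Before recompiling, it can be useful to check that every -l name really corresponds to a lib*.a file in one of the -L directories. The Python sketch below does this in a simplified way. It is an assumption-laden illustration, not a WIEN2k tool: the names library_name and missing_libraries are made up, only static lib*.a files are considered, and only the directories given with -L are searched, whereas a real linker also looks in system directories (where libraries such as pthread are normally found). Relative paths such as ../SRC_lib are interpreted relative to the directory from which the script is run.

```python
import os
import tempfile


def library_name(flag):
    """Turn an option like '-lmkl_ia32' into the file name 'libmkl_ia32.a'."""
    if not flag.startswith("-l") or len(flag) == 2:
        raise ValueError("not a -l option: " + flag)
    return "lib" + flag[2:] + ".a"


def missing_libraries(options):
    """Return the -l options with no matching lib*.a in any -L directory."""
    tokens = options.split()
    dirs = [t[2:] for t in tokens if t.startswith("-L")]
    flags = [t for t in tokens if t.startswith("-l")]
    missing = []
    for flag in flags:
        name = library_name(flag)
        if not any(os.path.isfile(os.path.join(d, name)) for d in dirs):
            missing.append(flag)
    return missing


# Demonstration with empty stand-in files in a temporary directory
with tempfile.TemporaryDirectory() as libdir:
    for name in ("libatlas.a", "libf77blas.a"):
        open(os.path.join(libdir, name), "w").close()
    demo_missing = missing_libraries(f"-L{libdir} -lf77blas -latlas -lguide")

print(demo_missing)  # prints ['-lguide']
```

A name reported as missing is not necessarily an error, since it may live in a system directory, but for the mkl or ATLAS libraries themselves it usually points to a wrong path or a file name that differs between versions.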