WIEN2k

Adding a new dimension to DFT calculations of solids ...

Serial benchmark

P4 dual-Xeon, 3.6 GHz            165 sec    ifort9 + mkl8 (1 job with 1 thread!)
P4 dual-Xeon, 3.6 GHz            125 sec    ifort9 + mkl8 (1 job with 2 threads!)

bi-Xeon 5320 (overcl 2.67GHz)    119 sec    ifort9.1 + mkl9.0 (1 job with 1 thread)
bi-Xeon 5320 (overcl 2.67GHz)     90 sec    ifort9.1 + mkl9.0 (1 job with 2 threads)
bi-Xeon 5320 (overcl 2.67GHz)     76 sec    ifort9.1 + mkl9.0 (1 job with 4 threads)
bi-Xeon 5320 (overcl 2.67GHz)     69 sec    ifort9.1 + mkl9.0 (1 job with 8 threads)

P4 Core2 Duo E6600, 2.4 GHz      128 sec    ifort10.1+mkl9.1, OMP_NUM_THREADS=1
P4 Core2 Duo E6600, 2.4 GHz      103 sec    ifort10.1+mkl9.1, OMP_NUM_THREADS=2

IBM 52A  1.90GHz Power5+(1 cpu)  135 sec    xlf10.1,-q64 -O5,ESSL4.2
IBM 52A  (-"-,2 cpus)             83 sec     - " -
IBM 52A  (-"-,2 cpus, SMT=on)     80 sec     - " -

Itanium2(1.6GHz,SGI Altix 3700)  122 sec    ifort9.0 +mkl8.0, libgoto_itanium2_64p-r1.00
Itanium2(-"-, 2 threads)          90 sec    ifort9.0 +mkl8.0, libgoto_itanium2_64p-r1.00

AMD-Opteron, single cpu, 2.4 GHz 190 sec    ifort(9.1.40) + libgoto_opteron64p-r1.09.so

A "historical list" of earlier results follows.


Compaq-Alpha EV68, 1GHz          574 sec    -ldxml

G5(dual CPU64bits 2GHz)          690 sec    Absoft/-O
G5                               350 sec    /xlf compiler  /G5 64bits libraries /-O5 -qhot -arch=g5

Athlon XP3000+ (2.17 GHz)        541 sec    ifc, ifc compiled LAPACK, Athlon ATLAS
Athlon XP3000+ (2.17 GHz)        515 sec    PGI, PGI compiled LAPACK, Athlon ATLAS

Athlon64 XP3500+                 275 sec    ifort9, mkl8                             

P4, 2.5 GHz, dual channel mem.   347 sec    ifc7, mkl6
P4, 2.5 GHz, dual channel mem.   328 sec    ifc7, goto-library*
P4, 3.0 GHz, dual channel mem.   288 sec    ifc7, mkl6
P4, 3.2 GHz, dual channel mem.   258 sec    ifc7, mkl6
P4, 3.2 GHz, 400MHz dual ch.mem. 228 sec    ifort8.0, mkl6.1
P4, 3.4 GHz, 400MHz dual ch.mem. 212 sec    ifort8.0, mkl6.1
P4-640, Intel-945G, DDR-II 533   176 sec    ifort9+mkl8, HT enabled
P4-640, Intel-945G, DDR-II 533   163 sec    ifort9+mkl8, HT disabled
P4-640, Intel-945G, DDR-II 533   194 sec    ifort9+mkl8, HT enabled, OMP_NUM_THREADS=2
P4-640, Intel-945G, DDR-II 533   386 sec    ifort9+mkl8, HT enabled, 2 parallel lapw1c

P4 dual-Xeon, 3.0 GHz            253 sec    ifc7.1, mkl6.1
P4 dual-Xeon, 2.8 GHz            226 sec    ifort 8.1 + mkl 7.2.1 EM64T
P4 dual-Xeon, 3.6 GHz            211 sec    pgf90 + libgoto_p4-64_1024-r0.97 (2 jobs in parallel)
P4 dual-Xeon, 3.6 GHz            184 sec    pgf90 + libgoto_p4-64_1024-r0.97 (1 job)
P4 dual-Xeon, 3.6 GHz            165 sec    ifort9 + mkl8 (1 job with 1 thread!)
P4 dual-Xeon, 3.6 GHz            125 sec    ifort9 + mkl8 (1 job with 2 threads!)

bi-Xeon 5140  2.33GHz            132 sec    ifort9 + goto1.08 (1 job with 1 thread)
bi-Xeon 5140  2.33GHz            138 sec    ifort9 + goto1.08 (2 jobs with 1 thread)
bi-Xeon 5140  2.33GHz            181 sec    ifort9 + goto1.08 (4 jobs with 1 thread)
bi-Xeon 5320 (overcl 2.67GHz)    119 sec    ifort9.1 + mkl9.0 (1 job with 1 thread)
bi-Xeon 5320 (overcl 2.67GHz)     90 sec    ifort9.1 + mkl9.0 (1 job with 2 threads)
bi-Xeon 5320 (overcl 2.67GHz)     76 sec    ifort9.1 + mkl9.0 (1 job with 4 threads)
bi-Xeon 5320 (overcl 2.67GHz)     69 sec    ifort9.1 + mkl9.0 (1 job with 8 threads)
bi-Xeon 5320 (overcl 2.67GHz)    122 sec    ifort9.1 + mkl9.0 (2 jobs with 1 thread)
bi-Xeon 5320 (overcl 2.67GHz)    159 sec    ifort9.1 + mkl9.0 (4 jobs with 1 thread)
bi-Xeon 5320 (overcl 2.67GHz)    286 sec    ifort9.1 + mkl9.0 (8 jobs with 1 thread)

MacPRO, dual-Xeon 5300, 8x3.0GHz 182 sec    gcc4.01,gfortran,libgoto (1 job with 1 thread, -O3 -ftree-vectorize -ffast-math)
MacPRO, dual-Xeon 5300, 8x3.0GHz 254 sec    (eq. 64 sec) gcc4.01,gfortran,libgoto (4 jobs with 1 thread)
MacPRO, dual-Xeon 5300, 8x3.0GHz 337 sec    (eq. 56 sec) gcc4.01,gfortran,libgoto (6 jobs with 1 thread)
MacPRO, dual-Xeon 5300, 8x3.0GHz 448 sec    (eq. 56 sec) gcc4.01,gfortran,libgoto (8 jobs with 1 thread)

MacPRO, dual-Xeon 5300, 8x3.0GHz 115 sec    fedora7,ifort10,mkl9.1 (1 job  with 1 thread)
MacPRO, dual-Xeon 5300, 8x3.0GHz 125 sec    (62.610 sec/kpt),2 jobs with 1 thread  
MacPRO, dual-Xeon 5300, 8x3.0GHz 167 sec    (41.771 sec/kpt),4 jobs with 1 thread 
MacPRO, dual-Xeon 5300, 8x3.0GHz 237 sec    (39.627 sec/kpt),6 jobs with 1 thread
MacPRO, dual-Xeon 5300, 8x3.0GHz 311 sec    (38.897 sec/kpt),8 jobs with 1 thread

P4D dual-core (820), 2.8 GHz     192 sec    ifort9 + cmkl8.0
P4D dual-core (820), 2.8 GHz     142 sec    ifort9 + cmkl8.0 OMP_NUM_THREADS=2
P4D dual-core (820), 3.2 GHz     128 sec    ifort9 + cmkl8.0 OMP_NUM_THREADS=2

P4 Core2 Duo E6600, 2.4 GHz      131 sec    ifort9.1 + cmkl8.1,-axT, OMP_NUM_THREADS=1
P4 Core2 Duo E6600, 2.4 GHz      103 sec    ifort9.1 + cmkl8.1,-axT, OMP_NUM_THREADS=2
P4 Core2 Duo E6600, 2.8 GHz       88 sec    ifort9.1 + cmkl8.1,-axT, OMP_NUM_THREADS=2
P4 Core2 Duo E6600, overcl3.15GHz 79 sec    ifort9.1 + cmkl8.1,-axT, OMP_NUM_THREADS=2

AMD-Opteron, dual cpu, 2.0 GHz   270 sec    ifc7, goto_opt32-r0.92-library*
AMD-Opteron, dual cpu, 2.2 GHz   282 sec    pgf90, ACML2.0-library
AMD-Opteron, dual cpu, 2.4 GHz   365 sec    pathscale2.1 + mkl 7.2
AMD-Opteron, dual cpu, 2.4 GHz   355 sec    pathscale2.1 (-Ofast -IPA) + mkl 7.2
AMD-Opteron, dual cpu, 2.4 GHz   366 sec    ifort64 + mkl 7.2
AMD-Opteron, dual cpu, 2.4 GHz   270 sec    ifort64 + atlas
AMD-Opteron, dual cpu, 2.4 GHz   215 sec    ifort64 + goto_opt64-r0.96-2
AMD-Opteron, single cpu, 2.4 GHz 190 sec    ifort(9.1.40) + libgoto_opteron64p-r1.09.so

AMD (Sun V40z)         2.6GHz    237 sec    pgf90 + ACML
AMD (-"-, 2 threads)   2.6GHz    175 sec    -"-
AMD (-"-, 4 threads)   2.6GHz    144 sec    -"-


IBM p630 1.45GHz Power4+         241 sec    xlf 8.1.1,-q64 -O5,ESSL4.1
IBM p655 1.50GHz Power4+         206 sec    xlf 8.1.1,-q64 -O5,ESSL4.1
IBM SP5  1.90GHz Power5+         167 sec    xlf 9.1,-q64 -O3,ESSL4.2
IBM 52A  1.90GHz Power5+(1 cpu)  135 sec    xlf10.1,-q64 -O5,ESSL4.2
IBM 52A  (-"-,2 cpus)             83 sec     - " -                  
IBM 52A  (-"-,2 cpus, SMT=on)     80 sec     - " -                  

Itanium2(1.3GHz,SGI Altix 3700)  298 sec    ifc7.1 + mkl6.0
Itanium2(1.5GHz 6Mb cache, HP)   190 sec    HP f90 + mlib (2004)
Itanium2(1.5GHz 6Mb cache, HP)   168 sec    HP f90 + mlib (May 2005)
Itanium2(1.3GHz,SGI Altix 3700)  189 sec    ifc7.1 +SCSL 1.5, libgoto_it2-r0.94
Itanium2(1.5GHz,SGI Altix 3700)  165 sec    ifc7.1 +SCSL 1.5, libgoto_it2-r0.94
Itanium2(1.6GHz,SGI Altix 3700)  122 sec    ifort9.0 +mkl8.0, libgoto_itanium2_64p-r1.00
Itanium2(-"-, 2 threads)          90 sec    ifort9.0 +mkl8.0, libgoto_itanium2_64p-r1.00
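
The "(eq. ... sec)" and "(... sec/kpt)" figures in the multi-job rows appear to be the wall time divided by the number of jobs run in parallel (for example 167 sec / 4 jobs = 41.75, against the listed 41.771 sec/kpt). The following is a minimal Python sketch under that assumption. It also computes the speedup of threaded runs relative to the 1-thread run. The function names are illustrative only and are not part of WIEN2k, and the listed sec/kpt values may differ slightly because they were presumably measured rather than derived.

```python
def time_per_job(wall_seconds: float, jobs: int) -> float:
    """Effective seconds per job when `jobs` copies run side by side."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return wall_seconds / jobs


def speedup(serial_seconds: float, threaded_seconds: float) -> float:
    """Ratio of the 1-thread time to the multi-thread time."""
    return serial_seconds / threaded_seconds


# MacPRO, dual-Xeon 5300, fedora7 + ifort10 + mkl9.1: (jobs, wall seconds)
macpro = [(1, 115), (2, 125), (4, 167), (6, 237), (8, 311)]
for jobs, wall in macpro:
    print(f"{jobs} job(s): {time_per_job(wall, jobs):7.3f} sec per job")

# bi-Xeon 5320, ifort9.1 + mkl9.0, 1 job: (threads, wall seconds)
xeon5320 = [(1, 119), (2, 90), (4, 76), (8, 69)]
serial = xeon5320[0][1]
for threads, wall in xeon5320:
    print(f"{threads} thread(s): speedup {speedup(serial, wall):.2f}")
```

Read this way, the tables suggest that on these machines running several single-threaded jobs side by side gives better total throughput than giving one job all the threads, at the cost of a longer wall time for each individual job.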

* libgoto_p4_512-r0.6.so blas libraries are available from:




©2001 by P. Blaha and K. Schwarz