StartDate: 2026-06-23 21:18:14+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 5fdc47cff4886017b084b7ab2c66bae19986e771
CommitTime: 2026-06-23 21:49:03 +0200
CommitAuthor: SY Wang
CommitSubject: Toolchain: Add Eigen3 and Remove FindLibint2.cmake (#5427)

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=5fdc47cff4886017b084b7ab2c66bae19986e771 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes

Populating docker build cache... done.

DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.

Sending build context to Docker daemon 420.9MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
2cd52cbb1ebe: Waiting
6e8af4fd0a07: Waiting
15a17189b2df: Waiting
5a2aba542b08: Waiting
02cb0e091e33: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
6cb9b761b877: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
32f112e3802c: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Verifying Checksum
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum 6cb9b761b877: Download complete 32f112e3802c: Pull complete 644e9b203583: Pull complete 02559cd4bc8d: Pull complete 2cd52cbb1ebe: Pull complete 6e8af4fd0a07: Pull complete 15a17189b2df: Verifying Checksum 15a17189b2df: Download complete 5a2aba542b08: Verifying Checksum 5a2aba542b08: Download complete 15a17189b2df: Pull complete 02cb0e091e33: Pull complete 9c3d619183d2: Pull complete 7f7602a82106: Pull complete 5a2aba542b08: Pull complete 6cb9b761b877: Pull complete Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04 ---> eecafe98c3e1 Step 2/46 : ENV CUDA_PATH /usr/local/cuda ---> Using cache ---> 780681fb1fee Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64 ---> Using cache ---> ba98a15dc225 Step 4/46 : ENV CUDA_CACHE_DISABLE 1 ---> Using cache ---> 3932740340f7 Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/* ---> Using cache ---> a06eb14abc29 Step 6/46 : WORKDIR /opt/cp2k-toolchain ---> Using cache ---> 082681bac850 Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./ ---> Using cache ---> ae920e0abda3 Step 8/46 : RUN ./install_requirements.sh ubuntu ---> Using cache ---> 94839a704e2d Step 9/46 : RUN mkdir scripts ---> Using cache ---> 433a8b0a0499 Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./tools/build_utils/fypp ./scripts/ ---> Using cache ---> b2d99e681d8f Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh . 
 ---> 2dfccf0c988c
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --with-sirius=install --gpu-ver=V100 --dry-run
 ---> Running in 6215319ba488
No MPI installation detected. (Ignore this message if a fresh MPI installation is requested.)
Toolchain script received the following options: --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --with-sirius=install --gpu-ver=V100 --dry-run
Parsing options and resolving conflicts...
WARNING: (./install_cp2k_toolchain.sh, line 1163) Installing dependencies and CP2K requires CMake but CMake is not enabled, so a new copy of CMake will be installed first.

Toolchain configuration summary
-------------------------------
System specifications:
  -j           = 12
  --target-cpu = native
  --gpu-ver    = V100
  --mpi-mode   = mpich
  --math-mode  = openblas
Enabled features:
  --enable-tsan          = no
  --enable-cuda          = yes
  --enable-gauxc-cutlass = no
  --enable-hip           = no
  --enable-opencl        = no
  --enable-cray          = no
Packages to be installed:
  - cmake
  - mpich
  - openblas
  - fftw
  - eigen
  - libint
  - libxc
  - libxsmm
  - libxs
  - cosma
  - scalapack
  - elpa
  - dbcsr
  - spfft
  - spla
  - gsl
  - spglib
  - hdf5
  - libvdwxc
  - sirius
  - libvori
  - tblite
  - pugixml
  - fmt
Packages to be detected from system:
  - gcc
Packages not used:
  - intel
  - amd
  - ninja
  - openmpi
  - intelmpi
  - mkl
  - acml
  - gauxc
  - libxstream
  - cusolvermp
  - plumed
  - libtorch
  - deepmd
  - ace
  - dftd4
  - libsmeagol
  - trexio
  - libfci
  - greenx
  - gmp
  - mcl

With --dry-run option, this script concludes with above report.
The setup, toolchain env and conf files are written to /opt/cp2k-toolchain/install.
---> Removed intermediate container 6215319ba488 ---> 596c64249bfe Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/ ---> be0307b3c1f5 Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build ---> Running in 3c8c101a68b0 ==================== Finding GCC from system paths ==================== path to gcc is /usr/bin/gcc path to g++ is /usr/bin/g++ path to gfortran is /usr/bin/gfortran GCC compiler version 13.3.0 found Found include directory /usr/include Found lib directory /usr/lib/x86_64-linux-gnu Step gcc took 0.00 seconds. Step intel took 0.00 seconds. Step amd took 0.00 seconds. ==================== Getting proc arch info using OpenBLAS tools ==================== wget --quiet https://www.cp2k.org/static/downloads/OpenBLAS-0.3.33.tar.gz -O OpenBLAS-0.3.33.tar.gz OpenBLAS-0.3.33.tar.gz: OK Checksum of OpenBLAS-0.3.33.tar.gz Ok OpenBLAS detected LIBCORE = skylakex OpenBLAS detected ARCH = x86_64 ==================== Installing CMake ==================== wget --quiet https://www.cp2k.org/static/downloads/cmake-4.3.0-linux-x86_64.tar.gz -O cmake-4.3.0-linux-x86_64.tar.gz cmake-4.3.0-linux-x86_64.tar.gz: OK Checksum of cmake-4.3.0-linux-x86_64.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/cmake-4.3.0 Step cmake took 6.00 seconds. Step ninja took 0.00 seconds. 
---> Removed intermediate container 3c8c101a68b0 ---> 8bdf9055ff3a Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/ ---> 523f433f7c2f Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build ---> Running in 296f80cfb73c ==================== Installing MPICH ==================== wget --quiet https://www.cp2k.org/static/downloads/mpich-5.0.1.tar.gz -O mpich-5.0.1.tar.gz mpich-5.0.1.tar.gz: OK Checksum of mpich-5.0.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/mpich-5.0.1 for MPICH device ch4 Found directory /opt/cp2k-toolchain/install/mpich-5.0.1/bin Found directory /opt/cp2k-toolchain/install/mpich-5.0.1/lib Found directory /opt/cp2k-toolchain/install/mpich-5.0.1/include mpiexec is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpiexec mpicc is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpicc mpicxx is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpicxx mpifort is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpifort Step mpich took 583.00 seconds. ---> Removed intermediate container 296f80cfb73c ---> 052550fddb2b Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/ ---> 914f7b5902f4 Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build ---> Running in feea873470d6 ==================== Installing OpenBLAS ==================== wget --quiet https://www.cp2k.org/static/downloads/OpenBLAS-0.3.33.tar.gz -O OpenBLAS-0.3.33.tar.gz OpenBLAS-0.3.33.tar.gz: OK Checksum of OpenBLAS-0.3.33.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/openblas-0.3.33 Installing OpenBLAS library for target SKYLAKEX Step openblas took 303.00 seconds. Step gmp took 0.00 seconds. 
---> Removed intermediate container feea873470d6 ---> 7aa6a82cb0bf Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/ ---> 95b9a7247422 Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build ---> Running in d1a4dc7b25e3 ==================== Installing FFTW ==================== wget --quiet https://www.cp2k.org/static/downloads/fftw-3.3.11.tar.gz -O fftw-3.3.11.tar.gz fftw-3.3.11.tar.gz: OK Checksum of fftw-3.3.11.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/fftw-3.3.11 Step fftw took 169.00 seconds. ==================== Installing Eigen ==================== wget --quiet https://www.cp2k.org/static/downloads/eigen-5.0.1.tar.gz -O eigen-5.0.1.tar.gz eigen-5.0.1.tar.gz: OK Checksum of eigen-5.0.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/eigen-5.0.1 Step eigen took 3.00 seconds. ==================== Installing LIBINT ==================== wget --quiet https://www.cp2k.org/static/downloads/libint-v2.13.1-cp2k-lmax-5.tar.xz -O libint-v2.13.1-cp2k-lmax-5.tar.xz libint-v2.13.1-cp2k-lmax-5.tar.xz: OK Checksum of libint-v2.13.1-cp2k-lmax-5.tar.xz Ok Installing from scratch into /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5 Step libint took 545.00 seconds. ==================== Installing LIBXC ==================== wget --quiet https://www.cp2k.org/static/downloads/libxc-7.0.0.tar.bz2 -O libxc-7.0.0.tar.bz2 libxc-7.0.0.tar.bz2: OK Checksum of libxc-7.0.0.tar.bz2 Ok Installing from scratch into /opt/cp2k-toolchain/install/libxc-7.0.0 Step libxc took 444.00 seconds. Step greenx took 0.00 seconds. 
---> Removed intermediate container d1a4dc7b25e3 ---> 9b9f548f6b38 Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/ ---> 922188eceede Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build ---> Running in 80cbf7e20bf4 ==================== Installing Libxsmm ==================== wget --quiet https://www.cp2k.org/static/downloads/libxsmm-db07b74.tar.gz -O libxsmm-db07b74.tar.gz libxsmm-db07b74.tar.gz: OK Checksum of libxsmm-db07b74.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libxsmm-db07b74 Step libxsmm took 21.00 seconds. ==================== Installing LIBXS ==================== wget --quiet https://codeload.github.com/hfp/libxs/tar.gz/ee1e6ab -O libxs-ee1e6ab.tar.gz libxs-ee1e6ab.tar.gz: OK Checksum of ee1e6ab Ok Installing from scratch into /opt/cp2k-toolchain/install/libxs-ee1e6ab Step libxs took 8.00 seconds. Step libxstream took 0.00 seconds. ==================== Installing ScaLAPACK ==================== wget --quiet https://www.cp2k.org/static/downloads/scalapack-2.2.3.tar.gz -O scalapack-2.2.3.tar.gz scalapack-2.2.3.tar.gz: OK Checksum of scalapack-2.2.3.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/scalapack-2.2.3 Step scalapack took 39.00 seconds. Step cusolvermp took 0.00 seconds. ==================== Installing COSMA ==================== wget --quiet https://www.cp2k.org/static/downloads/COSMA-v2.8.4.tar.gz -O COSMA-v2.8.4.tar.gz COSMA-v2.8.4.tar.gz: OK Checksum of COSMA-v2.8.4.tar.gz Ok wget --quiet https://www.cp2k.org/static/downloads/COSTA-v2.3.2.tar.gz -O COSTA-v2.3.2.tar.gz COSTA-v2.3.2.tar.gz: OK Checksum of COSTA-v2.3.2.tar.gz Ok wget --quiet https://www.cp2k.org/static/downloads/Tiled-MM-v2.3.2.tar.gz -O Tiled-MM-v2.3.2.tar.gz Tiled-MM-v2.3.2.tar.gz: OK Checksum of Tiled-MM-v2.3.2.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/COSMA-2.8.4 Step cosma took 66.00 seconds. 
---> Removed intermediate container 80cbf7e20bf4 ---> c213e02a5118 Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/ ---> 8f6648e0dd42 Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build ---> Running in 8ccd253fd495 ==================== Installing ELPA ==================== wget --quiet https://www.cp2k.org/static/downloads/elpa-2026.02.001.tar.gz -O elpa-2026.02.001.tar.gz elpa-2026.02.001.tar.gz: OK Checksum of elpa-2026.02.001.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001 Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001/cpu Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia Step elpa took 307.00 seconds. ---> Removed intermediate container 8ccd253fd495 ---> 0aa33b78a11f Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/ ---> 673fc41bcbbb Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build ---> Running in 14cfc677435c ==================== Installing GSL ==================== wget --quiet https://www.cp2k.org/static/downloads/gsl-2.8.tar.gz -O gsl-2.8.tar.gz gsl-2.8.tar.gz: OK Checksum of gsl-2.8.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/gsl-2.8 Step gsl took 72.00 seconds. Step plumed took 0.00 seconds. Step libtorch took 0.00 seconds. Step gauxc took 0.00 seconds. Step deepmd took 0.00 seconds. Step ace took 0.00 seconds. ---> Removed intermediate container 14cfc677435c ---> c3d131d99b11 Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/ ---> 089f449a4a91 Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build ---> Running in 506e7198ac2c ==================== Installing HDF5 ==================== wget --quiet https://www.cp2k.org/static/downloads/hdf5-2.1.1.tar.gz -O hdf5-2.1.1.tar.gz hdf5-2.1.1.tar.gz: OK Checksum of hdf5-2.1.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/hdf5-2.1.1 Step hdf5 took 126.00 seconds. 
==================== Installing libvdwxc ==================== wget --quiet https://www.cp2k.org/static/downloads/libvdwxc-0.5.0.tar.gz -O libvdwxc-0.5.0.tar.gz libvdwxc-0.5.0.tar.gz: OK Checksum of libvdwxc-0.5.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libvdwxc-0.5.0 Step libvdwxc took 14.00 seconds. ==================== Installing Spglib ==================== wget --quiet https://www.cp2k.org/static/downloads/spglib-2.7.0.tar.gz -O spglib-2.7.0.tar.gz spglib-2.7.0.tar.gz: OK Checksum of spglib-2.7.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/spglib-2.7.0 Step spglib took 5.00 seconds. ==================== Installing libvori ==================== wget --quiet https://www.cp2k.org/static/downloads/libvori-220621.tar.gz -O libvori-220621.tar.gz libvori-220621.tar.gz: OK Checksum of libvori-220621.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libvori-220621 Step libvori took 23.00 seconds. Step libsmeagol took 0.00 seconds. ==================== Installing fmt ==================== wget --quiet https://www.cp2k.org/static/downloads/fmt-12.1.0.zip -O fmt-12.1.0.zip fmt-12.1.0.zip: OK Checksum of fmt-12.1.0.zip Ok Installing from scratch into /opt/cp2k-toolchain/install/fmt-12.1.0 Step fmt took 8.00 seconds. ---> Removed intermediate container 506e7198ac2c ---> afd9134b45a1 Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/ ---> 616019f7f105 Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build ---> Running in a040243e5c3c Step dftd4 took 0.00 seconds. ==================== Installing tblite ==================== wget --quiet https://www.cp2k.org/static/downloads/tblite-0.6.0.tar.xz -O tblite-0.6.0.tar.xz tblite-0.6.0.tar.xz: OK Checksum of tblite-0.6.0.tar.xz Ok Step tblite took 40.00 seconds. 
==================== Installing pugixml ==================== wget --quiet https://www.cp2k.org/static/downloads/pugixml-1.15.tar.gz -O pugixml-1.15.tar.gz pugixml-1.15.tar.gz: OK Checksum of pugixml-1.15.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/pugixml-1.15 Step pugixml took 8.00 seconds. ==================== Installing SpFFT ==================== wget --quiet https://www.cp2k.org/static/downloads/SpFFT-1.1.1.tar.gz -O SpFFT-1.1.1.tar.gz SpFFT-1.1.1.tar.gz: OK Checksum of SpFFT-1.1.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/SpFFT-1.1.1 Step spfft took 21.00 seconds. ==================== Installing SpLA ==================== wget --quiet https://www.cp2k.org/static/downloads/SpLA-1.6.1.tar.gz -O SpLA-1.6.1.tar.gz SpLA-1.6.1.tar.gz: OK Checksum of SpLA-1.6.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/SpLA-1.6.1 Step spla took 24.00 seconds. ==================== Installing SIRIUS ==================== wget --quiet https://www.cp2k.org/static/downloads/SIRIUS-7.11.1.tar.gz -O SIRIUS-7.11.1.tar.gz SIRIUS-7.11.1.tar.gz: OK Checksum of SIRIUS-7.11.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.11.1 Step sirius took 419.00 seconds. Step libfci took 0.00 seconds. Step trexio took 0.00 seconds. Step MCL took 0.00 seconds. ---> Removed intermediate container a040243e5c3c ---> 0b3835d75902 Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/ ---> af1ba6f05fc9 Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build ---> Running in b1f1220e3185 ==================== Installing DBCSR ==================== wget --quiet https://codeload.github.com/cp2k/dbcsr/tar.gz/4d85b72 -O dbcsr-4d85b72.tar.gz dbcsr-4d85b72.tar.gz: OK Checksum of 4d85b72 Ok Installing from scratch into /opt/cp2k-toolchain/install/dbcsr-4d85b72 Step DBCSR took 131.00 seconds. 
---> Removed intermediate container b1f1220e3185 ---> 08c327b5f21c Step 33/46 : WORKDIR /opt/cp2k ---> Running in aa9948d0e52b ---> Removed intermediate container aa9948d0e52b ---> e36cf44f7d02 Step 34/46 : COPY ./src ./src ---> 4691695725a1 Step 35/46 : COPY ./data ./data ---> 9d9aa9740242 Step 36/46 : COPY ./tools/build_utils ./tools/build_utils ---> 03cfcd81aca4 Step 37/46 : COPY ./cmake ./cmake ---> 5bdc1b6365cb Step 38/46 : COPY ./CMakeLists.txt . ---> df9c333f00b0 Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh ./tools/docker/scripts/cmake_cp2k.sh ./ ---> 35c8f88ed48a Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp ---> Running in b68a663b050f ==================== Building CP2K ==================== -- The Fortran compiler identification is GNU 13.3.0 -- The C compiler identification is GNU 13.3.0 -- The CXX compiler identification is GNU 13.3.0 -- Detecting Fortran compiler ABI info -- Detecting Fortran compiler ABI info - done -- Check for working Fortran compiler: /usr/bin/gfortran - skipped -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/gcc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/g++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1") -- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter -- Found MPI_C: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so (found version "5.0") -- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so (found version "5.0") -- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpifort.so (found version "5.0") -- Found MPI: TRUE (found version "5.0") found components: C CXX Fortran -- Performing Test 
CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success -- Found Threads: TRUE -- Found MPI: TRUE (found version "5.0") found components: CXX C Fortran -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran -- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS) -- Checking for module 'openblas' -- Found openblas, version 0.3.33 -- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/include -- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so ------------------------------------------------------------ - DBCSR - ------------------------------------------------------------ -- Found MPI: TRUE (found version "5.0") -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0 -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86") -- Using LIBXS + LIBXSMM for Small Matrix Multiplication -- Checking for module 'scalapack' -- Package 'mpi', required by 'scalapack', not found Package 'lapack', required by 'scalapack', not found Package 'blas', required by 'scalapack', not found -- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a ----------------------------------------------------------- - CUDA - ----------------------------------------------------------- 
-- GPU architecture number: 52 -- GPU profiling enabled: OFF -- CUDA compiler and libraries found ------------------------------------------------------------ - OPENMP - ------------------------------------------------------------ -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX ------------------------------------------------------------ - Other dependencies - ------------------------------------------------------------ -- Checking for one of the modules 'elpa_openmp' -- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a -- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran -- Found MPI: TRUE (found version "5.0") found components: CXX -- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Checking for one of the modules 'fftw3' -- Checking for one of the modules 'fftw3f' -- Checking for one of the modules 'fftw3l' -- Checking for one of the modules 'fftw3q' -- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.11/include -- Boost detected. 
satisfied by headers bundled with Libint2 distribution -- Looking for Fortran sgemm -- Looking for Fortran sgemm - found -- mctc-lib: Find installed package -- multicharge: Find installed package -- DFTD4: found version 4.2.0, using v4.2+ API -- toml-f: Find installed package -- s-dftd3: Find installed package -- DFTD4: found version 4.2.0, using v4.2+ API -- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8") -- Checking for one of the modules 'libxc>=3.0.0' -- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0") -- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a -- Found HDF5: hdf5-shared (found version "2.1.1") found components: C -- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.11/include -- Looking for Fortran sgemm -- Looking for Fortran sgemm - not found -- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP_CUDA: -fopenmp (found version "4.5") -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- Checking for one of the modules 's-dftd3' -- Checking for one of the modules 'mctc-lib' -- Found DFTD3: /opt/cp2k-toolchain/install/tblite-0.6.0/lib/libs-dftd3.a -- Checking for one of the modules 'dftd4' -- Checking for one of the modules 'multicharge' -- Found DFTD4: /opt/cp2k-toolchain/install/tblite-0.6.0/lib/libdftd4.a -- Looking for Fortran cheev -- Looking for Fortran cheev - found -- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so;-lm;-ldl -- Checking for one of the modules 'scalapack' -- Checking for one of the modules 'elpa;elpa_openmp;elpa-openmp-2019.05.001;elpa_openmp-2019.11.001;elpa_openmp-2020.05.001;elpa-2019.05.001;elpa-2019.11.001;elpa-2020.05.001' -- Found Elpa: 
/opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so -- Checking for module 'libvdwxc>=0.5.0' -- Found libvdwxc, version 0.5.0 -- Checking for module 'fftw3' -- Found fftw3, version 3.3.11 -- Found LibVDWXC: vdwxc;fftw3 (Required is at least version "0.5.0") -- Setting build type to 'Release' as none was specified. -- Performing Test f2008-norm2 -- Performing Test f2008-norm2 - Success -- Performing Test f2008-block_construct -- Performing Test f2008-block_construct - Success -- Performing Test f2008-contiguous -- Performing Test f2008-contiguous - Success -- Performing Test f95-reshape-order-allocatable -- Performing Test f95-reshape-order-allocatable - Success -- FYPP preprocessor found. -- Adding libxs_jit.F from dependency libxs for compilation -------------------------------------------------------------------- - - - Summary of enabled dependencies - - - -------------------------------------------------------------------- - BLAS - vendor: OpenBLAS - include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include - libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so - LAPACK - include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include - libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so - MPI - include directories: /opt/cp2k-toolchain/install/mpich-5.0.1/include - libraries: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so - MPI_F08: ON - ScaLAPACK - vendor: auto - include directories: - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a - Hardware Acceleration: - CUDA: - GPU architecture number: 52 - GPU profiling enabled: - GPU accelerated modules - ELPA module: ON - GRID module: ON - DBM module: ON - PW module: ON - LibXC - version: 7.0.0 - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/ - libraries: 
/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a - HDF5 - version: 2.1.1 - include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include - libraries: hdf5-shared - FFTW3 - include directories: /opt/cp2k-toolchain/install/fftw-3.3.11/include - libraries: /opt/cp2k-toolchain/install/fftw-3.3.11/lib/libfftw3.a - LIBXS - include directories: - libraries: - SpLA - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran - SpLA GEMM offloading - DFTD4 - include directories : /opt/cp2k-toolchain/install/tblite-0.6.0/include;/opt/cp2k-toolchain/install/tblite-0.6.0/include/dftd4/GNU-13.3.0 - libraries : - TBLITE : - include directories : /opt/cp2k-toolchain/install/tblite-0.6.0/include;/opt/cp2k-toolchain/install/tblite-0.6.0/include/tblite/GNU-13.3.0 - tblite libraries : - SIRIUS - include directories: - libraries: - COSMA - include directories: /opt/cp2k-toolchain/install/COSMA-2.8.4/include - libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack> - Libint2 - include directories: - libraries: - ELPA - include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001 - libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a -------------------------------------------------------------------- - - - List of dependencies not included in this build - - - -------------------------------------------------------------------- - DeePMD - PEXSI - ACE (libpace) - Spglib - LibSMEAGOL - MiMiC - openPMD - DLA-Future - PLUMED - LibFCI - GauXC - Libvori - LibTorch - TREXIO - GreenX After building and installing 
CP2K the regtests can be run with the following command: /opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp -- Configuring done (12.3s) -- Generating done (0.5s) -- Build files have been written to: /opt/cp2k/build Compiling CP2K ... done ---> Removed intermediate container b68a663b050f ---> 522f80438cb6 Step 41/46 : COPY ./benchmarks ./benchmarks ---> 1d910ec93139 Step 42/46 : COPY ./tools/regtesting ./tools/regtesting ---> a1442e33057b Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./ ---> 86b70f1f0eb6 Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log ---> Running in 2e663fb821a1 ============== CP2K Binary Flags ============= cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma libxs libxsmm dbcsr_acc libdftd4 dftd4_v4_2 s_dftd3 mctc-lib tblite sirius offload_cuda spla_gemm_offloading libvdwxc hdf5 ========== Checking Benchmark Inputs ========= Found 83 input files and 0 errors. ========== Running Performance Test ========== Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]" Running H2O-64.inp with 3 threads and 2 ranks... done. 
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.027 0.028 100.636 100.636
qs_mol_dyn_low 1 2.0 0.004 0.004 100.214 100.217
qs_forces 11 3.9 0.002 0.002 100.162 100.162
qs_energies 11 4.9 0.001 0.001 89.639 89.641
scf_env_do_scf 11 5.9 0.001 0.001 74.032 74.032
scf_env_do_scf_inner_loop 108 6.5 0.006 0.008 62.169 62.169
velocity_verlet 10 3.0 0.001 0.002 60.512 60.531
rebuild_ks_matrix 119 8.3 0.001 0.001 28.028 28.029
qs_ks_build_kohn_sham_matrix 119 9.3 0.019 0.019 28.027 28.028
qs_ks_update_qs_env 119 7.6 0.001 0.001 26.089 26.091
dbcsr_multiply_generic 2286 12.5 0.145 0.147 25.009 25.063
qs_rho_update_rho_low 119 7.7 0.001 0.001 22.714 22.735
calculate_rho_elec 119 8.7 0.841 0.850 22.713 22.735
qs_scf_new_mos 108 7.5 0.001 0.001 20.460 20.461
qs_scf_loop_do_ot 108 8.5 0.001 0.001 20.459 20.460
ot_scf_mini 108 9.5 0.003 0.003 18.538 18.540
fft_wrap_pw1pw2 1201 11.6 0.022 0.022 16.476 16.511
sum_up_and_integrate 119 10.3 0.002 0.002 15.454 15.506
integrate_v_rspace 119 11.3 0.334 0.339 15.364 15.417
fft_wrap_pw1pw2_140 487 12.2 0.003 0.003 14.174 14.201
multiply_cannon 2286 13.5 0.337 0.339 12.625 12.633
init_scf_loop 11 6.9 0.000 0.000 11.782 11.782
multiply_cannon_loop 2286 14.5 0.260 0.262 11.558 11.559
grid_collocate_task_list 119 9.7 11.137 11.184 11.137 11.184
ot_mini 108 10.5 0.001 0.001 10.845 10.846
density_rs2pw 119 9.7 0.007 0.008 10.697 10.787
make_m2s 4572 13.5 0.044 0.045 10.765 10.783
make_images 4572 14.5 1.098 1.106 10.594 10.610
grid_integrate_task_list 119 12.3 8.683 8.739 8.683 8.739
pw_gpu_r3dc1d_3d_ps 606 13.1 2.304 2.330 8.439 8.442
pw_gpu_c1dr3d_3d_ps 595 14.2 2.195 2.223 8.008 8.040
init_scf_run 11 5.9 0.000 0.000 7.530 7.530
scf_env_initial_rho_setup 11 6.9 0.000 0.001 7.530 7.530
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 7.422 7.490
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 7.445 7.445
prepare_preconditioner 11 7.9 0.000 0.000 7.174 7.175
make_preconditioner 11 8.9 0.000 0.000 7.174 7.175
qs_ot_get_derivative 108 11.5 0.002 0.002 6.603 6.606
hybrid_alltoall_any 4725 16.4 4.736 4.769 6.448 6.467
potential_pw2rs 119 12.3 0.035 0.036 6.347 6.348
make_images_data 4572 15.5 0.055 0.056 6.314 6.335
make_full_inverse_cholesky 11 9.9 0.000 0.000 5.987 6.245
multiply_cannon_multrec 4572 15.5 1.956 2.021 6.103 6.139
mp_alltoall_z22v 1201 15.6 4.197 4.285 4.197 4.285
ot_diis_step 108 11.5 0.006 0.006 4.219 4.219
wfi_extrapolate 11 7.9 0.001 0.001 3.940 3.940
build_core_ppl_forces 11 5.9 3.773 3.824 3.773 3.824
dbcsr_mm_accdrv_process 9594 16.2 0.596 0.598 3.763 3.780
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 3.741 3.742
apply_preconditioner_dbcsr 119 12.6 0.000 0.000 3.677 3.682
apply_single 119 13.6 0.001 0.001 3.677 3.682
mp_waitall_1 64495 16.9 3.609 3.662 3.609 3.662
dbcsr_complete_redistribute 329 12.2 1.217 1.229 3.304 3.570
qs_env_update_s_mstruct 11 6.9 0.000 0.000 3.315 3.350
calculate_dm_sparse 119 9.5 0.001 0.001 3.310 3.312
qs_ot_get_p 119 10.4 0.001 0.001 3.248 3.253
multiply_cannon_sync_h2d 4572 15.5 3.071 3.145 3.071 3.145
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.025 3.025
cp_dbcsr_sm_fm_multiply 37 9.5 0.001 0.001 2.682 2.683
yz_to_x 606 14.1 0.451 0.455 2.596 2.645
transfer_rs2pw 487 10.6 0.008 0.008 2.578 2.644
pw_poisson_solve 119 10.3 0.003 0.003 2.609 2.611
jit_kernel_multiply 12 15.8 2.595 2.606 2.595 2.606
copy_dbcsr_to_fm 153 11.3 0.004 0.004 2.574 2.586
x_to_yz 595 15.2 0.493 0.498 2.544 2.574
qs_create_task_list 11 7.9 0.000 0.000 2.472 2.506
generate_qs_task_list 11 8.9 1.132 1.137 2.472 2.506
qs_ot_get_derivative_taylor 59 13.0 0.003 0.003 2.319 2.321
calculate_first_density_matrix 1 7.0 0.000 0.000 2.316 2.316
transfer_rs2pw_140 130 11.5 1.536 1.565 2.138 2.222
cp_dbcsr_sm_fm_multiply_core 37 10.5 0.000 0.000 2.175 2.176
cp_fm_cholesky_invert 11 10.9 2.167 2.167 2.167 2.167
pw_gpu_fg 606 14.1 2.122 2.135 2.122 2.135
qs_ot_p2m_diag 50 11.0 0.082 0.084 2.091 2.094
transfer_dbcsr_to_fm 11 10.9 0.001 0.001 2.013 2.023
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=100.636, yerr=0.0
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=68.10999999999999, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=11.137, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.683, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.736, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.197, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.773, yerr=0.0
Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.025 0.026 92.966 92.966
qs_mol_dyn_low 1 2.0 0.004 0.004 92.555 92.557
qs_forces 11 3.9 0.002 0.002 92.510 92.510
qs_energies 11 4.9 0.001 0.001 81.827 81.829
scf_env_do_scf 11 5.9 0.001 0.001 65.737 65.737
velocity_verlet 10 3.0 0.001 0.002 57.204 57.220
scf_env_do_scf_inner_loop 96 6.5 0.005 0.007 53.945 53.945
rebuild_ks_matrix 107 8.3 0.001 0.001 25.802 25.803
qs_ks_build_kohn_sham_matrix 107 9.3 0.017 0.017 25.802 25.802
qs_ks_update_qs_env 107 7.6 0.001 0.001 23.696 23.697
dbcsr_multiply_generic 1966 12.4 0.125 0.125 22.332 22.351
qs_rho_update_rho_low 107 7.7 0.001 0.001 19.231 19.246
calculate_rho_elec 107 8.7 0.754 0.760 19.231 19.246
qs_scf_new_mos 96 7.5 0.001 0.001 17.953 17.957
qs_scf_loop_do_ot 96 8.5 0.001 0.001 17.952 17.956
ot_scf_mini 96 9.5 0.003 0.003 16.251 16.252
fft_wrap_pw1pw2 1081 11.6 0.020 0.020 14.899 14.919
sum_up_and_integrate 107 10.3 0.002 0.002 14.576 14.597
integrate_v_rspace 107 11.3 0.307 0.308 14.494 14.516
fft_wrap_pw1pw2_140 439 12.2 0.003 0.003 12.838 12.864
init_scf_loop 11 6.9 0.000 0.000 11.711 11.711
multiply_cannon 1966 13.4 0.289 0.297 11.286 11.288
multiply_cannon_loop 1966 14.4 0.220 0.222 10.405 10.415
density_rs2pw 107 9.7 0.007 0.007 9.635 9.753
make_m2s 3932 13.4 0.038 0.038 9.637 9.639
ot_mini 96 10.5 0.001 0.001 9.508 9.510
make_images 3932 14.4 0.997 1.009 9.485 9.487
grid_collocate_task_list 107 9.7 8.814 8.904 8.814 8.904
grid_integrate_task_list 107 12.3 8.451 8.470 8.451 8.470
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 8.211 8.211
pw_gpu_r3dc1d_3d_ps 546 13.1 2.075 2.088 7.657 7.660
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 7.452 7.591
init_scf_run 11 5.9 0.000 0.000 7.251 7.251
scf_env_initial_rho_setup 11 6.9 0.000 0.001 7.251 7.251
pw_gpu_c1dr3d_3d_ps 535 14.2 1.971 1.990 7.217 7.233
prepare_preconditioner 11 7.9 0.000 0.000 7.139 7.146
make_preconditioner 11 8.9 0.000 0.000 7.139 7.146
make_full_inverse_cholesky 11 9.9 0.000 0.000 5.999 6.250
hybrid_alltoall_any 4079 16.3 4.260 4.265 5.800 5.811
qs_ot_get_derivative 96 11.5 0.002 0.002 5.788 5.789
potential_pw2rs 107 12.3 0.032 0.032 5.736 5.738
multiply_cannon_multrec 3932 15.4 1.776 1.791 5.665 5.670
make_images_data 3932 15.4 0.047 0.047 5.659 5.667
qs_env_update_s_mstruct 11 6.9 0.000 0.000 4.165 4.281
build_core_ppl_forces 11 5.9 3.771 3.879 3.771 3.879
mp_alltoall_z22v 1081 15.6 3.752 3.804 3.752 3.804
wfi_extrapolate 11 7.9 0.001 0.001 3.794 3.794
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 3.666 3.720
ot_diis_step 96 11.5 0.005 0.005 3.700 3.700
dbcsr_complete_redistribute 317 12.2 1.212 1.228 3.361 3.606
dbcsr_mm_accdrv_process 8450 16.1 0.693 1.108 3.550 3.566
qs_create_task_list 11 7.9 0.000 0.000 3.307 3.365
generate_qs_task_list 11 8.9 1.389 1.401 3.307 3.365
apply_preconditioner_dbcsr 107 12.6 0.000 0.000 3.294 3.298
apply_single 107 13.6 0.001 0.001 3.294 3.298
mp_waitall_1 55487 16.8 3.091 3.113 3.091 3.113
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.076 3.076
calculate_dm_sparse 107 9.5 0.001 0.001 3.022 3.023
qs_ot_get_p 107 10.4 0.001 0.001 2.820 2.821
multiply_cannon_sync_h2d 3932 15.4 2.757 2.784 2.757 2.784
jit_kernel_multiply 13 15.7 2.349 2.748 2.349 2.748
copy_dbcsr_to_fm 147 11.2 0.004 0.004 2.653 2.654
cp_dbcsr_sm_fm_multiply 37 9.5 0.001 0.001 2.621 2.621
transfer_rs2pw 439 10.6 0.007 0.007 2.368 2.531
yz_to_x 546 14.1 0.411 0.416 2.331 2.357
pw_poisson_solve 107 10.3 0.003 0.003 2.337 2.339
x_to_yz 535 15.2 0.441 0.444 2.272 2.289
calculate_first_density_matrix 1 7.0 0.000 0.000 2.221 2.221
transfer_rs2pw_140 118 11.5 1.394 1.415 1.982 2.157
cp_fm_cholesky_invert 11 10.9 2.120 2.121 2.120 2.121
cp_dbcsr_sm_fm_multiply_core 37 10.5 0.000 0.000 2.117 2.118
transfer_dbcsr_to_fm 11 10.9 0.001 0.001 2.097 2.098
pw_gpu_fg 546 14.1 1.965 1.974 1.965 1.974
qs_ot_get_derivative_taylor 53 13.0 0.002 0.002 1.942 1.943
build_core_ppl 11 7.9 1.902 1.936 1.902 1.936
copy_fm_to_dbcsr 170 11.1 0.002 0.002 1.669 1.924
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=92.966, yerr=0.0
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=63.91799999999999, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.814, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.451, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.26, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.771, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=3.752, yerr=0.0
Running w64PBE.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/w64PBE_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.041 0.043 235.180 235.180
qs_mol_dyn_low 1 2.0 0.004 0.004 234.496 234.499
qs_forces 11 3.9 0.002 0.002 234.448 234.448
qs_energies 11 4.9 0.001 0.001 203.421 203.421
scf_env_do_scf 11 5.9 0.001 0.001 183.644 183.644
velocity_verlet 10 3.0 0.001 0.002 183.231 183.249
scf_env_do_scf_inner_loop 106 6.8 0.006 0.008 158.361 158.361
rebuild_ks_matrix 117 8.5 0.001 0.001 130.754 130.762
qs_ks_build_kohn_sham_matrix 117 9.5 0.019 0.019 130.754 130.761
qs_ks_update_qs_env 120 7.8 0.001 0.001 116.086 116.091
fft_wrap_pw1pw2 2000 12.9 0.043 0.043 67.986 68.069
fft_wrap_pw1pw2_200 1298 14.3 0.008 0.008 64.446 64.518
qs_vxc_create 117 10.5 0.004 0.004 63.881 63.916
xc_vxc_pw_create 117 11.5 1.404 1.420 63.878 63.912
sum_up_and_integrate 117 10.5 0.003 0.003 52.912 52.991
integrate_v_rspace 117 11.5 0.208 0.208 52.732 52.810
qs_rho_update_rho_low 117 7.9 0.001 0.001 51.858 51.867
calculate_rho_elec 117 8.9 1.178 1.180 51.857 51.866
grid_integrate_task_list 117 12.5 41.797 41.882 41.797 41.882
xc_pw_derive 702 13.5 0.011 0.012 37.811 37.835
xc_rho_set_and_dset_create 117 12.5 0.914 0.915 36.968 37.047
pw_gpu_c1dr3d_3d_ps 1053 15.2 10.380 10.396 36.392 36.426
grid_collocate_task_list 117 9.9 32.747 32.788 32.747 32.788
pw_gpu_r3dc1d_3d_ps 947 14.5 9.420 9.464 31.538 31.589
init_scf_loop 14 6.8 0.000 0.000 25.217 25.217
xc_pw_divergence 117 12.5 0.005 0.005 25.121 25.183
density_rs2pw 117 9.9 0.008 0.008 17.904 17.952
mp_alltoall_z22v 2000 16.9 17.822 17.928 17.822 17.928
dbcsr_multiply_generic 2035 12.5 0.137 0.138 17.490 17.582
xc_functional_eval 117 13.5 0.002 0.002 16.369 16.425
pbe_lda_eval 117 14.5 16.367 16.423 16.367 16.423
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 15.456 15.606
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 15.411 15.411
qs_scf_new_mos 106 7.8 0.001 0.001 12.932 12.943
qs_scf_loop_do_ot 106 8.8 0.001 0.001 12.931 12.942
x_to_yz 1053 16.2 2.434 2.437 11.836 11.842
ot_scf_mini 106 9.8 0.003 0.003 11.567 11.574
potential_pw2rs 117 12.5 0.055 0.055 10.727 10.734
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 10.605 10.605
yz_to_x 947 15.5 1.797 1.798 10.217 10.327
multiply_cannon 2035 13.5 0.297 0.301 8.799 8.818
init_scf_run 11 5.9 0.000 0.000 8.710 8.710
scf_env_initial_rho_setup 11 6.9 0.000 0.000 8.710 8.710
pw_gpu_sf 1053 16.2 8.291 8.299 8.291 8.299
build_core_ppl_forces 11 5.9 7.851 8.046 7.851 8.046
prepare_preconditioner 14 7.8 0.000 0.000 7.990 7.990
make_preconditioner 14 8.8 0.000 0.000 7.989 7.990
multiply_cannon_loop 2035 14.5 0.233 0.236 7.837 7.853
pw_gpu_fg 947 15.5 7.284 7.294 7.284 7.294
make_m2s 4070 13.5 0.042 0.042 7.243 7.243
make_images 4070 14.5 0.954 0.954 7.068 7.069
ot_mini 106 10.8 0.001 0.001 7.023 7.029
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 6.614 6.693
wfi_extrapolate 11 7.9 0.001 0.001 6.032 6.032
pw_gpu_ffc 1053 16.2 5.867 5.900 5.867 5.900
build_kinetic_matrix_low 22 6.9 4.791 4.814 4.872 4.896
build_overlap_matrix_low 22 6.9 4.692 4.707 4.764 4.780
pw_poisson_solve 117 10.5 0.003 0.003 4.580 4.591
pw_gpu_cff 947 15.5 4.556 4.562 4.556 4.562
qs_ot_get_derivative 106 11.8 0.002 0.002 4.336 4.343
transfer_rs2pw 479 10.8 0.009 0.009 4.183 4.314
multiply_cannon_multrec 4070 15.5 1.714 1.731 4.089 4.100
pw_derive 1053 13.8 3.981 3.997 3.981 3.997
make_full_single_inverse 14 9.8 0.002 0.002 3.980 3.981
make_images_data 4070 15.5 0.050 0.051 3.829 3.837
hybrid_alltoall_any 4213 16.4 2.697 2.703 3.819 3.823
qs_env_update_s_mstruct 11 6.9 0.000 0.000 3.634 3.707
transfer_rs2pw_200 128 11.7 2.551 2.587 3.475 3.615
make_full_inverse_cholesky 14 9.8 0.000 0.000 3.255 3.402
mp_waitall_1 57459 16.9 3.247 3.279 3.247 3.279
build_core_ppl 11 7.9 3.011 3.088 3.011 3.088
transfer_pw2rs 479 13.4 0.005 0.006 2.990 2.995
ot_diis_step 106 11.8 0.005 0.005 2.667 2.667
pw_copy 1755 13.0 2.631 2.638 2.631 2.638
fft_wrap_pw1pw2_70 234 13.2 0.001 0.002 2.576 2.591
arnoldi_generalized_ev 14 10.8 0.000 0.000 2.486 2.486
dbcsr_sym_matrix_vector_mult 1269 12.5 0.034 0.034 2.452 2.453
transfer_pw2rs_200 128 14.1 1.551 1.563 2.402 2.407
qs_create_task_list 11 7.9 0.000 0.000 2.329 2.334
generate_qs_task_list 11 8.9 1.293 1.301 2.329 2.334
apply_preconditioner_dbcsr 120 12.8 0.000 0.000 2.301 2.308
apply_single 120 13.8 0.001 0.001 2.300 2.308
dbcsr_complete_redistribute 323 11.8 0.863 0.893 2.133 2.305
gev_build_subspace 23 11.5 0.009 0.009 2.292 2.292
dbcsr_sym_matrix_vector_mult_l 1269 13.5 2.133 2.139 2.139 2.145
dbcsr_mm_accdrv_process 9388 16.2 1.050 1.130 2.119 2.123
pw_poisson_set 118 11.5 0.004 0.004 2.088 2.099
calculate_dm_sparse 117 9.7 0.001 0.001 2.045 2.049
qs_ot_get_derivative_taylor 89 12.9 0.003 0.003 1.918 1.922
cp_dbcsr_sm_fm_multiply 46 9.3 0.002 0.002 1.866 1.868
multiply_cannon_sync_h2d 4070 15.5 1.777 1.796 1.777 1.796
pw_integral_ab_c1d_c1d_gs 117 11.5 1.730 1.733 1.752 1.753
qs_ot_get_p 120 10.5 0.001 0.001 1.619 1.622
pw_axpy 1170 12.0 1.557 1.559 1.557 1.559
copy_dbcsr_to_fm 143 10.8 0.004 0.004 1.440 1.482
copy_fm_to_dbcsr 180 10.8 0.002 0.002 1.288 1.433
dbcsr_special_finalize 6105 15.5 0.033 0.033 1.415 1.416
cp_dbcsr_sm_fm_multiply_core 46 10.3 0.000 0.000 1.394 1.394
calculate_rho_core 11 7.9 0.159 0.160 1.252 1.331
dbcsr_merge_single_wm 4070 16.5 0.131 0.133 1.307 1.307
mp_sendrecv_dv 479 12.8 1.146 1.239 1.146 1.239
cp_fm_cholesky_invert 14 10.8 1.217 1.217 1.217 1.217
multiply_cannon_metrocomm1 4070 15.5 0.011 0.012 1.193 1.196
dbcsr_dot 1125 12.2 1.113 1.120 1.176 1.185
calculate_first_density_matrix 1 7.0 0.000 0.000 1.105 1.105
transfer_dbcsr_to_fm 14 10.8 0.001 0.001 0.957 0.992
dbcsr_sort_data 4070 17.5 0.912 0.913 0.912 0.913
dbcsr_finalize 4628 13.9 0.056 0.056 0.884 0.908
transfer_fm_to_dbcsr 14 9.8 0.000 0.000 0.754 0.900
cp_dbcsr_plus_fm_fm_t 22 8.9 0.001 0.001 0.899 0.900
dbcsr_merge_all 4098 15.1 0.175 0.176 0.782 0.805
qs_ot_get_orbitals 106 10.8 0.001 0.001 0.791 0.793
mp_alltoall_d11v 1899 13.8 0.757 0.778 0.757 0.778
dbcsr_copy 7812 13.3 0.193 0.194 0.760 0.774
qs_ot_p2m_diag 19 11.0 0.033 0.033 0.752 0.752
build_core_ppnl_forces 11 5.9 0.736 0.744 0.736 0.744
evaluate_core_matrix_traces 117 8.5 0.001 0.001 0.741 0.742
grid_create_task_list 11 9.9 0.729 0.742 0.729 0.742
calculate_ptrace_kp 234 9.5 0.001 0.001 0.740 0.741
fft_wrap_pw1pw2_30 234 13.2 0.001 0.001 0.671 0.674
cp_fm_cholesky_decompose 28 10.5 0.638 0.673 0.638 0.673
jit_kernel_multiply 8 15.0 0.586 0.667 0.586 0.667
cp_dbcsr_syevd 19 12.0 0.002 0.002 0.635 0.635
cp_fm_uplo_to_full 47 13.4 0.473 0.624 0.473 0.624
make_images_pack 4070 15.5 0.607 0.611 0.620 0.624
cp_fm_diag_elpa 19 13.0 0.000 0.000 0.602 0.603
qs_init_subsys 1 2.0 0.001 0.001 0.603 0.603
cp_fm_diag_elpa_base 19 14.0 0.593 0.595 0.602 0.602
qs_env_setup 1 3.0 0.000 0.000 0.595 0.596
qs_env_rebuild_pw_env 23 5.3 0.000 0.000 0.595 0.596
pw_env_rebuild 1 5.0 0.000 0.000 0.595 0.596
pw_grid_setup 4 6.0 0.000 0.000 0.570 0.571
pw_grid_setup_internal 4 7.0 0.006 0.006 0.560 0.561
transfer_rs2pw_70 117 11.9 0.376 0.377 0.549 0.557
qs_ot_get_derivative_diag 17 12.0 0.001 0.001 0.529 0.531
make_basis_sm 14 9.3 0.001 0.001 0.528 0.528
pw_zero 585 13.0 0.525 0.528 0.525 0.528
dbcsr_copy_into_existing 22 7.9 0.514 0.528 0.515 0.528
mp_sum_d 3821 11.6 0.381 0.517 0.381 0.517
acc_transpose_blocks 4070 15.5 0.022 0.023 0.495 0.497
dbcsr_mm_accdrv_process_sort 9388 17.2 0.483 0.486 0.483 0.486
transfer_pw2rs_70 117 14.5 0.299 0.300 0.457 0.457
pw_grid_sort 4 8.0 0.332 0.333 0.450 0.451
dbcsr_sort_indices 10929 16.5 0.418 0.419 0.418 0.419
mp_sum_l 6134 13.5 0.328 0.399 0.328 0.399
ot_scf_init 14 7.8 0.002 0.002 0.382 0.382
reorthogonalize_vectors 10 9.0 0.000 0.000 0.380 0.381
compute_matrix_w 11 5.9 0.000 0.000 0.377 0.378
calculate_w_matrix_ot 11 6.9 0.003 0.003 0.377 0.378
dbcsr_data_copy_aa2 2343 15.5 0.371 0.376 0.371 0.376
calculate_ecore_overlap 22 5.9 0.001 0.001 0.196 0.353
parallel_gemm_fm_cosma 96 8.9 0.350 0.352 0.350 0.352
mp_alltoall_i22 633 13.6 0.198 0.336 0.198 0.336
dbcsr_desymmetrize_deep 143 11.8 0.093 0.098 0.330 0.331
cp_dbcsr_alloc_block_from_nbl 88 7.7 0.205 0.207 0.320 0.323
build_qs_neighbor_lists 11 6.9 0.001 0.001 0.314 0.314
integrate_v_core_rspace 11 7.9 0.064 0.064 0.295 0.301
dbcsr_add_d 1795 13.1 0.003 0.003 0.299 0.299
dbcsr_add_anytype 1795 14.1 0.162 0.163 0.296 0.296
distribute_tasks 11 9.9 0.292 0.294 0.292 0.294
pw_scale 468 12.0 0.290 0.294 0.290 0.294
setup_rec_index_2d 4070 14.5 0.269 0.270 0.269 0.270
multiply_cannon_multrec_finali 2035 16.5 0.004 0.004 0.257 0.259
dbcsr_mm_multrec_finalize 2035 17.5 0.020 0.021 0.253 0.255
fft_wrap_pw1pw2_10 234 13.2 0.001 0.001 0.249 0.250
pw_multiply_with 117 11.5 0.249 0.250 0.249 0.250
dbcsr_make_untransposed_blocks 2481 13.4 0.229 0.229 0.240 0.241
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="w64PBE", label="w64PBE", y=235.18, yerr=0.0
Plot: name="w64PBE_timings_6cpu_1gpu", title="Timings of w64PBE with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="rest", label="rest", y=116.06700000000001, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=41.797, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=32.747, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=17.822, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="pbe_lda_eval", label="pbe_lda_eval", y=16.367, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.38, yerr=0.0
Running w64SCAN.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/w64SCAN_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.185 0.186 880.509 880.509
qs_mol_dyn_low 1 2.0 0.004 0.004 878.338 878.340
qs_forces 11 3.9 0.002 0.002 878.289 878.289
qs_energies 11 4.9 0.001 0.001 799.393 799.393
scf_env_do_scf 11 5.9 0.001 0.001 762.581 762.582
velocity_verlet 10 3.0 0.002 0.002 697.838 697.856
scf_env_do_scf_inner_loop 106 6.8 0.006 0.009 683.849 683.849
rebuild_ks_matrix 117 8.5 0.001 0.001 619.654 619.654
qs_ks_build_kohn_sham_matrix 117 9.5 0.020 0.020 619.653 619.653
qs_ks_update_qs_env 119 7.8 0.001 0.001 553.918 553.920
fft_wrap_pw1pw2 3053 12.6 0.068 0.069 421.582 422.650
fft_wrap_pw1pw2_400 1649 13.9 0.010 0.010 404.274 405.364
qs_vxc_create 117 10.5 0.004 0.004 374.637 374.642
xc_vxc_pw_create 117 11.5 4.490 4.491 374.633 374.638
xc_rho_set_and_dset_create 117 12.5 5.809 5.828 249.177 250.288
pw_gpu_r3dc1d_3d_ps 1532 14.1 118.875 119.095 210.355 211.423
pw_gpu_c1dr3d_3d_ps 1521 15.1 117.054 117.188 211.141 211.141
qs_rho_update_rho_low 117 7.9 0.001 0.001 207.892 207.898
calculate_rho_elec 234 8.9 6.497 6.505 207.891 207.897
sum_up_and_integrate 117 10.5 0.005 0.005 192.806 192.956
integrate_v_rspace 234 11.5 0.408 0.411 191.943 192.100
xc_pw_derive 702 13.5 0.011 0.011 181.222 182.242
density_rs2pw 234 9.9 0.020 0.021 160.854 160.968
xc_functional_eval 234 13.5 0.003 0.003 149.335 150.418
libxc_lda_eval 234 14.5 149.326 150.409 149.332 150.415
xc_pw_divergence 117 12.5 0.007 0.007 119.710 120.767
grid_integrate_task_list 234 12.5 96.824 96.991 96.824 96.991
potential_pw2rs 234 12.5 0.273 0.279 94.710 94.716
init_scf_loop 13 6.8 0.000 0.000 78.669 78.669
mp_alltoall_z22v 3053 16.6 70.549 71.433 70.549 71.433
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 66.472 66.472
yz_to_x 1532 15.1 7.544 7.556 43.732 44.562
x_to_yz 1521 16.1 8.897 8.903 43.257 43.330
grid_collocate_task_list 234 9.9 40.394 40.502 40.394 40.502
transfer_rs2pw 947 10.9 0.019 0.019 34.926 35.094
pw_gpu_sf 1521 16.1 30.926 30.947 30.926 30.947
transfer_rs2pw_400 245 11.8 25.429 25.431 30.548 30.714
pw_gpu_fg 1532 15.1 30.278 30.282 30.278 30.282
transfer_pw2rs 947 13.5 0.016 0.016 28.569 28.575
transfer_pw2rs_400 245 14.3 20.572 20.621 25.412 25.415
init_scf_run 11 5.9 0.000 0.000 23.692 23.692
scf_env_initial_rho_setup 11 6.9 0.000 0.001 23.692 23.692
pw_gpu_ffc 1521 16.1 19.876 19.915 19.876 19.915
wfi_extrapolate 11 7.9 0.002 0.002 19.406 19.406
dbcsr_multiply_generic 2100 12.6 0.138 0.140 17.982 18.142
pw_gpu_cff 1532 15.1 17.321 17.345 17.321 17.345
pw_poisson_solve 117 10.5 0.003 0.003 17.057 17.073
fft_wrap_pw1pw2_140 468 13.2 0.003 0.003 13.600 13.641
qs_scf_new_mos 106 7.8 0.001 0.001 13.088 13.091
qs_scf_loop_do_ot 106 8.8 0.001 0.001 13.087 13.090
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 12.664 12.664
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 12.316 12.417
pw_derive 1053 13.8 12.055 12.063 12.055 12.063
ot_scf_mini 106 9.8 0.003 0.003 11.719 11.720
pw_copy 2223 13.1 9.008 9.016 9.008 9.016
multiply_cannon 2100 13.6 0.289 0.291 8.962 8.972
mp_waitall_1 59747 17.0 8.137 8.198 8.137 8.198
pw_integral_ab_c1d_c1d_gs 117 11.5 8.017 8.065 8.141 8.152
multiply_cannon_loop 2100 14.6 0.237 0.239 7.983 7.988
prepare_preconditioner 13 7.8 0.000 0.000 7.818 7.822
make_preconditioner 13 8.8 0.000 0.000 7.818 7.822
make_m2s 4200 13.6 0.041 0.042 7.201 7.209
ot_mini 106 10.8 0.001 0.001 7.104 7.105
make_images 4200 14.6 0.942 0.944 7.029 7.037
qs_env_update_s_mstruct 11 6.9 0.000 0.000 6.818 6.832
pw_poisson_set 118 11.5 0.005 0.005 6.600 6.616
mp_sendrecv_dv 947 12.9 6.261 6.409 6.261 6.409
pw_axpy 1638 11.7 5.891 5.893 5.891 5.893
build_core_ppl_forces 11 5.9 5.724 5.809 5.724 5.809
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 5.522 5.575
calculate_rho_core 11 7.9 0.424 0.426 4.883 4.940
qs_ot_get_derivative 106 11.8 0.002 0.002 4.411 4.411
multiply_cannon_multrec 4200 15.6 1.744 1.752 4.213 4.222
build_kinetic_matrix_low 22 6.9 4.139 4.146 4.213 4.219
build_overlap_matrix_low 22 6.9 4.048 4.056 4.115 4.124
make_full_single_inverse 13 9.8 0.002 0.002 3.828 3.830
hybrid_alltoall_any 4338 16.5 2.659 2.661 3.814 3.818
make_images_data 4200 15.6 0.051 0.051 3.800 3.800
transfer_rs2pw_140 234 11.9 2.771 2.791 3.679 3.699
make_full_inverse_cholesky 13 9.8 0.000 0.000 3.256 3.391
fft_wrap_pw1pw2_50 468 13.2 0.003 0.003 2.782 2.857
ot_diis_step 106 11.8 0.005 0.005 2.674 2.674
transfer_pw2rs_140 234 14.5 1.642 1.645 2.543 2.546
arnoldi_generalized_ev 13 10.8 0.000 0.000 2.407 2.408
build_core_ppl 11 7.9 2.369 2.405 2.369 2.405
dbcsr_sym_matrix_vector_mult 1206 12.5 0.033 0.033 2.374 2.375
dbcsr_complete_redistribute 312 11.8 0.922 0.923 2.170 2.311
apply_preconditioner_dbcsr 119 12.8 0.000 0.000 2.262 2.269
apply_single 119 13.8 0.001 0.001 2.262 2.269
gev_build_subspace 22 11.5 0.009 0.009 2.224 2.224
dbcsr_mm_accdrv_process 9484 16.3 0.807 1.145 2.213 2.215
pw_zero 702 12.6 2.106 2.107 2.106 2.107
calculate_dm_sparse 117 9.7 0.001 0.001 2.091 2.092
dbcsr_sym_matrix_vector_mult_l 1206 13.5 2.045 2.046 2.051 2.052
qs_ot_get_derivative_taylor 89 12.9 0.004 0.004 2.033 2.036
cp_dbcsr_sm_fm_multiply 45 9.4 0.002 0.002 1.922 1.923
qs_init_subsys 1 2.0 0.001 0.001 1.910 1.910
qs_env_setup 1 3.0 0.000 0.000 1.903 1.904
qs_env_rebuild_pw_env 23 5.3 0.000 0.000 1.903 1.904
pw_env_rebuild 1 5.0 0.000 0.000 1.903 1.903
pw_grid_setup 4 6.0 0.000 0.000 1.840 1.841
pw_grid_setup_internal 4 7.0 0.018 0.018 1.810 1.811
multiply_cannon_sync_h2d 4200 15.6 1.779 1.781 1.779 1.781
qs_create_task_list 11 7.9 0.000 0.000 1.698 1.742
generate_qs_task_list 11 8.9 0.871 0.879 1.698 1.742
qs_ot_get_p 119 10.6 0.001 0.001 1.669 1.673
copy_dbcsr_to_fm 138 10.8 0.004 0.004 1.586 1.600
pw_grid_sort 4 8.0 1.094 1.096 1.492 1.493
cp_dbcsr_sm_fm_multiply_core 45 10.4 0.000 0.000 1.446 1.447
dbcsr_special_finalize 6300 15.6 0.033 0.033 1.412 1.415
copy_fm_to_dbcsr 174 10.8 0.002 0.002 1.264 1.398
dbcsr_merge_single_wm 4200 16.6 0.130 0.132 1.301 1.302
integrate_v_core_rspace 11 7.9 0.145 0.146 1.292 1.295
jit_kernel_multiply 11 15.1 0.910 1.247 0.910 1.247
calculate_first_density_matrix 1 7.0 0.000 0.000 1.237 1.238
dbcsr_dot 1134 12.2 1.115 1.120 1.190 1.199
multiply_cannon_metrocomm1 4200 15.6 0.011 0.012 1.183 1.189
cp_fm_cholesky_invert 13 10.8 1.152 1.152 1.152 1.152
transfer_dbcsr_to_fm 13 10.8 0.001 0.001 1.114 1.122
pw_scale 585 11.9 1.087 1.092 1.087 1.092
mp_sum_d 3885 11.5 0.786 0.997 0.786 0.997
cp_dbcsr_plus_fm_fm_t 22 8.9 0.001 0.001 0.944 0.945
dbcsr_finalize 4788 14.0 0.060 0.060 0.904 0.913
dbcsr_sort_data 4200 17.6 0.904 0.905 0.904 0.905
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="w64SCAN", label="w64SCAN", y=880.509, yerr=0.0
Plot: name="w64SCAN_timings_6cpu_1gpu", title="Timings of w64SCAN with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="rest", label="rest", y=327.881, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="libxc_lda_eval", label="libxc_lda_eval", y=149.326, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=118.875, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=117.054, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=96.824, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=70.549, yerr=0.0
Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.017 0.019 108.026 108.026
qs_energies 1 2.0 0.000 0.000 107.714 107.721
mp2_main 1 3.0 0.000 0.000 99.169 99.176
mp2_gpw_main 1 4.0 0.000 0.000 97.619 97.626
rpa_ri_compute_en 1 5.0 0.000 0.000 90.740 90.747
rpa_num_int 1 6.0 0.001 0.001 90.732 90.739
dbt_total 2336 9.6 0.021 0.021 72.314 72.315
compute_mat_P_omega 1 7.0 0.001 0.002 67.866 67.873
compute_mat_P_omega_contract 10 8.0 5.010 5.030 67.549 67.556
dbt_contract 787 11.0 0.047 0.047 47.502 47.502
dbt_tas_total 1149 12.2 0.142 0.142 37.019 37.019
dbt_tas_multiply 807 12.1 0.003 0.003 36.339 36.340
dbt_tas_dbm 807 14.1 0.005 0.006 28.135 28.135
dbm_multiply 807 16.1 26.654 26.747 26.654 26.747
dbt_copy 1107 10.7 0.067 0.068 25.454 25.495
compute_mat_P_omega_calc_M_occ 250 9.0 5.066 5.075 23.717 23.717
dbt_tas_mm_1N 524 15.1 0.003 0.003 17.764 17.969
dbt_reshape 594 11.8 6.876 7.122 16.860 16.919
compute_QP_energies 1 7.0 0.000 0.000 15.709 15.709
compute_self_energy_cubic_gw 1 8.0 0.117 0.120 15.708 15.709
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 14.911 14.912
dbt_tas_reserve_blocks_index 3266 14.3 0.634 0.640 10.823 10.940
dbm_reserve_blocks 3634 15.3 10.495 10.614 10.495 10.614
dbt_reserve_blocks_index 2347 13.0 0.313 0.320 9.114 9.131
dbt_reserve_blocks_index_array 2289 12.1 0.012 0.013 8.922 8.925
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 8.907 8.907
dbt_crop 1042 12.0 6.522 6.547 8.738 8.784
mp_waitall_2 2656 15.9 8.239 8.295 8.239 8.295
dbt_tas_mm_2 251 15.0 0.003 0.003 7.633 7.633
dbt_communicate_buffer 594 12.8 0.012 0.012 7.426 7.480
contract_cubic_gw 21 9.0 0.000 0.000 7.229 7.229
scf_env_do_scf 1 3.0 0.000 0.000 7.155 7.155
scf_env_do_scf_inner_loop 17 4.0 0.001 0.001 7.155 7.155
mp2_ri_gpw_compute_in 1 5.0 0.001 0.001 6.869 6.869
compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 5.326 5.356
compute_mat_P_omega_copy_M_occ 250 9.0 0.002 0.002 5.134 5.160
dbcsr_multiply_generic 30 8.1 0.003 0.003 4.345 4.401
dbt_tas_copy 511 11.5 2.390 2.397 4.221 4.334
multiply_cannon 30 9.1 0.007 0.010 4.153 4.208
multiply_cannon_loop 30 10.1 0.004 0.004 4.100 4.154
qs_ks_build_kohn_sham_matrix 18 6.9 0.002 0.002 3.720 3.733
qs_ks_update_qs_env 17 5.0 0.000 0.000 3.690 3.703
rebuild_ks_matrix 17 6.0 0.000 0.000 3.683 3.696
multiply_cannon_multrec 60 11.1 0.247 0.263 3.560 3.592
trace_sigma_gw 21 9.0 0.515 0.556 3.412 3.412
dbcsr_mm_accdrv_process 328 12.3 0.022 0.023 3.057 3.066
jit_kernel_multiply 17 11.6 3.029 3.037 3.029 3.037
qs_scf_new_mos 17 5.0 0.000 0.000 2.916 2.935
dbt_split_copyback 70 10.6 1.235 1.252 2.888 2.900
mp_sync 8688 11.6 2.841 2.878 2.841 2.878
get_2c_integrals 1 6.0 0.000 0.000 2.682 2.682
convert_to_new_pgrid 2421 14.1 0.035 0.035 2.407 2.430
dbm_copy 1614 15.1 2.372 2.395 2.372 2.395
mp2_ri_gpw_compute_in_copy_3c 6 6.0 0.221 0.226 2.229 2.380
fft_wrap_pw1pw2 301 10.2 0.005 0.005 2.349 2.357
compute_W_cubic_GW 10 7.0 0.003 0.004 2.217 2.219
parallel_gemm_fm_cosma 105 8.4 2.192 2.204 2.192 2.204
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=108.026, yerr=0.0
Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=49.239999999999995, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=26.654, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.495, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=8.239, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.876, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.522, yerr=0.0
Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.023 0.023 209.453 209.453
qs_forces 1 2.0 0.000 0.000 209.014 209.014
rebuild_ks_matrix 7 6.6 0.000 0.000 204.252 204.252
qs_ks_build_kohn_sham_matrix 7 7.6 0.002 0.002 204.252 204.252
hfx_ks_matrix 7 8.6 0.000 0.000 198.415 198.423
dbt_total 849 11.0 0.009 0.009 148.623 148.623
hfx_ri_update_ks 7 9.6 0.000 0.000 113.233 113.233
hfx_ri_update_ks_Pmat 7 10.6 22.296 22.420 113.228 113.228
qs_energies 1 3.0 0.000 0.000 111.403 111.403
scf_env_do_scf 1 4.0 0.000 0.000 108.714 108.714
qs_ks_update_qs_env 8 6.0 0.000 0.000 106.683 106.684
qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 97.576 97.576
dbt_contract 207 12.4 0.053 0.053 85.607 85.608
hfx_ri_update_forces 1 7.0 1.008 1.010 85.180 85.188
dbt_tas_total 369 13.4 0.080 0.081 69.355 69.355
dbt_tas_multiply 216 13.5 0.001 0.001 66.436 66.437
dbt_copy 423 11.8 0.046 0.048 57.692 58.515
scf_env_do_scf_inner_loop 6 5.0 0.000 0.001 55.036 55.036
init_scf_loop 2 5.0 0.000 0.000 53.676 53.676
dbt_tas_dbm 216 15.5 0.002 0.002 52.302 52.302
hfx_ri_forces_Pmat_3c 1 8.0 3.626 3.659 49.380 49.424
dbm_multiply 216 17.5 48.964 48.977 48.964 48.977
dbt_reshape 175 13.2 19.693 19.999 44.168 44.661
hfx_ri_update_ks_Pmat_KS 63 11.6 0.001 0.001 33.236 33.236
precalc_derivatives 1 8.0 1.878 1.890 29.676 29.676
mp_waitall_2 1022 16.5 22.736 22.750 22.736 22.750
dbt_tas_mm_2 91 16.5 0.001 0.001 22.081 22.081
dbt_crop 372 13.7 14.431 14.738 18.525 19.029
dbt_communicate_buffer 175 14.2 0.004 0.005 18.838 18.856
dbt_tas_reserve_blocks_index 1323 15.4 1.662 1.667 18.140 18.489
dbm_reserve_blocks 1491 16.3 17.104 17.454 17.104 17.454
build_3c_derivatives 3 9.0 2.656 2.822 16.851 16.852
hfx_ri_update_ks_Pmat_copy_2 63 11.6 0.000 0.000 16.806 16.807
dbt_tas_mm_3T 77 17.1 0.000 0.001 16.412 16.673
hfx_ri_pre_scf_Pmat 1 12.0 0.000 0.000 16.088 16.088
hfx_ri_update_ks_Pmat_Px3C 63 11.6 0.000 0.000 15.738 15.738
dbt_reserve_blocks_index 889 14.5 0.625 0.629 14.720 14.798
dbt_reserve_blocks_index_array 859 13.5 0.008 0.008 14.438 14.509
dbt_tas_mm_3N 37 15.4 0.000 0.000 11.183 11.333
dbt_tas_copy 248 12.5 4.551 4.578 8.358 8.649
mp_sync 2901 12.8 7.774 8.427 7.774 8.427
hfx_ri_pre_scf_Pmat_int 1 13.0 0.000 0.000 5.200 5.200
dbt_tas_replicate 168 15.1 2.244 2.245 4.984 4.987
hfx_ri_pre_scf_calc_tensors 1 14.0 0.003 0.003 4.473 4.474
hfx_ri_pre_scf_Pmat_copy_2 9 13.0 1.654 1.661 4.365 4.372
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=209.453, yerr=0.0
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=78.66, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=48.964, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=22.736, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=22.296, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=19.693, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=17.104, yerr=0.0
Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.010 0.011 102.944 102.944
qs_energies 1 2.0 0.000 0.000 102.766 102.766
mp2_main 1 3.0 0.000 0.001 93.902 93.902
mp2_gpw_main 1 4.0 0.001 0.001 93.541 93.541
mp2_ri_gpw_compute_in 1 5.0 0.543 0.544 50.494 50.512
mp2_ri_gpw_compute_en 1 5.0 0.096 0.097 42.988 43.005
mp2_ri_gpw_compute_in_loop 1 6.0 0.014 0.015 42.645 42.662
mp2_ri_gpw_compute_en_RI_loop 1 6.0 12.808 12.837 40.368 40.368
dbcsr_multiply_generic 2666 8.0 0.159 0.159 21.147 21.457
ao_to_mo_and_store_B_mult_1 1328 7.0 0.015 0.015 20.285 20.596
mp2_eri_3c_integrate_gpw 1328 7.0 0.019 0.019 16.650 17.040
mp2_ri_gpw_compute_en_expansio 1040 7.0 0.721 0.722 16.211 16.295
local_gemm 1040 8.0 15.491 15.576 15.491 15.576
make_m2s 5332 9.0 0.055 0.056 12.263 12.345
make_images 5332 10.0 2.115 2.128 12.081 12.163
integrate_v_rspace 1338 8.0 0.993 1.002 10.125 10.290
multiply_cannon 2666 9.0 0.387 0.411 8.236 8.627
hybrid_alltoall_any 6683 11.6 8.130 8.219 8.399 8.485
make_images_data 5332 11.0 0.068 0.070 8.309 8.393
grid_integrate_task_list 1338 9.0 7.854 8.011 7.854 8.011
fft_wrap_pw1pw2 26668 10.4 0.142 0.148 7.435 7.652
multiply_cannon_loop 2666 10.0 0.191 0.193 7.162 7.527
get_2c_integrals 1 6.0 0.004 0.004 7.298 7.305
scf_env_do_scf 1 3.0 0.000 0.000 7.202 7.203
scf_env_do_scf_inner_loop 10 4.0 0.001 0.001 7.202 7.203
collocate_function 1328 8.0 4.850 4.898 6.831 7.076
compute_2c_integrals 1 7.0 0.006 0.007 6.756 6.757
compute_2c_integrals_loop_lm 1 8.0 0.013 0.021 6.545 6.583
mp2_eri_2c_integrate_gpw 1 9.0 1.948 1.989 6.532 6.578
mp2_ri_gpw_compute_en_comm 221 7.0 1.009 1.010 5.533 5.579 ao_to_mo_and_store_B_E_Ex_1 1328 7.0 3.667 3.705 5.483 5.548 mp2_ri_gpw_compute_en_ener 1040 7.0 4.684 4.685 4.684 4.685 fft_wrap_pw1pw2_20 10647 11.4 0.022 0.023 4.311 4.529 qs_scf_new_mos 10 5.0 0.000 0.000 4.034 4.058 multiply_cannon_multrec 2676 11.0 1.648 1.844 3.654 3.857 pw_gpu_r3dc1d_3d 13282 12.2 3.636 3.790 3.636 3.790 mp_sendrecv_dm3 442 8.0 3.536 3.577 3.536 3.577 eigensolver 11 5.8 0.001 0.001 2.767 2.769 pw_gpu_c1dr3d_3d 13280 12.7 2.626 2.695 2.626 2.695 potential_pw2rs 2666 10.0 0.097 0.100 2.603 2.662 qs_ks_update_qs_env 10 5.0 0.000 0.000 2.298 2.322 rebuild_ks_matrix 10 6.0 0.000 0.000 2.282 2.306 qs_ks_build_kohn_sham_matrix 10 7.0 0.001 0.002 2.282 2.306 copy_dbcsr_to_fm 1351 8.0 0.035 0.036 2.240 2.290 collocate_single_gaussian 1328 10.0 0.092 0.094 2.192 2.255 replicate_iaK_2intgroup 1 6.0 2.070 2.088 2.213 2.233 fft_wrap_pw1pw2_10 15957 11.5 0.020 0.020 2.222 2.225 mp2_eri_2c_integrate_gpw_pot_l 1328 10.0 0.004 0.004 2.078 2.145 cp_fm_diag_elpa 11 6.8 0.000 0.000 2.142 2.142 cp_fm_diag_elpa_base 11 7.8 2.058 2.075 2.140 2.141 fill_local_i_aL 884 7.5 2.119 2.133 2.119 2.133 sum_up_and_integrate 10 8.0 0.000 0.000 2.101 2.126 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=102.944, yerr=0.0 Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=53.811, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.491, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.808, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", 
label="hybrid_alltoall_any", y=8.13, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.854, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="collocate_function", label="collocate_function", y=4.85, yerr=0.0 Running diag_cu144_broy.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.078 0.079 257.473 257.473 qs_energies 1 2.0 0.000 0.000 256.398 256.398 scf_env_do_scf 1 3.0 0.000 0.000 243.142 243.142 scf_env_do_scf_inner_loop 15 4.0 0.001 0.002 243.142 243.142 qs_ks_update_qs_env 15 5.0 0.000 0.000 161.846 161.892 rebuild_ks_matrix 15 6.0 0.000 0.000 161.643 161.690 qs_ks_build_kohn_sham_matrix 15 7.0 0.003 0.003 161.643 161.690 sum_up_and_integrate 15 8.0 0.000 0.000 99.233 99.270 integrate_v_rspace 15 9.0 0.044 0.044 99.208 99.245 grid_integrate_task_list 15 10.0 92.500 92.547 92.500 92.547 qs_vxc_create 15 8.0 0.042 0.084 60.908 60.910 fft_wrap_pw1pw2 1086 10.0 0.026 0.027 52.108 53.149 calculate_dispersion_nonloc 15 9.0 11.184 11.744 52.620 52.653 qs_scf_new_mos 15 5.0 0.000 0.000 51.932 51.984 eigensolver 15 6.0 0.002 0.002 41.967 42.032 fft_wrap_pw1pw2_150 765 11.0 0.005 0.005 27.975 28.969 qs_rho_update_rho_low 16 5.0 0.000 0.000 27.502 27.504 calculate_rho_elec 16 6.0 0.173 0.174 27.502 27.504 pw_gpu_c1dr3d_3d_ps 585 12.1 5.441 5.497 26.739 26.851 pw_gpu_r3dc1d_3d_ps 501 11.9 5.015 5.109 25.336 26.265 cp_fm_diag_elpa 15 7.0 0.000 0.000 25.693 25.698 cp_fm_diag_elpa_base 15 8.0 23.912 24.458 25.688 25.689 grid_collocate_task_list 16 7.0 16.579 16.608 16.579 16.608 cp_fm_cholesky_restore 45 7.0 14.423 15.113 14.423 15.113 fft_wrap_pw1pw2_200 
197 11.3 0.001 0.001 12.741 12.770 density_rs2pw 16 7.0 0.001 0.002 10.736 10.774 mp_alltoall_z22v 1086 14.0 9.845 10.767 9.845 10.767 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 9.360 9.360 vdW_energy 15 10.0 9.188 9.262 9.188 9.262 pw_gpu_ffc 585 13.1 8.633 8.707 8.633 8.707 xc_vxc_pw_create 15 9.0 0.174 0.175 8.246 8.252 build_core_hamiltonian_matrix 1 4.0 0.000 0.000 8.097 8.150 pw_gpu_cff 501 12.9 8.080 8.081 8.080 8.081 copy_dbcsr_to_fm 16 5.9 0.001 0.001 6.793 6.854 yz_to_x 501 12.9 0.863 0.877 5.791 6.763 pw_gpu_sf 585 13.1 6.712 6.717 6.712 6.717 potential_pw2rs 15 10.0 0.006 0.007 6.663 6.672 pw_gpu_fg 501 12.9 6.397 6.447 6.397 6.447 x_to_yz 585 13.1 1.003 1.015 5.920 5.944 dbcsr_complete_redistribute 46 8.3 1.678 1.740 5.729 5.813 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=257.473, yerr=0.0 Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=98.875, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=92.5, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=23.912, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=16.579, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=14.423, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=11.184, yerr=0.0 Running bench_dftb.inp with 3 threads and 2 ranks... done. 
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 1.980 1.990 155.751 155.751 qs_energies 1 2.0 0.000 0.000 153.682 153.684 ls_scf 1 3.0 0.000 0.000 146.673 146.676 ls_scf_main 1 4.0 0.000 0.001 135.519 135.519 density_matrix_trs4 5 5.0 0.003 0.004 108.268 108.282 dbcsr_multiply_generic 95 6.2 0.161 0.162 94.184 94.220 multiply_cannon 95 7.2 1.416 1.768 66.323 66.613 multiply_cannon_loop 95 8.2 0.162 0.164 55.770 56.046 multiply_cannon_multrec 190 9.2 42.738 42.761 47.793 47.867 ls_scf_dm_to_ks 5 5.0 0.000 0.000 25.467 25.468 make_m2s 190 7.2 0.015 0.015 23.513 23.533 make_images 190 8.2 5.133 5.426 22.975 22.996 matrix_ls_to_qs 5 6.0 0.000 0.000 16.826 16.859 dbcsr_complete_redistribute 11 7.5 10.115 10.164 14.382 14.440 matrix_decluster 5 7.0 0.000 0.000 13.118 13.165 arnoldi_extremal 6 6.2 0.000 0.000 10.811 10.812 arnoldi_normal_ev 6 7.2 0.005 0.005 10.811 10.812 qs_ks_update_qs_env 6 6.2 0.000 0.000 10.565 10.596 build_subspace 12 8.2 0.029 0.029 10.587 10.588 rebuild_ks_matrix 6 7.2 0.000 0.000 10.287 10.288 build_dftb_ks_matrix 6 8.2 0.001 0.001 10.287 10.288 make_images_data 190 9.2 0.006 0.006 9.759 10.085 build_dftb_coulomb 6 9.2 0.745 0.746 10.007 10.008 hybrid_alltoall_any 201 10.0 6.483 6.587 9.380 9.708 ls_scf_init_scf 1 4.0 0.000 0.000 9.506 9.507 dbcsr_matrix_vector_mult 310 9.0 0.068 0.068 9.474 9.500 dbcsr_matrix_vector_mult_local 310 10.0 9.013 9.038 9.016 9.042 tb_ewald_overlap 6 10.2 8.829 9.017 8.829 9.017 calculate_norms 380 9.2 7.510 7.705 7.510 7.705 ls_scf_init_matrix_S 1 5.0 0.000 0.000 7.549 7.554 dbcsr_finalize 277 7.6 0.104 0.106 7.252 7.295 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 6.952 6.953 matrix_sqrt_Newton_Schulz 1 6.0 0.000 0.000 6.873 6.873 
dbcsr_merge_all 247 8.6 1.324 1.368 6.654 6.681 build_qs_neighbor_lists 1 4.0 0.000 0.000 6.379 6.443 build_neighbor_lists_sab_tbe 1 5.0 6.196 6.254 6.196 6.254 dbcsr_data_new 3509 9.3 4.402 4.858 4.402 4.858 dbcsr_copy 443 8.0 0.936 0.943 4.654 4.676 dbcsr_special_finalize 285 9.2 0.005 0.005 4.652 4.661 setup_rec_index_2d 190 8.2 4.600 4.644 4.600 4.644 dbcsr_sort_indices 643 10.1 4.358 4.359 4.358 4.359 dbcsr_add_d 130 6.0 0.001 0.001 4.089 4.122 dbcsr_add_anytype 130 7.0 1.776 1.779 4.088 4.122 dbcsr_mm_accdrv_process 8119 10.0 0.408 0.464 4.040 4.075 dbcsr_mm_multrec_init 95 8.2 0.000 0.000 3.593 3.900 dbcsr_mm_csr_init 95 9.2 0.005 0.005 3.593 3.900 dbcsr_dot 66 6.3 3.698 3.701 3.859 3.875 dbcsr_mm_sched_init 95 10.2 0.000 0.000 3.565 3.870 dbcsr_mm_accdrv_init 95 11.2 0.310 0.455 3.564 3.870 dbcsr_copy_into_existing 5 8.0 3.707 3.722 3.707 3.722 dbcsr_mm_accdrv_process_sort 8119 11.0 3.578 3.611 3.578 3.611 tree_to_linear_d 11 10.5 3.438 3.441 3.438 3.441 mp_waitall_1 2666 10.6 3.158 3.381 3.158 3.381 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=155.751, yerr=0.0 Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=77.546, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=42.738, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=10.115, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=9.013, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="tb_ewald_overlap", label="tb_ewald_overlap", y=8.829, yerr=0.0 PlotPoint: 
plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=7.51, yerr=0.0 Running dbcsr.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/dbcsr_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.004 0.005 47.647 47.647 lib_test 1 2.0 0.000 0.000 47.641 47.642 dbcsr_run_tests 3 3.0 0.000 0.000 47.640 47.641 test_multiplies_multiproc 3 4.0 0.001 0.001 36.968 36.969 dbcsr_multiply_generic 9 5.0 0.002 0.002 28.615 28.630 multiply_cannon 9 6.0 0.312 0.427 18.693 19.337 multiply_cannon_loop 9 7.0 0.003 0.003 17.194 17.594 multiply_cannon_multrec 18 8.0 9.188 9.435 15.902 16.295 dbcsr_make_random_matrix 9 4.0 7.348 7.352 10.530 10.531 dbcsr_finalize 27 5.7 0.001 0.001 7.217 7.230 dbcsr_merge_all 18 6.5 3.564 3.580 7.111 7.123 dbcsr_mm_accdrv_process 8199 9.0 1.177 1.237 6.483 6.636 dbcsr_redistribute 9 5.0 3.485 3.488 5.828 5.839 make_m2s 18 6.0 0.001 0.001 4.982 4.984 make_images 18 7.0 0.345 0.349 4.946 4.948 dbcsr_mm_accdrv_process_sort 8199 10.0 4.471 4.537 4.471 4.537 make_images_data 18 8.0 0.001 0.001 2.942 2.947 hybrid_alltoall_any 18 9.0 2.441 2.445 2.902 2.907 mp_alltoall_d11v 27 6.0 2.077 2.079 2.077 2.079 tree_to_linear_d 9 7.0 1.823 1.831 1.823 1.831 dbcsr_data_copy_aa2 18 7.5 1.590 1.592 1.590 1.592 dbcsr_data_release 507 7.7 1.380 1.386 1.380 1.386 mp_sum_l 61 4.9 0.652 1.281 0.652 1.281 dbcsr_multiply_generic_mpsum_f 9 6.0 0.000 0.000 0.651 1.280 dbcsr_data_new 354 7.4 0.957 1.080 0.957 1.080 dbcsr_checksum 6 5.0 0.985 0.996 1.001 1.001 jit_kernel_multiply 5 10.0 0.835 0.982 0.835 0.982 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=47.647, yerr=0.0 Plot: 
name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=19.590999999999998, yerr=0.0 PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.188, yerr=0.0 PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.348, yerr=0.0 PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.471, yerr=0.0 PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.564, yerr=0.0 PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.485, yerr=0.0 Running MQAE_single_node.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.046 0.046 205.281 205.281 qs_mol_dyn_low 1 2.0 0.004 0.004 203.770 203.802 qs_forces 6 3.8 0.001 0.001 131.494 131.494 qs_energies 6 4.8 0.000 0.000 124.401 124.401 scf_env_do_scf 6 5.8 0.000 0.000 117.074 117.074 scf_env_do_scf_inner_loop 113 6.2 0.005 0.007 108.636 108.636 velocity_verlet 5 3.0 0.003 0.003 95.553 95.600 rebuild_ks_matrix 119 8.1 0.000 0.000 91.736 91.737 qs_ks_build_kohn_sham_matrix 119 9.1 0.018 0.019 91.736 91.736 qs_ks_update_qs_env 119 7.3 0.001 0.001 86.739 86.740 fft_wrap_pw1pw2 2059 12.4 0.040 0.042 68.691 68.713 fft_wrap_pw1pw2_150 1321 13.9 0.008 0.008 65.905 66.018 qs_vxc_create 119 10.1 0.003 0.004 54.946 54.946 xc_vxc_pw_create 119 11.1 1.471 1.479 54.943 54.943 qmmm_el_coupling 6 3.8 0.000 0.000 38.814 38.825 qmmm_elec_with_gaussian 6 
4.8 0.021 0.021 38.808 38.820 xc_pw_derive 714 13.1 0.009 0.009 38.558 38.644 qmmm_elec_with_gaussian_low 6 5.8 0.000 0.000 37.213 37.539 pw_gpu_c1dr3d_3d_ps 1095 14.8 10.469 10.515 37.055 37.125 qmmm_elec_gaussian_low_G 6 6.8 32.450 32.626 32.450 32.626 pw_gpu_r3dc1d_3d_ps 964 14.0 9.412 9.448 31.584 31.678 qmmm_forces 6 3.8 0.001 0.001 30.779 30.779 qmmm_forces_with_gaussian 6 4.8 0.022 0.022 30.015 30.050 qmmm_force_with_gaussian_low 6 5.8 0.000 0.000 28.712 28.744 xc_rho_set_and_dset_create 119 12.1 2.360 2.375 27.444 27.457 xc_pw_divergence 119 12.1 0.005 0.005 25.634 25.649 qs_rho_update_rho_low 119 7.3 0.001 0.001 23.996 24.221 calculate_rho_elec 119 8.3 1.056 1.058 23.995 24.220 qmmm_forces_gaussian_low_G 6 6.8 23.879 23.884 23.879 23.884 sum_up_and_integrate 119 10.1 0.002 0.002 20.636 20.668 integrate_v_rspace 119 11.1 0.020 0.021 20.459 20.490 mp_alltoall_z22v 2059 16.4 17.259 17.281 17.259 17.281 density_rs2pw 119 9.3 0.007 0.007 16.913 17.202 x_to_yz 1095 15.8 2.271 2.278 11.659 11.678 dbcsr_multiply_generic 2598 12.3 0.097 0.099 10.945 11.145 grid_integrate_task_list 119 12.1 10.310 10.340 10.310 10.340 potential_pw2rs 119 12.1 0.033 0.033 10.127 10.129 yz_to_x 964 15.0 1.764 1.765 9.634 9.641 multiply_cannon 2598 13.3 0.217 0.221 9.298 9.548 qs_ks_ddapc 119 10.1 0.002 0.002 9.090 9.097 multiply_cannon_loop 2598 14.3 0.244 0.248 8.825 9.073 pw_gpu_sf 1095 15.8 8.606 8.608 8.606 8.608 init_scf_loop 6 6.8 0.000 0.000 8.434 8.435 pw_gpu_fg 964 15.0 7.591 7.767 7.591 7.767 qs_scf_new_mos 113 7.2 0.001 0.001 6.844 6.845 qs_scf_loop_do_ot 113 8.2 0.001 0.001 6.843 6.844 ot_scf_mini 113 9.2 0.002 0.002 6.564 6.566 multiply_cannon_multrec 5196 15.3 3.160 3.210 6.497 6.540 pw_gpu_ffc 1095 15.8 6.304 6.309 6.304 6.309 grid_collocate_task_list 119 9.3 6.003 6.068 6.003 6.068 init_scf_run 6 5.8 0.000 0.000 5.233 5.234 scf_env_initial_rho_setup 6 6.8 0.000 0.000 5.233 5.233 xc_functional_eval 238 13.1 0.003 0.003 5.055 5.069 qs_ks_update_qs_env_forces 6 4.8 0.000 
0.000 5.027 5.027 pw_gpu_cff 964 15.0 4.881 4.933 4.881 4.933 qmmm_elec_gaussian_low_R 6 6.8 0.000 0.000 4.763 4.913 qmmm_elec_with_gaussian_LG 6 7.8 4.763 4.913 4.763 4.913 qmmm_forces_gaussian_low_R 6 6.8 0.000 0.000 4.833 4.870 qmmm_forces_with_gaussian_LG 6 7.8 4.832 4.870 4.832 4.870 pw_poisson_solve 125 9.9 0.003 0.003 4.678 4.689 ot_mini 113 10.2 0.001 0.001 4.597 4.597 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=205.281, yerr=0.0 Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=110.914, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=32.45, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=23.879, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=17.259, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.469, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=10.31, yerr=0.0 Summary: Performance test took 40 minutes. 
Status: OK ---> Removed intermediate container 2e663fb821a1 ---> 4c706b73c4e2 Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/' ---> Running in f9cad16e5bf6 ---> Removed intermediate container f9cad16e5bf6 ---> 130aa9c67e73 Step 46/46 : ENTRYPOINT [] ---> Running in 51349079d907 ---> Removed intermediate container 51349079d907 ---> 091645ac52f2 [Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed Successfully built 091645ac52f2 Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master Pushing new image... done. #################### Running Image cp2k-perf-cuda-volta #################### Uploading artifacts... done EndDate: 2026-06-23 23:16:23+00:00