StartDate: 2026-07-03 06:12:30+00:00 CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm GpuId: 1x Tesla V100-SXM2-16GB CommitSHA: 1f9fd2c1651405c53a36d4bba30698629e83e363 CommitTime: 2026-07-02 20:29:31 +0200 CommitAuthor: Frederick Stein CommitSubject: Migrate development-related wiki pages (#5503) #################### Building Image cp2k-perf-cuda-volta #################### Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100 Build-Path: / Build-Args: GIT_COMMIT_SHA=1f9fd2c1651405c53a36d4bba30698629e83e363 SPACK_CACHE=gs://cp2k-spack-cache Build-Cache: Yes Populating docker build cache... done. DEPRECATED: The legacy builder is deprecated and will be removed in a future release. BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable. Sending build context to Docker daemon 421.1MB Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04 12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda 32f112e3802c: Pulling fs layer 644e9b203583: Pulling fs layer 02559cd4bc8d: Pulling fs layer 2cd52cbb1ebe: Pulling fs layer 6e8af4fd0a07: Pulling fs layer 15a17189b2df: Pulling fs layer 02cb0e091e33: Pulling fs layer 9c3d619183d2: Pulling fs layer 7f7602a82106: Pulling fs layer 5a2aba542b08: Pulling fs layer 6cb9b761b877: Pulling fs layer 02cb0e091e33: Waiting 9c3d619183d2: Waiting 7f7602a82106: Waiting 5a2aba542b08: Waiting 2cd52cbb1ebe: Waiting 6e8af4fd0a07: Waiting 15a17189b2df: Waiting 6cb9b761b877: Waiting 644e9b203583: Verifying Checksum 644e9b203583: Download complete 32f112e3802c: Verifying Checksum 32f112e3802c: Download complete 2cd52cbb1ebe: Verifying Checksum 2cd52cbb1ebe: Download complete 6e8af4fd0a07: Verifying Checksum 6e8af4fd0a07: Download complete 02cb0e091e33: Verifying Checksum 02cb0e091e33: Download complete 9c3d619183d2: Verifying Checksum 9c3d619183d2: Download complete 7f7602a82106: Verifying Checksum 7f7602a82106: Download complete 02559cd4bc8d: Verifying Checksum 02559cd4bc8d: 
Download complete 6cb9b761b877: Verifying Checksum 6cb9b761b877: Download complete 32f112e3802c: Pull complete 644e9b203583: Pull complete 02559cd4bc8d: Pull complete 2cd52cbb1ebe: Pull complete 6e8af4fd0a07: Pull complete 15a17189b2df: Verifying Checksum 15a17189b2df: Download complete 5a2aba542b08: Verifying Checksum 5a2aba542b08: Download complete 15a17189b2df: Pull complete 02cb0e091e33: Pull complete 9c3d619183d2: Pull complete 7f7602a82106: Pull complete 5a2aba542b08: Pull complete 6cb9b761b877: Pull complete Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04 ---> eecafe98c3e1 Step 2/46 : ENV CUDA_PATH /usr/local/cuda ---> Using cache ---> 780681fb1fee Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64 ---> Using cache ---> ba98a15dc225 Step 4/46 : ENV CUDA_CACHE_DISABLE 1 ---> Using cache ---> 3932740340f7 Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/* ---> Using cache ---> a06eb14abc29 Step 6/46 : WORKDIR /opt/cp2k-toolchain ---> Using cache ---> 082681bac850 Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./ ---> Using cache ---> ae920e0abda3 Step 8/46 : RUN ./install_requirements.sh ubuntu ---> Using cache ---> 94839a704e2d Step 9/46 : RUN mkdir scripts ---> Using cache ---> 433a8b0a0499 Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/ ---> Using cache ---> 1812799ab160 Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh . ---> 17d24cdf3aa0 Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --with-libgint=install --with-sirius=install --gpu-ver=V100 --dry-run ---> Running in e311d8eef098 No MPI installation detected. 
(Ignore this message if a fresh MPI installation is requested.)
Toolchain script received the following options: --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --with-libgint=install --with-sirius=install --gpu-ver=V100 --dry-run
Parsing options and resolving conflicts...
WARNING: (./install_cp2k_toolchain.sh, line 1168) Installing dependencies and CP2K requires CMake but CMake is not enabled, so a new copy of CMake will be installed first.

Toolchain configuration summary
-------------------------------
System specifications:
  -j           = 12
  --target-cpu = native
  --gpu-ver    = V100
  --mpi-mode   = mpich
  --math-mode  = openblas
Enabled features:
  --enable-tsan          = no
  --enable-cuda          = yes
  --enable-gauxc-cutlass = no
  --enable-hip           = no
  --enable-opencl        = no
  --enable-cray          = no
Packages to be installed:
  - cmake
  - mpich
  - openblas
  - fftw
  - eigen
  - libint
  - libxc
  - libxsmm
  - libxs
  - cosma
  - scalapack
  - elpa
  - dbcsr
  - spfft
  - spla
  - gsl
  - spglib
  - hdf5
  - libvdwxc
  - sirius
  - libvori
  - tblite
  - pugixml
  - fmt
  - libgint
Packages to be detected from system:
  - gcc
Packages not used:
  - intel
  - amd
  - ninja
  - openmpi
  - intelmpi
  - mkl
  - acml
  - gauxc
  - libxstream
  - cusolvermp
  - plumed
  - libtorch
  - deepmd
  - ace
  - dftd4
  - libsmeagol
  - trexio
  - libfci
  - greenx
  - gmp
  - mcl
With --dry-run option, this script concludes with above report.
The setup, toolchain env and conf files are written to /opt/cp2k-toolchain/install.
 ---> Removed intermediate container e311d8eef098
 ---> 5cbdab73b999
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> 0c0bf617cd3f
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Running in e66dfebe2de1
==================== Finding GCC from system paths ====================
path to gcc is /usr/bin/gcc
path to g++ is /usr/bin/g++
path to gfortran is /usr/bin/gfortran
GCC compiler version 13.3.0 found
Step gcc took 0.00 seconds.
Step intel took 0.00 seconds.
Step amd took 0.00 seconds.
==================== Getting proc arch info using OpenBLAS tools ==================== wget --quiet https://www.cp2k.org/static/downloads/OpenBLAS-0.3.33.tar.gz -O OpenBLAS-0.3.33.tar.gz OpenBLAS-0.3.33.tar.gz: OK Checksum of OpenBLAS-0.3.33.tar.gz Ok OpenBLAS detected LIBCORE = skylakex OpenBLAS detected ARCH = x86_64 ==================== Installing CMake ==================== wget --quiet https://www.cp2k.org/static/downloads/cmake-4.3.0-linux-x86_64.tar.gz -O cmake-4.3.0-linux-x86_64.tar.gz cmake-4.3.0-linux-x86_64.tar.gz: OK Checksum of cmake-4.3.0-linux-x86_64.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/cmake-4.3.0 Step cmake took 6.00 seconds. Step ninja took 0.00 seconds. ---> Removed intermediate container e66dfebe2de1 ---> 13bf1d71ff74 Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/ ---> d93289369606 Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build ---> Running in ca693b36c1bc ==================== Installing MPICH ==================== wget --quiet https://www.cp2k.org/static/downloads/mpich-5.0.1.tar.gz -O mpich-5.0.1.tar.gz mpich-5.0.1.tar.gz: OK Checksum of mpich-5.0.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/mpich-5.0.1 for MPICH device ch4 Found directory /opt/cp2k-toolchain/install/mpich-5.0.1/bin Found directory /opt/cp2k-toolchain/install/mpich-5.0.1/lib Found directory /opt/cp2k-toolchain/install/mpich-5.0.1/include mpiexec is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpiexec mpicc is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpicc mpicxx is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpicxx mpifort is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpifort Step mpich took 590.00 seconds. 
---> Removed intermediate container ca693b36c1bc ---> c545d5e1ba3a Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/ ---> a8fb5e70c13e Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build ---> Running in b5c623e8fa2e ==================== Installing OpenBLAS ==================== wget --quiet https://www.cp2k.org/static/downloads/OpenBLAS-0.3.33.tar.gz -O OpenBLAS-0.3.33.tar.gz OpenBLAS-0.3.33.tar.gz: OK Checksum of OpenBLAS-0.3.33.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/openblas-0.3.33 Installing OpenBLAS library for target SKYLAKEX Step openblas took 308.00 seconds. Step gmp took 0.00 seconds. ---> Removed intermediate container b5c623e8fa2e ---> aa9d9bfa478a Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/ ---> 650dff6a4858 Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build ---> Running in f05b1bc0ec85 ==================== Installing FFTW ==================== wget --quiet https://www.cp2k.org/static/downloads/fftw-3.3.11.tar.gz -O fftw-3.3.11.tar.gz fftw-3.3.11.tar.gz: OK Checksum of fftw-3.3.11.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/fftw-3.3.11 Step fftw took 170.00 seconds. ==================== Installing Eigen ==================== wget --quiet https://www.cp2k.org/static/downloads/eigen-5.0.1.tar.gz -O eigen-5.0.1.tar.gz eigen-5.0.1.tar.gz: OK Checksum of eigen-5.0.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/eigen-5.0.1 Step eigen took 4.00 seconds. ==================== Installing LIBINT ==================== wget --quiet https://www.cp2k.org/static/downloads/libint-v2.13.1-cp2k-lmax-5.tar.xz -O libint-v2.13.1-cp2k-lmax-5.tar.xz libint-v2.13.1-cp2k-lmax-5.tar.xz: OK Checksum of libint-v2.13.1-cp2k-lmax-5.tar.xz Ok Installing from scratch into /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5 Step libint took 545.00 seconds. 
==================== Installing LIBXC ==================== wget --quiet https://www.cp2k.org/static/downloads/libxc-7.0.0.tar.bz2 -O libxc-7.0.0.tar.bz2 libxc-7.0.0.tar.bz2: OK Checksum of libxc-7.0.0.tar.bz2 Ok Installing from scratch into /opt/cp2k-toolchain/install/libxc-7.0.0 Step libxc took 431.00 seconds. Step greenx took 0.00 seconds. ---> Removed intermediate container f05b1bc0ec85 ---> 216e2e8fa88e Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/ ---> 14806338a3a0 Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build ---> Running in 4001432e0947 ==================== Installing Libxsmm ==================== wget --quiet https://www.cp2k.org/static/downloads/libxsmm-2.0.0.tar.gz -O libxsmm-2.0.0.tar.gz libxsmm-2.0.0.tar.gz: OK Checksum of libxsmm-2.0.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libxsmm-2.0.0 Step libxsmm took 22.00 seconds. ==================== Installing LIBXS ==================== wget --quiet https://www.cp2k.org/static/downloads/libxs-1.0.0.tar.gz -O libxs-1.0.0.tar.gz libxs-1.0.0.tar.gz: OK Checksum of libxs-1.0.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libxs-1.0.0 Step libxs took 9.00 seconds. Step libxstream took 0.00 seconds. ==================== Installing libGint ==================== wget --quiet https://www.cp2k.org/static/downloads/libGint-v1.tar.gz -O libGint-v1.tar.gz libGint-v1.tar.gz: OK Checksum of libGint-v1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libGint-v1 Step libGint took 122.00 seconds. ==================== Installing ScaLAPACK ==================== wget --quiet https://www.cp2k.org/static/downloads/scalapack-2.2.3.tar.gz -O scalapack-2.2.3.tar.gz scalapack-2.2.3.tar.gz: OK Checksum of scalapack-2.2.3.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/scalapack-2.2.3 Step scalapack took 38.00 seconds. Step cusolvermp took 0.00 seconds. 
==================== Installing COSMA ==================== wget --quiet https://www.cp2k.org/static/downloads/COSMA-v2.8.4.tar.gz -O COSMA-v2.8.4.tar.gz COSMA-v2.8.4.tar.gz: OK Checksum of COSMA-v2.8.4.tar.gz Ok wget --quiet https://www.cp2k.org/static/downloads/COSTA-v2.3.2.tar.gz -O COSTA-v2.3.2.tar.gz COSTA-v2.3.2.tar.gz: OK Checksum of COSTA-v2.3.2.tar.gz Ok wget --quiet https://www.cp2k.org/static/downloads/Tiled-MM-v2.3.2.tar.gz -O Tiled-MM-v2.3.2.tar.gz Tiled-MM-v2.3.2.tar.gz: OK Checksum of Tiled-MM-v2.3.2.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/COSMA-2.8.4 Step cosma took 67.00 seconds. ---> Removed intermediate container 4001432e0947 ---> 9e48ec862dc3 Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/ ---> 030ba9471980 Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build ---> Running in b35938c48b52 ==================== Installing ELPA ==================== wget --quiet https://www.cp2k.org/static/downloads/elpa-2026.02.001.tar.gz -O elpa-2026.02.001.tar.gz elpa-2026.02.001.tar.gz: OK Checksum of elpa-2026.02.001.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001 Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001/cpu Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia Step elpa took 317.00 seconds. ---> Removed intermediate container b35938c48b52 ---> 66a3a01140d6 Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/ ---> 6e342403e3cf Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build ---> Running in 469fb1e94174 ==================== Installing GSL ==================== wget --quiet https://www.cp2k.org/static/downloads/gsl-2.8.tar.gz -O gsl-2.8.tar.gz gsl-2.8.tar.gz: OK Checksum of gsl-2.8.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/gsl-2.8 Step gsl took 75.00 seconds. Step plumed took 0.00 seconds. Step libtorch took 0.00 seconds. 
Step gauxc took 0.00 seconds. Step deepmd took 0.00 seconds. Step ace took 0.00 seconds. ---> Removed intermediate container 469fb1e94174 ---> 137ac97eacaa Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/ ---> 2d17b335d9d4 Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build ---> Running in 5ea8eedc3163 ==================== Installing HDF5 ==================== wget --quiet https://www.cp2k.org/static/downloads/hdf5-2.1.1.tar.gz -O hdf5-2.1.1.tar.gz hdf5-2.1.1.tar.gz: OK Checksum of hdf5-2.1.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/hdf5-2.1.1 Step hdf5 took 131.00 seconds. ==================== Installing libvdwxc ==================== wget --quiet https://www.cp2k.org/static/downloads/libvdwxc-0.5.0.tar.gz -O libvdwxc-0.5.0.tar.gz libvdwxc-0.5.0.tar.gz: OK Checksum of libvdwxc-0.5.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libvdwxc-0.5.0 Step libvdwxc took 15.00 seconds. ==================== Installing Spglib ==================== wget --quiet https://www.cp2k.org/static/downloads/spglib-2.7.0.tar.gz -O spglib-2.7.0.tar.gz spglib-2.7.0.tar.gz: OK Checksum of spglib-2.7.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/spglib-2.7.0 Step spglib took 4.00 seconds. ==================== Installing libvori ==================== wget --quiet https://www.cp2k.org/static/downloads/libvori-220621.tar.gz -O libvori-220621.tar.gz libvori-220621.tar.gz: OK Checksum of libvori-220621.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libvori-220621 Step libvori took 13.00 seconds. Step libsmeagol took 0.00 seconds. Step libfci took 0.00 seconds. ==================== Installing fmt ==================== wget --quiet https://www.cp2k.org/static/downloads/fmt-12.1.0.zip -O fmt-12.1.0.zip fmt-12.1.0.zip: OK Checksum of fmt-12.1.0.zip Ok Installing from scratch into /opt/cp2k-toolchain/install/fmt-12.1.0 Step fmt took 9.00 seconds. 
---> Removed intermediate container 5ea8eedc3163 ---> c90e0d2c1519 Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/ ---> 3bfe66a85c97 Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build ---> Running in 484f5922435f Step dftd4 took 0.00 seconds. ==================== Installing tblite ==================== wget --quiet https://www.cp2k.org/static/downloads/tblite-0.6.0.tar.xz -O tblite-0.6.0.tar.xz tblite-0.6.0.tar.xz: OK Checksum of tblite-0.6.0.tar.xz Ok Step tblite took 42.00 seconds. ==================== Installing pugixml ==================== wget --quiet https://www.cp2k.org/static/downloads/pugixml-1.15.tar.gz -O pugixml-1.15.tar.gz pugixml-1.15.tar.gz: OK Checksum of pugixml-1.15.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/pugixml-1.15 Step pugixml took 9.00 seconds. ==================== Installing SpFFT ==================== wget --quiet https://www.cp2k.org/static/downloads/SpFFT-1.1.1.tar.gz -O SpFFT-1.1.1.tar.gz SpFFT-1.1.1.tar.gz: OK Checksum of SpFFT-1.1.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/SpFFT-1.1.1 Step spfft took 22.00 seconds. ==================== Installing SpLA ==================== wget --quiet https://www.cp2k.org/static/downloads/SpLA-1.6.1.tar.gz -O SpLA-1.6.1.tar.gz SpLA-1.6.1.tar.gz: OK Checksum of SpLA-1.6.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/SpLA-1.6.1 Step spla took 24.00 seconds. ==================== Installing SIRIUS ==================== wget --quiet https://www.cp2k.org/static/downloads/SIRIUS-7.11.1.tar.gz -O SIRIUS-7.11.1.tar.gz SIRIUS-7.11.1.tar.gz: OK Checksum of SIRIUS-7.11.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.11.1 Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.11.1/cuda Step sirius took 451.00 seconds. Step trexio took 0.00 seconds. Step MCL took 0.00 seconds. 
---> Removed intermediate container 484f5922435f ---> af8c371731a0 Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/ ---> 14ea3d12ea28 Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build ---> Running in 004add0bdc72 ==================== Installing DBCSR ==================== wget --quiet https://www.cp2k.org/static/downloads/dbcsr-2.10.0.tar.gz -O dbcsr-2.10.0.tar.gz dbcsr-2.10.0.tar.gz: OK Checksum of dbcsr-2.10.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/dbcsr-2.10.0 Installing from scratch into /opt/cp2k-toolchain/install/dbcsr-2.10.0-cuda Step DBCSR took 129.00 seconds. ---> Removed intermediate container 004add0bdc72 ---> 813cb79687f6 Step 33/46 : WORKDIR /opt/cp2k ---> Running in 87896d8dafcb ---> Removed intermediate container 87896d8dafcb ---> 4bb884db19f7 Step 34/46 : COPY ./src ./src ---> 6d77e3f94f0b Step 35/46 : COPY ./data ./data ---> fde372580c02 Step 36/46 : COPY ./tools/build_utils ./tools/build_utils ---> db67cbc7f82d Step 37/46 : COPY ./cmake ./cmake ---> 7ecd655b9e76 Step 38/46 : COPY ./CMakeLists.txt . 
---> 1310f98a8c5a Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh ./tools/docker/scripts/cmake_cp2k.sh ./ ---> 80e076b79008 Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp ---> Running in c291dd5f1ea9 ==================== Building CP2K ==================== -- The Fortran compiler identification is GNU 13.3.0 -- The C compiler identification is GNU 13.3.0 -- The CXX compiler identification is GNU 13.3.0 -- Detecting Fortran compiler ABI info -- Detecting Fortran compiler ABI info - done -- Check for working Fortran compiler: /usr/bin/gfortran - skipped -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/gcc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/g++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1") -- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter -- Found MPI_C: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so (found version "5.0") -- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so (found version "5.0") -- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpifort.so (found version "5.0") -- Found MPI: TRUE (found version "5.0") found components: C CXX Fortran -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success -- Found Threads: TRUE -- Found MPI: TRUE (found version "5.0") found components: CXX C Fortran -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran -- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS) -- Checking 
for module 'openblas' -- Found openblas, version 0.3.33 -- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/include -- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Checking for module 'scalapack' -- Package 'mpi', required by 'scalapack', not found Package 'lapack', required by 'scalapack', not found Package 'blas', required by 'scalapack', not found -- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a -- Using LIBXS + LIBXSMM for Small Matrix Multiplication -- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES ------------------------------------------------------------ - DBCSR - ------------------------------------------------------------ -- Found MPI: TRUE (found version "5.0") -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0 -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86") ----------------------------------------------------------- - CUDA - ----------------------------------------------------------- -- GPU architecture number: 70 -- GPU profiling enabled: OFF -- CUDA compiler and libraries found ------------------------------------------------------------ - OPENMP - ------------------------------------------------------------ -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found 
version "4.5") -- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX ------------------------------------------------------------ - Other dependencies - ------------------------------------------------------------ -- Checking for one of the modules 'elpa_openmp' -- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a -- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran -- Found MPI: TRUE (found version "5.0") found components: CXX -- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Checking for one of the modules 'fftw3' -- Checking for one of the modules 'fftw3f' -- Checking for one of the modules 'fftw3l' -- Checking for one of the modules 'fftw3q' -- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.11/include -- Boost detected. 
satisfied by headers bundled with Libint2 distribution -- Found LibGint: /opt/cp2k-toolchain/install/libGint-v1/lib/libcp2kGint.a -- Looking for Fortran sgemm -- Looking for Fortran sgemm - found -- mctc-lib: Find installed package -- multicharge: Find installed package -- DFTD4: found version 4.2.0, using v4.2+ API -- toml-f: Find installed package -- s-dftd3: Find installed package -- DFTD4: found version 4.2.0, using v4.2+ API -- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8") -- Checking for one of the modules 'libxc>=3.0.0' -- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0") -- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a -- Found HDF5: hdf5-shared (found version "2.1.1") found components: C -- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.11/include -- Looking for Fortran sgemm -- Looking for Fortran sgemm - not found -- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP_CUDA: -fopenmp (found version "4.5") -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- Checking for one of the modules 's-dftd3' -- Checking for one of the modules 'mctc-lib' -- Found DFTD3: /opt/cp2k-toolchain/install/tblite-0.6.0/lib/libs-dftd3.a -- Checking for one of the modules 'dftd4' -- Checking for one of the modules 'multicharge' -- Found DFTD4: /opt/cp2k-toolchain/install/tblite-0.6.0/lib/libdftd4.a -- Looking for Fortran cheev -- Looking for Fortran cheev - found -- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so;-lm;-ldl -- Checking for one of the modules 'scalapack' -- Checking for one of the modules 'elpa;elpa_openmp;elpa-openmp-2019.05.001;elpa_openmp-2019.11.001;elpa_openmp-2020.05.001;elpa-2019.05.001;elpa-2019.11.001;elpa-2020.05.001' -- Found 
Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so -- Checking for module 'libvdwxc>=0.5.0' -- Found libvdwxc, version 0.5.0 -- Checking for module 'fftw3' -- Found fftw3, version 3.3.11 -- Found LibVDWXC: vdwxc;fftw3 (Required is at least version "0.5.0") -- Setting build type to 'Release' as none was specified. -- Performing Test f2008-norm2 -- Performing Test f2008-norm2 - Success -- Performing Test f2008-block_construct -- Performing Test f2008-block_construct - Success -- Performing Test f2008-contiguous -- Performing Test f2008-contiguous - Success -- Performing Test f95-reshape-order-allocatable -- Performing Test f95-reshape-order-allocatable - Success -- FYPP preprocessor found. -- Adding libxs_jit.F from dependency libxs for compilation -------------------------------------------------------------------- - - - Summary of enabled dependencies - - - -------------------------------------------------------------------- - BLAS - Vendor: OpenBLAS - Include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include - Libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so - LAPACK - Include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include - Libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so - MPI - Include directories: /opt/cp2k-toolchain/install/mpich-5.0.1/include - Libraries: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so - MPI_F08: Enabled - ScaLAPACK - Vendor: auto - Include directories: - Libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a - Hardware acceleration - Backend: CUDA - GPU architectures: 70 - GPU profiling enabled: OFF - GPU-accelerated modules - ELPA: ON - GRID: ON - DBM: ON - PW: ON - LibXC - Include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/ - Libraries: 
/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a - HDF5 - Include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include - Libraries: hdf5-shared - FFTW3 - Include directories: /opt/cp2k-toolchain/install/fftw-3.3.11/include - Libraries: /opt/cp2k-toolchain/install/fftw-3.3.11/lib/libfftw3.a - LIBXS - Include directories: - Libraries: - SpLA - Include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla - Libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran - SpLA GEMM offloading - DFTD4 - Enabled via TBLITE - Include directories: /opt/cp2k-toolchain/install/tblite-0.6.0/include;/opt/cp2k-toolchain/install/tblite-0.6.0/include/dftd4/GNU-13.3.0 - Libraries: - TBLITE - Include directories: /opt/cp2k-toolchain/install/tblite-0.6.0/include;/opt/cp2k-toolchain/install/tblite-0.6.0/include/tblite/GNU-13.3.0 - Libraries: - SIRIUS - Include directories: - Libraries: - COSMA - Include directories: /opt/cp2k-toolchain/install/COSMA-2.8.4-cuda/include - Libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack> - Libint2 - Include directories: - Libraries: - LibGint - include directories: /opt/cp2k-toolchain/install/libGint-v1/include - libraries: /opt/cp2k-toolchain/install/libGint-v1/lib/libcp2kGint.a - ELPA - Include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001 - Libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a -------------------------------------------------------------------- - - - Dependencies not included in this build - - - -------------------------------------------------------------------- - DeePMD - PEXSI - ACE (libpace) - Spglib - 
LibSMEAGOL - MiMiC - DLA-Future - PLUMED - LibFCI - GauXC - Libvori - LibTorch - TREXIO - OpenPMD - GreenX After building and installing CP2K, run the regtests with: /opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp -- Configuring done (13.6s) -- Generating done (0.6s) -- Build files have been written to: /opt/cp2k/build Compiling CP2K ... done ---> Removed intermediate container c291dd5f1ea9 ---> a31164a9239c Step 41/46 : COPY ./benchmarks ./benchmarks ---> 9a4b6d9acd3b Step 42/46 : COPY ./tools/regtesting ./tools/regtesting ---> bf9b9f348ce4 Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./ ---> f8197a2625af Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log ---> Running in 912a26155513 ============== CP2K Binary Flags ============= cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma libxs libxsmm dbcsr_acc libdftd4 dftd4_v4_2 s_dftd3 mctc-lib tblite sirius offload_cuda spla_gemm_offloading libvdwxc hdf5 libGint ========== Checking Benchmark Inputs ========= Found 83 input files and 0 errors. ========== Running Performance Test ========== Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]" Running H2O-64.inp with 3 threads and 2 ranks... done. 
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.030    0.031  103.425  103.425
 qs_mol_dyn_low                       1  2.0    0.005    0.005  102.972  102.975
 qs_forces                           11  3.9    0.002    0.002  102.918  102.918
 qs_energies                         11  4.9    0.001    0.002   91.362   91.364
 scf_env_do_scf                      11  5.9    0.001    0.001   75.545   75.546
 scf_env_do_scf_inner_loop          108  6.5    0.007    0.011   64.401   64.401
 velocity_verlet                     10  3.0    0.002    0.002   63.755   63.774
 rebuild_ks_matrix                  119  8.3    0.001    0.001   27.805   27.809
 qs_ks_build_kohn_sham_matrix       119  9.3    0.020    0.020   27.804   27.808
 dbcsr_multiply_generic            2286 12.5    0.156    0.157   26.089   26.108
 qs_ks_update_qs_env                119  7.6    0.001    0.001   25.795   25.800
 qs_rho_update_rho_low              119  7.7    0.001    0.001   22.365   22.387
 calculate_rho_elec                 119  8.7    0.914    0.926   22.364   22.386
 qs_scf_new_mos                     108  7.5    0.001    0.001   21.345   21.361
 qs_scf_loop_do_ot                  108  8.5    0.001    0.001   21.344   21.361
 ot_scf_mini                        108  9.5    0.003    0.003   19.322   19.324
 fft_wrap_pw1pw2                   1201 11.6    0.024    0.024   17.078   17.111
 fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   14.695   14.723
 sum_up_and_integrate               119 10.3    0.003    0.003   14.629   14.677
 integrate_v_rspace                 119 11.3    0.361    0.364   14.529   14.578
 multiply_cannon                   2286 13.5    0.349    0.352   13.091   13.135
 multiply_cannon_loop              2286 14.5    0.272    0.273   11.964   12.018
 make_m2s                          4572 13.5    0.048    0.048   11.287   11.293
 density_rs2pw                      119  9.7    0.008    0.008   11.169   11.283
 ot_mini                            108 10.5    0.001    0.001   11.234   11.237
 make_images                       4572 14.5    1.202    1.213   11.100   11.106
 init_scf_loop                       11  6.9    0.000    0.000   11.059   11.059
 grid_collocate_task_list           119  9.7   10.250   10.327   10.250   10.327
 pw_gpu_r3dc1d_3d_ps                606 13.1    2.414    2.426    8.765    8.767
 build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    8.249    8.330
 pw_gpu_c1dr3d_3d_ps                595 14.2    2.291    2.313    8.283    8.317
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.048    8.048
 prepare_preconditioner              11  7.9    0.000    0.000    7.671    7.673
 make_preconditioner                 11  8.9    0.000    0.000    7.671    7.673
 grid_integrate_task_list           119 12.3    7.534    7.585    7.534    7.585
 init_scf_run                        11  5.9    0.000    0.000    7.083    7.084
 scf_env_initial_rho_setup           11  6.9    0.001    0.001    7.083    7.083
 qs_ot_get_derivative               108 11.5    0.002    0.002    6.852    6.854
 make_full_inverse_cholesky          11  9.9    0.000    0.000    6.493    6.764
 hybrid_alltoall_any               4725 16.4    4.878    4.886    6.677    6.699
 potential_pw2rs                    119 12.3    0.038    0.039    6.634    6.635
 make_images_data                  4572 15.5    0.061    0.061    6.552    6.559
 multiply_cannon_multrec           4572 15.5    2.093    2.095    6.479    6.481
 mp_alltoall_z22v                  1201 15.6    4.391    4.459    4.391    4.459
 ot_diis_step                       108 11.5    0.006    0.006    4.357    4.357
 build_core_ppl_forces               11  5.9    4.184    4.240    4.184    4.240
 build_core_hamiltonian_matrix       11  6.9    0.001    0.002    4.097    4.154
 wfi_extrapolate                     11  7.9    0.001    0.002    4.142    4.142
 dbcsr_mm_accdrv_process           9594 16.2    0.803    0.971    3.994    4.006
 mp_waitall_1                     64495 16.9    3.743    3.812    3.743    3.812
 apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.772    3.776
 apply_single                       119 13.6    0.001    0.001    3.772    3.776
 dbcsr_complete_redistribute        329 12.2    1.268    1.321    3.438    3.713
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.557    3.558
 calculate_dm_sparse                119  9.5    0.001    0.001    3.486    3.507
 qs_ot_get_p                        119 10.4    0.001    0.001    3.436    3.440
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.153    3.154
 multiply_cannon_sync_h2d          4572 15.5    3.040    3.066    3.040    3.066
 transfer_rs2pw                     487 10.6    0.009    0.009    2.760    2.914
 cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.850    2.850
 pw_poisson_solve                   119 10.3    0.003    0.003    2.745    2.755
 yz_to_x                            606 14.1    0.470    0.477    2.713    2.740
 qs_create_task_list                 11  7.9    0.000    0.000    2.660    2.718
 generate_qs_task_list               11  8.9    1.219    1.237    2.660    2.718
 jit_kernel_multiply                 12 15.7    2.559    2.713    2.559    2.713
 copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.688    2.692
 x_to_yz                            595 15.2    0.505    0.511    2.653    2.681
 transfer_rs2pw_140                 130 11.5    1.643    1.664    2.305    2.473
 calculate_first_density_matrix       1  7.0    0.000    0.000    2.463    2.463
 cp_fm_cholesky_invert               11 10.9    2.416    2.416    2.416    2.416
 qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    2.380    2.381
 cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.302    2.304
 qs_ot_p2m_diag                      50 11.0    0.091    0.093    2.224    2.225
 pw_gpu_fg                          606 14.1    2.216    2.219    2.216    2.219
 build_core_ppl                      11  7.9    2.134    2.178    2.134    2.178
 dbcsr_special_finalize            6858 15.5    0.042    0.042    2.114    2.119
 transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.077    2.081
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=103.425, yerr=0.0

Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=72.18799999999999, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=10.25, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.534, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.878, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.391, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.184, yerr=0.0

Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.028 0.028 95.323 95.323 qs_mol_dyn_low 1 2.0 0.005 0.005 94.863 94.865 qs_forces 11 3.9 0.002 0.002 94.812 94.812 qs_energies 11 4.9 0.001 0.001 83.260 83.262 scf_env_do_scf 11 5.9 0.001 0.001 66.780 66.780 velocity_verlet 10 3.0 0.002 0.002 60.072 60.090 scf_env_do_scf_inner_loop 96 6.5 0.006 0.009 55.387 55.387 rebuild_ks_matrix 107 8.3 0.001 0.001 25.479 25.481 qs_ks_build_kohn_sham_matrix 107 9.3 0.018 0.018 25.478 25.481 dbcsr_multiply_generic 1966 12.4 0.137 0.138 23.746 23.778 qs_ks_update_qs_env 107 7.6 0.001 0.001 23.339 23.342 qs_scf_new_mos 96 7.5 0.001 0.001 19.027 19.035 qs_scf_loop_do_ot 96 8.5 0.001 0.001 19.027 19.034 qs_rho_update_rho_low 107 7.7 0.001 0.001 17.900 17.912 calculate_rho_elec 107 8.7 0.825 0.826 17.900 17.911 ot_scf_mini 96 9.5 0.003 0.003 17.238 17.240 fft_wrap_pw1pw2 1081 11.6 0.022 0.022 15.541 15.548 sum_up_and_integrate 107 10.3 0.002 0.002 13.614 13.632 integrate_v_rspace 107 11.3 0.334 0.336 13.525 13.545 fft_wrap_pw1pw2_140 439 12.2 0.003 0.003 13.372 13.408 multiply_cannon 1966 13.4 0.319 0.322 11.990 12.034 init_scf_loop 11 6.9 0.000 0.000 11.308 11.308 multiply_cannon_loop 1966 14.4 0.238 0.243 10.997 11.000 make_m2s 3932 13.4 0.042 0.042 10.232 10.268 density_rs2pw 107 9.7 0.007 0.007 10.111 10.221 make_images 3932 14.4 1.072 1.081 10.066 10.103 ot_mini 96 10.5 0.001 0.001 10.095 10.098 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 8.949 8.949 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 8.205 8.354 pw_gpu_r3dc1d_3d_ps 546 13.1 2.204 2.206 7.986 8.007 prepare_preconditioner 11 7.9 0.000 0.000 7.911 7.919 make_preconditioner 11 8.9 0.000 0.000 7.911 7.919 pw_gpu_c1dr3d_3d_ps 535 14.2 
2.076 2.090 7.527 7.555 grid_integrate_task_list 107 12.3 7.173 7.189 7.173 7.189 grid_collocate_task_list 107 9.7 6.939 7.022 6.939 7.022 make_full_inverse_cholesky 11 9.9 0.000 0.000 6.699 6.975 init_scf_run 11 5.9 0.000 0.000 6.851 6.851 scf_env_initial_rho_setup 11 6.9 0.001 0.001 6.850 6.850 qs_ot_get_derivative 96 11.5 0.002 0.002 6.186 6.189 hybrid_alltoall_any 4079 16.3 4.443 4.488 6.099 6.153 multiply_cannon_multrec 3932 15.4 1.806 1.851 6.128 6.133 potential_pw2rs 107 12.3 0.034 0.034 6.019 6.019 make_images_data 3932 15.4 0.052 0.052 5.962 5.992 qs_env_update_s_mstruct 11 6.9 0.000 0.000 4.461 4.560 build_core_ppl_forces 11 5.9 4.148 4.268 4.148 4.268 build_core_hamiltonian_matrix 11 6.9 0.001 0.002 4.069 4.115 mp_alltoall_z22v 1081 15.6 3.982 4.008 3.982 4.008 dbcsr_mm_accdrv_process 8450 16.1 0.932 0.942 3.963 4.000 dbcsr_complete_redistribute 317 12.2 1.324 1.355 3.635 3.913 ot_diis_step 96 11.5 0.005 0.005 3.889 3.889 wfi_extrapolate 11 7.9 0.001 0.001 3.877 3.877 qs_create_task_list 11 7.9 0.000 0.000 3.571 3.622 generate_qs_task_list 11 8.9 1.522 1.548 3.571 3.622 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 3.437 3.442 apply_single 107 13.6 0.001 0.001 3.437 3.442 mp_waitall_1 55487 16.8 3.362 3.394 3.362 3.394 calculate_dm_sparse 107 9.5 0.001 0.001 3.304 3.312 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.194 3.195 qs_ot_get_p 107 10.4 0.001 0.002 3.017 3.019 copy_dbcsr_to_fm 147 11.2 0.004 0.004 2.854 2.868 cp_dbcsr_sm_fm_multiply 37 9.5 0.001 0.001 2.802 2.803 multiply_cannon_sync_h2d 3932 15.4 2.706 2.715 2.706 2.715 transfer_rs2pw 439 10.6 0.007 0.008 2.508 2.659 calculate_first_density_matrix 1 7.0 0.000 0.000 2.519 2.520 jit_kernel_multiply 10 15.6 2.464 2.485 2.464 2.485 pw_poisson_solve 107 10.3 0.003 0.003 2.467 2.470 yz_to_x 546 14.1 0.426 0.429 2.459 2.468 cp_fm_cholesky_invert 11 10.9 2.448 2.448 2.448 2.448 x_to_yz 535 15.2 0.454 0.454 2.403 2.422 transfer_dbcsr_to_fm 11 10.9 0.001 0.001 2.252 2.267 cp_dbcsr_sm_fm_multiply_core 
37 10.5 0.000 0.000 2.260 2.261 transfer_rs2pw_140 118 11.5 1.486 1.501 2.098 2.255 build_core_ppl 11 7.9 2.104 2.140 2.104 2.140 copy_fm_to_dbcsr 170 11.1 0.002 0.002 1.790 2.071 qs_ot_get_derivative_taylor 53 13.0 0.002 0.002 2.070 2.071 pw_gpu_fg 546 14.1 2.016 2.053 2.016 2.053 build_kinetic_matrix_low 22 6.9 1.901 1.903 2.000 2.000 qs_ot_p2m_diag 44 11.0 0.081 0.082 1.971 1.972 build_overlap_matrix_low 22 6.9 1.861 1.869 1.957 1.966 dbcsr_special_finalize 5898 15.4 0.037 0.037 1.912 1.926 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=95.323, yerr=0.0 Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=68.63799999999999, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.173, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=6.939, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.443, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.148, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=3.982, yerr=0.0 Running w64PBE.inp with 3 threads and 2 ranks... done. 
From /workspace/artifacts/w64PBE_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.045 0.045 246.718 246.718 qs_mol_dyn_low 1 2.0 0.005 0.005 245.953 245.955 qs_forces 11 3.9 0.002 0.002 245.900 245.901 qs_energies 11 4.9 0.001 0.001 213.310 213.311 velocity_verlet 10 3.0 0.001 0.002 194.040 194.059 scf_env_do_scf 11 5.9 0.001 0.002 191.713 191.714 scf_env_do_scf_inner_loop 106 6.8 0.006 0.009 167.149 167.149 rebuild_ks_matrix 117 8.5 0.001 0.001 126.836 126.836 qs_ks_build_kohn_sham_matrix 117 9.5 0.020 0.020 126.835 126.835 qs_ks_update_qs_env 120 7.8 0.001 0.001 112.824 112.825 fft_wrap_pw1pw2 2000 12.9 0.046 0.048 70.494 70.522 qs_vxc_create 117 10.5 0.004 0.004 67.559 67.568 xc_vxc_pw_create 117 11.5 1.521 1.532 67.555 67.565 fft_wrap_pw1pw2_200 1298 14.3 0.009 0.009 66.815 66.841 qs_rho_update_rho_low 117 7.9 0.001 0.001 61.899 61.907 calculate_rho_elec 117 8.9 1.283 1.283 61.898 61.906 sum_up_and_integrate 117 10.5 0.003 0.003 44.735 44.794 integrate_v_rspace 117 11.5 0.220 0.221 44.541 44.602 grid_collocate_task_list 117 9.9 41.894 42.008 41.894 42.008 xc_rho_set_and_dset_create 117 12.5 0.960 0.969 39.557 39.632 xc_pw_derive 702 13.5 0.010 0.010 39.271 39.279 pw_gpu_c1dr3d_3d_ps 1053 15.2 10.812 10.856 37.720 37.774 grid_integrate_task_list 117 12.5 33.217 33.270 33.217 33.270 pw_gpu_r3dc1d_3d_ps 947 14.5 9.722 9.727 32.715 32.742 xc_pw_divergence 117 12.5 0.006 0.006 26.068 26.100 init_scf_loop 14 6.8 0.001 0.001 24.496 24.496 mp_alltoall_z22v 2000 16.9 19.161 19.296 19.161 19.296 density_rs2pw 117 9.9 0.009 0.009 18.695 18.815 dbcsr_multiply_generic 2035 12.5 0.147 0.148 18.450 18.510 xc_functional_eval 117 13.5 0.002 0.002 18.191 18.241 pbe_lda_eval 117 14.5 18.190 18.240 18.190 18.240 
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 17.566 17.775 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 14.809 14.809 qs_scf_new_mos 106 7.8 0.001 0.001 13.600 13.601 qs_scf_loop_do_ot 106 8.8 0.001 0.001 13.599 13.600 x_to_yz 1053 16.2 2.503 2.505 12.575 12.670 ot_scf_mini 106 9.8 0.003 0.003 12.178 12.178 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 11.941 11.941 potential_pw2rs 117 12.5 0.060 0.060 11.104 11.110 yz_to_x 947 15.5 1.819 1.820 10.908 10.944 multiply_cannon 2035 13.5 0.311 0.315 9.202 9.230 init_scf_run 11 5.9 0.000 0.000 9.134 9.135 scf_env_initial_rho_setup 11 6.9 0.000 0.001 9.134 9.134 build_core_ppl_forces 11 5.9 8.926 9.121 8.926 9.121 prepare_preconditioner 14 7.8 0.000 0.000 8.706 8.712 make_preconditioner 14 8.8 0.000 0.000 8.706 8.712 pw_gpu_sf 1053 16.2 8.427 8.432 8.427 8.432 multiply_cannon_loop 2035 14.5 0.245 0.245 8.155 8.167 make_m2s 4070 13.5 0.045 0.046 7.733 7.743 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 7.621 7.708 pw_gpu_fg 947 15.5 7.482 7.600 7.482 7.600 make_images 4070 14.5 1.054 1.073 7.547 7.557 ot_mini 106 10.8 0.001 0.001 7.394 7.394 wfi_extrapolate 11 7.9 0.002 0.002 7.085 7.085 pw_gpu_ffc 1053 16.2 5.887 5.890 5.887 5.890 build_overlap_matrix_low 22 6.9 5.466 5.498 5.548 5.578 build_kinetic_matrix_low 22 6.9 5.427 5.452 5.520 5.544 pw_poisson_solve 117 10.5 0.003 0.003 4.767 4.767 transfer_rs2pw 479 10.8 0.009 0.010 4.473 4.681 pw_gpu_cff 947 15.5 4.537 4.596 4.537 4.596 qs_ot_get_derivative 106 11.8 0.002 0.002 4.561 4.562 multiply_cannon_multrec 4070 15.5 1.750 1.755 4.310 4.310 make_full_single_inverse 14 9.8 0.002 0.002 4.309 4.309 pw_derive 1053 13.8 4.074 4.090 4.074 4.090 make_images_data 4070 15.5 0.055 0.055 4.041 4.056 hybrid_alltoall_any 4213 16.4 2.805 2.825 4.028 4.042 qs_env_update_s_mstruct 11 6.9 0.000 0.000 3.920 3.973 transfer_rs2pw_200 128 11.7 2.687 2.721 3.732 3.943 make_full_inverse_cholesky 14 9.8 0.000 0.000 3.600 3.756 build_core_ppl 11 7.9 3.491 3.573 3.491 3.573 
mp_waitall_1 57459 16.9 3.500 3.506 3.500 3.506 transfer_pw2rs 479 13.4 0.006 0.006 3.200 3.205 ot_diis_step 106 11.8 0.005 0.005 2.812 2.812 arnoldi_generalized_ev 14 10.8 0.000 0.000 2.715 2.717 fft_wrap_pw1pw2_70 234 13.2 0.002 0.002 2.663 2.704 pw_copy 1755 13.0 2.691 2.691 2.691 2.691 dbcsr_sym_matrix_vector_mult 1269 12.5 0.037 0.037 2.681 2.682 qs_create_task_list 11 7.9 0.000 0.000 2.548 2.582 generate_qs_task_list 11 8.9 1.415 1.415 2.548 2.582 transfer_pw2rs_200 128 14.1 1.644 1.668 2.569 2.573 gev_build_subspace 23 11.5 0.010 0.010 2.504 2.504 dbcsr_complete_redistribute 323 11.8 0.948 0.963 2.283 2.478 apply_preconditioner_dbcsr 120 12.8 0.000 0.000 2.413 2.416 apply_single 120 13.8 0.001 0.001 2.413 2.416 dbcsr_sym_matrix_vector_mult_l 1269 13.5 2.335 2.344 2.342 2.350 dbcsr_mm_accdrv_process 9388 16.2 0.681 1.043 2.292 2.297 calculate_dm_sparse 117 9.7 0.001 0.001 2.149 2.149 pw_poisson_set 118 11.5 0.005 0.005 2.125 2.125 cp_dbcsr_sm_fm_multiply 46 9.3 0.002 0.002 2.019 2.020 qs_ot_get_derivative_taylor 89 12.9 0.004 0.004 2.011 2.013 pw_integral_ab_c1d_c1d_gs 117 11.5 1.870 1.872 1.888 1.890 multiply_cannon_sync_h2d 4070 15.5 1.757 1.823 1.757 1.823 qs_ot_get_p 120 10.5 0.001 0.001 1.701 1.704 pw_axpy 1170 12.0 1.598 1.609 1.598 1.609 copy_dbcsr_to_fm 143 10.8 0.004 0.004 1.521 1.571 copy_fm_to_dbcsr 180 10.8 0.002 0.002 1.390 1.547 dbcsr_special_finalize 6105 15.5 0.035 0.035 1.512 1.514 cp_dbcsr_sm_fm_multiply_core 46 10.3 0.000 0.000 1.509 1.510 mp_sendrecv_dv 479 12.8 1.274 1.449 1.274 1.449 jit_kernel_multiply 13 15.1 1.084 1.442 1.084 1.442 cp_fm_cholesky_invert 14 10.8 1.416 1.416 1.416 1.416 calculate_rho_core 11 7.9 0.170 0.171 1.319 1.405 dbcsr_merge_single_wm 4070 16.5 0.134 0.136 1.396 1.399 multiply_cannon_metrocomm1 4070 15.5 0.012 0.012 1.259 1.318 dbcsr_dot 1125 12.2 1.197 1.198 1.261 1.266 calculate_first_density_matrix 1 7.0 0.000 0.000 1.217 1.217 transfer_dbcsr_to_fm 14 10.8 0.002 0.002 1.004 1.045 dbcsr_sort_data 4070 17.5 0.979 
0.979 0.979 0.979 cp_dbcsr_plus_fm_fm_t 22 8.9 0.001 0.001 0.977 0.978 dbcsr_finalize 4628 13.9 0.062 0.063 0.924 0.961 transfer_fm_to_dbcsr 14 9.8 0.000 0.000 0.797 0.947 dbcsr_merge_all 4098 15.1 0.179 0.179 0.813 0.851 qs_ot_get_orbitals 106 10.8 0.001 0.001 0.821 0.821 grid_create_task_list 11 9.9 0.791 0.815 0.791 0.815 build_core_ppnl_forces 11 5.9 0.807 0.814 0.807 0.814 dbcsr_copy 7812 13.3 0.201 0.202 0.805 0.807 mp_alltoall_d11v 1899 13.8 0.799 0.801 0.799 0.801 qs_ot_p2m_diag 19 11.0 0.038 0.039 0.799 0.800 evaluate_core_matrix_traces 117 8.5 0.001 0.001 0.796 0.797 calculate_ptrace_kp 234 9.5 0.001 0.001 0.795 0.796 cp_fm_cholesky_decompose 28 10.5 0.716 0.758 0.716 0.758 mp_sum_d 3821 11.6 0.465 0.746 0.465 0.746 fft_wrap_pw1pw2_30 234 13.2 0.001 0.001 0.706 0.723 make_images_pack 4070 15.5 0.659 0.667 0.673 0.681 qs_init_subsys 1 2.0 0.001 0.001 0.675 0.675 cp_dbcsr_syevd 19 12.0 0.002 0.002 0.672 0.672 qs_env_setup 1 3.0 0.000 0.000 0.667 0.668 qs_env_rebuild_pw_env 23 5.3 0.000 0.000 0.667 0.668 pw_env_rebuild 1 5.0 0.000 0.000 0.667 0.668 cp_fm_uplo_to_full 47 13.4 0.495 0.656 0.495 0.656 pw_grid_setup 4 6.0 0.000 0.000 0.640 0.641 cp_fm_diag_elpa 19 13.0 0.000 0.000 0.636 0.637 cp_fm_diag_elpa_base 19 14.0 0.625 0.627 0.635 0.636 pw_grid_setup_internal 4 7.0 0.007 0.007 0.629 0.630 transfer_rs2pw_70 117 11.9 0.395 0.396 0.575 0.578 make_basis_sm 14 9.3 0.001 0.001 0.575 0.576 qs_ot_get_derivative_diag 17 12.0 0.001 0.001 0.553 0.555 dbcsr_copy_into_existing 22 7.9 0.546 0.547 0.547 0.548 pw_zero 585 13.0 0.536 0.541 0.536 0.541 acc_transpose_blocks 4070 15.5 0.024 0.024 0.532 0.536 dbcsr_mm_accdrv_process_sort 9388 17.2 0.528 0.528 0.528 0.528 calculate_ecore_overlap 22 5.9 0.001 0.002 0.269 0.511 pw_grid_sort 4 8.0 0.368 0.373 0.500 0.507 transfer_pw2rs_70 117 14.5 0.319 0.319 0.490 0.490 dbcsr_sort_indices 10929 16.5 0.446 0.446 0.446 0.446 ot_scf_init 14 7.8 0.002 0.002 0.432 0.438 compute_matrix_w 11 5.9 0.000 0.000 0.431 0.433 
calculate_w_matrix_ot 11 6.9 0.003 0.003 0.431 0.433 parallel_gemm_fm_cosma 96 8.9 0.421 0.424 0.421 0.424 reorthogonalize_vectors 10 9.0 0.000 0.000 0.402 0.403 dbcsr_data_copy_aa2 2343 15.5 0.392 0.400 0.392 0.400 mp_sum_l 6134 13.5 0.331 0.376 0.331 0.376 mp_alltoall_i22 633 13.6 0.216 0.372 0.216 0.372 cp_dbcsr_alloc_block_from_nbl 88 7.7 0.231 0.232 0.355 0.358 dbcsr_desymmetrize_deep 143 11.8 0.092 0.093 0.341 0.341 build_qs_neighbor_lists 11 6.9 0.001 0.001 0.340 0.340 integrate_v_core_rspace 11 7.9 0.072 0.073 0.325 0.332 dbcsr_add_d 1795 13.1 0.003 0.003 0.329 0.329 dbcsr_add_anytype 1795 14.1 0.176 0.179 0.326 0.326 distribute_tasks 11 9.9 0.325 0.326 0.325 0.326 setup_rec_index_2d 4070 14.5 0.309 0.310 0.309 0.310 pw_scale 468 12.0 0.298 0.300 0.298 0.300 multiply_cannon_multrec_finali 2035 16.5 0.005 0.005 0.267 0.268 fft_wrap_pw1pw2_10 234 13.2 0.001 0.001 0.264 0.267 dbcsr_mm_multrec_finalize 2035 17.5 0.022 0.023 0.262 0.263 dbcsr_make_untransposed_blocks 2481 13.4 0.242 0.244 0.254 0.256 pw_multiply_with 117 11.5 0.253 0.254 0.253 0.254 acc_transpose_blocks_kernels 4070 16.5 0.051 0.051 0.246 0.249 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="w64PBE", label="w64PBE", y=246.718, yerr=0.0 Plot: name="w64PBE_timings_6cpu_1gpu", title="Timings of w64PBE with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="rest", label="rest", y=123.44399999999999, yerr=0.0 PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=41.894, yerr=0.0 PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=33.217, yerr=0.0 PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=19.161, yerr=0.0 PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="pbe_lda_eval", label="pbe_lda_eval", y=18.19, yerr=0.0 
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.812, yerr=0.0 Running w64SCAN.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/w64SCAN_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.204 0.207 933.095 933.095 qs_mol_dyn_low 1 2.0 0.004 0.004 930.722 930.726 qs_forces 11 3.9 0.002 0.002 930.672 930.673 qs_energies 11 4.9 0.001 0.001 836.555 836.557 scf_env_do_scf 11 5.9 0.001 0.002 797.241 797.242 velocity_verlet 10 3.0 0.002 0.002 741.893 741.911 scf_env_do_scf_inner_loop 106 6.8 0.006 0.009 717.631 717.631 rebuild_ks_matrix 117 8.5 0.001 0.001 645.501 645.505 qs_ks_build_kohn_sham_matrix 117 9.5 0.021 0.022 645.500 645.504 qs_ks_update_qs_env 119 7.8 0.001 0.001 566.268 566.273 fft_wrap_pw1pw2 3053 12.6 0.072 0.072 441.835 442.301 fft_wrap_pw1pw2_400 1649 13.9 0.010 0.011 423.843 424.248 qs_vxc_create 117 10.5 0.004 0.004 401.386 401.406 xc_vxc_pw_create 117 11.5 4.830 4.842 401.382 401.402 xc_rho_set_and_dset_create 117 12.5 6.178 6.195 270.373 270.877 qs_rho_update_rho_low 117 7.9 0.001 0.001 229.924 229.931 calculate_rho_elec 234 8.9 7.035 7.043 229.922 229.929 pw_gpu_c1dr3d_3d_ps 1521 15.1 124.580 124.890 221.747 221.753 pw_gpu_r3dc1d_3d_ps 1532 14.1 125.929 126.136 219.997 220.470 xc_pw_derive 702 13.5 0.011 0.012 189.536 189.941 sum_up_and_integrate 117 10.5 0.005 0.005 189.411 189.820 integrate_v_rspace 234 11.5 0.455 0.462 188.535 188.940 density_rs2pw 234 9.9 0.021 0.021 168.557 168.991 xc_functional_eval 234 13.5 0.003 0.003 165.556 166.046 libxc_lda_eval 234 14.5 165.546 166.036 165.553 166.043 xc_pw_divergence 117 12.5 0.007 0.007 124.801 125.193 potential_pw2rs 234 12.5 0.298 0.299 99.722 99.857 grid_integrate_task_list 234 12.5 88.357 
88.904 88.357 88.904 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 80.018 80.018 init_scf_loop 13 6.8 0.001 0.001 79.543 79.543 mp_alltoall_z22v 3053 16.6 74.788 76.002 74.788 76.002 grid_collocate_task_list 234 9.9 54.166 54.631 54.166 54.631 yz_to_x 1532 15.1 7.798 7.808 45.987 46.789 x_to_yz 1521 16.1 9.122 9.127 45.721 46.149 transfer_rs2pw 947 10.9 0.020 0.020 36.805 37.318 transfer_rs2pw_400 245 11.8 26.242 26.450 32.153 32.624 pw_gpu_sf 1521 16.1 31.284 31.385 31.284 31.385 pw_gpu_fg 1532 15.1 30.503 30.593 30.503 30.593 transfer_pw2rs 947 13.5 0.016 0.016 30.311 30.318 transfer_pw2rs_400 245 14.3 21.446 21.713 26.916 26.947 init_scf_run 11 5.9 0.000 0.000 25.028 25.028 scf_env_initial_rho_setup 11 6.9 0.000 0.001 25.028 25.028 wfi_extrapolate 11 7.9 0.002 0.002 21.355 21.355 pw_gpu_ffc 1521 16.1 20.134 20.159 20.134 20.159 dbcsr_multiply_generic 2100 12.6 0.150 0.151 18.916 19.334 pw_poisson_solve 117 10.5 0.003 0.004 17.727 17.730 pw_gpu_cff 1532 15.1 17.417 17.449 17.417 17.449 fft_wrap_pw1pw2_140 468 13.2 0.003 0.003 14.160 14.170 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 13.928 14.095 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 13.788 13.788 qs_scf_new_mos 106 7.8 0.001 0.001 13.772 13.775 qs_scf_loop_do_ot 106 8.8 0.001 0.001 13.771 13.774 ot_scf_mini 106 9.8 0.003 0.003 12.344 12.349 pw_derive 1053 13.8 12.149 12.152 12.149 12.152 multiply_cannon 2100 13.6 0.311 0.312 9.280 9.281 mp_waitall_1 59747 17.0 9.016 9.248 9.016 9.248 pw_copy 2223 13.1 9.132 9.135 9.132 9.135 prepare_preconditioner 13 7.8 0.000 0.000 8.675 8.684 make_preconditioner 13 8.8 0.000 0.000 8.675 8.683 pw_integral_ab_c1d_c1d_gs 117 11.5 8.321 8.482 8.664 8.670 multiply_cannon_loop 2100 14.6 0.249 0.250 8.222 8.233 mp_sendrecv_dv 947 12.9 7.147 7.858 7.147 7.858 make_m2s 4200 13.6 0.044 0.044 7.627 7.639 ot_mini 106 10.8 0.001 0.001 7.480 7.484 make_images 4200 14.6 1.046 1.049 7.442 7.453 qs_env_update_s_mstruct 11 6.9 0.000 0.000 7.158 7.167 pw_poisson_set 118 11.5 
0.006 0.006 6.715 6.718 build_core_ppl_forces 11 5.9 6.491 6.664 6.491 6.664 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 6.248 6.287 pw_axpy 1638 11.7 5.966 5.972 5.966 5.972 calculate_rho_core 11 7.9 0.458 0.459 5.098 5.135 build_kinetic_matrix_low 22 6.9 4.691 4.699 4.776 4.784 build_overlap_matrix_low 22 6.9 4.629 4.629 4.702 4.703 qs_ot_get_derivative 106 11.8 0.002 0.002 4.648 4.655 multiply_cannon_multrec 4200 15.6 1.794 1.810 4.358 4.374 make_full_single_inverse 13 9.8 0.002 0.002 4.248 4.249 hybrid_alltoall_any 4338 16.5 2.760 2.768 3.978 3.979 transfer_rs2pw_140 234 11.9 2.931 2.941 3.922 3.976 make_images_data 4200 15.6 0.054 0.054 3.963 3.964 make_full_inverse_cholesky 13 9.8 0.000 0.000 3.652 3.800 fft_wrap_pw1pw2_50 468 13.2 0.003 0.003 2.841 2.873 ot_diis_step 106 11.8 0.006 0.006 2.812 2.812 transfer_pw2rs_140 234 14.5 1.744 1.750 2.737 2.760 arnoldi_generalized_ev 13 10.8 0.000 0.000 2.716 2.716 build_core_ppl 11 7.9 2.664 2.704 2.664 2.704 dbcsr_sym_matrix_vector_mult 1206 12.5 0.036 0.036 2.685 2.685 gev_build_subspace 22 11.5 0.010 0.010 2.504 2.504 dbcsr_complete_redistribute 312 11.8 1.014 1.016 2.309 2.467 apply_preconditioner_dbcsr 119 12.8 0.000 0.000 2.374 2.383 apply_single 119 13.8 0.001 0.001 2.373 2.382 dbcsr_sym_matrix_vector_mult_l 1206 13.5 2.298 2.308 2.304 2.314 dbcsr_mm_accdrv_process 9484 16.3 0.622 0.623 2.301 2.304 pw_zero 702 12.6 2.161 2.167 2.161 2.167 qs_ot_get_derivative_taylor 89 12.9 0.004 0.004 2.135 2.141 calculate_dm_sparse 117 9.7 0.001 0.001 2.131 2.134 qs_init_subsys 1 2.0 0.001 0.001 2.083 2.083 qs_env_setup 1 3.0 0.000 0.000 2.073 2.075 qs_env_rebuild_pw_env 23 5.3 0.000 0.000 2.073 2.075 pw_env_rebuild 1 5.0 0.000 0.000 2.073 2.075 pw_grid_setup 4 6.0 0.000 0.000 2.007 2.009 pw_grid_setup_internal 4 7.0 0.022 0.023 1.973 1.975 cp_dbcsr_sm_fm_multiply 45 9.4 0.002 0.002 1.944 1.946 qs_create_task_list 11 7.9 0.000 0.000 1.818 1.847 generate_qs_task_list 11 8.9 0.933 0.940 1.818 1.847 mp_sum_d 3885 11.5 1.243 
1.774 1.243 1.774 qs_ot_get_p 119 10.6 0.001 0.001 1.770 1.772 multiply_cannon_sync_h2d 4200 15.6 1.757 1.758 1.757 1.758 copy_dbcsr_to_fm 138 10.8 0.004 0.004 1.719 1.738 pw_grid_sort 4 8.0 1.199 1.203 1.625 1.630 dbcsr_special_finalize 6300 15.6 0.035 0.036 1.496 1.496 copy_fm_to_dbcsr 174 10.8 0.002 0.002 1.334 1.482 cp_dbcsr_sm_fm_multiply_core 45 10.4 0.000 0.000 1.459 1.460 dbcsr_merge_single_wm 4200 16.6 0.132 0.134 1.380 1.380 integrate_v_core_rspace 11 7.9 0.157 0.158 1.377 1.377 cp_fm_cholesky_invert 13 10.8 1.317 1.318 1.317 1.318 multiply_cannon_metrocomm1 4200 15.6 0.012 0.012 1.271 1.301 dbcsr_dot 1134 12.2 1.192 1.193 1.258 1.259 mp_sum_l 6329 13.5 0.826 1.250 0.826 1.250 transfer_dbcsr_to_fm 13 10.8 0.001 0.002 1.227 1.239 calculate_first_density_matrix 1 7.0 0.000 0.000 1.186 1.186 jit_kernel_multiply 12 15.0 1.143 1.145 1.143 1.145 pw_scale 585 11.9 1.121 1.122 1.121 1.122 dbcsr_sort_data 4200 17.6 0.968 0.968 0.968 0.968 dbcsr_finalize 4788 14.0 0.064 0.064 0.954 0.964 fft_wrap_pw1pw2_20 468 13.2 0.002 0.002 0.920 0.959 cp_dbcsr_plus_fm_fm_t 22 8.9 0.001 0.001 0.944 0.945 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="w64SCAN", label="w64SCAN", y=933.095, yerr=0.0 Plot: name="w64SCAN_timings_6cpu_1gpu", title="Timings of w64SCAN with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="rest", label="rest", y=353.895, yerr=0.0 PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="libxc_lda_eval", label="libxc_lda_eval", y=165.546, yerr=0.0 PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=125.929, yerr=0.0 PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=124.58, yerr=0.0 PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=88.357, yerr=0.0 PlotPoint: 
plot="w64SCAN_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=74.788, yerr=0.0 Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.019 0.021 113.146 113.146 qs_energies 1 2.0 0.000 0.000 112.792 112.794 mp2_main 1 3.0 0.000 0.000 106.023 106.025 mp2_gpw_main 1 4.0 0.000 0.000 104.284 104.286 rpa_ri_compute_en 1 5.0 0.000 0.000 95.936 95.938 rpa_num_int 1 6.0 0.001 0.001 95.926 95.928 dbt_total 2336 9.6 0.022 0.022 76.462 76.464 compute_mat_P_omega 1 7.0 0.001 0.002 72.954 72.956 compute_mat_P_omega_contract 10 8.0 5.587 5.609 72.618 72.636 dbt_contract 787 11.0 0.051 0.052 49.808 49.809 dbt_tas_total 1149 12.2 0.152 0.152 38.264 38.265 dbt_tas_multiply 807 12.1 0.003 0.003 37.530 37.531 dbt_tas_dbm 807 14.1 0.006 0.006 28.787 28.787 dbm_multiply 807 16.1 27.281 27.408 27.281 27.408 dbt_copy 1107 10.7 0.070 0.071 27.269 27.275 compute_mat_P_omega_calc_M_occ 250 9.0 5.635 5.678 25.548 25.548 dbt_tas_mm_1N 524 15.1 0.003 0.003 18.545 18.776 dbt_reshape 594 11.8 7.722 7.786 18.504 18.635 compute_QP_energies 1 7.0 0.000 0.000 16.354 16.354 compute_self_energy_cubic_gw 1 8.0 0.124 0.126 16.354 16.354 compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 15.876 15.877 dbt_tas_reserve_blocks_index 3266 14.3 0.697 0.712 11.225 11.379 dbm_reserve_blocks 3634 15.3 10.848 10.992 10.848 10.992 dbt_crop 1042 12.0 7.236 7.269 9.569 9.588 dbt_reserve_blocks_index 2347 13.0 0.336 0.346 9.445 9.462 dbt_reserve_blocks_index_array 2289 12.1 0.012 0.013 9.245 9.246 compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 9.047 9.047 mp_waitall_2 2656 15.9 8.874 8.969 8.874 8.969 mp2_ri_gpw_compute_in 1 5.0 0.001 0.002 8.337 8.337 
dbt_communicate_buffer 594 12.8 0.013 0.013 8.013 8.129 dbt_tas_mm_2 251 15.0 0.003 0.003 7.686 7.686 contract_cubic_gw 21 9.0 0.000 0.000 7.581 7.581 scf_env_do_scf 1 3.0 0.000 0.000 6.213 6.213 scf_env_do_scf_inner_loop 17 4.0 0.001 0.002 6.213 6.213 compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 6.003 6.006 compute_mat_P_omega_copy_M_occ 250 9.0 0.002 0.002 5.713 5.728 dbcsr_multiply_generic 30 8.1 0.003 0.003 4.851 4.879 dbt_tas_copy 511 11.5 2.605 2.610 4.516 4.669 multiply_cannon 30 9.1 0.009 0.014 4.643 4.669 multiply_cannon_loop 30 10.1 0.005 0.005 4.579 4.608 multiply_cannon_multrec 60 11.1 0.276 0.290 3.915 3.950 get_2c_integrals 1 6.0 0.000 0.000 3.843 3.843 trace_sigma_gw 21 9.0 0.596 0.609 3.828 3.828 dbcsr_mm_accdrv_process 328 12.3 0.022 0.022 3.339 3.371 jit_kernel_multiply 17 11.6 3.310 3.342 3.310 3.342 mp_sync 8688 11.6 3.185 3.335 3.185 3.335 qs_scf_new_mos 17 5.0 0.000 0.000 3.176 3.210 compute_2c_integrals 1 7.0 0.000 0.000 3.048 3.048 dbt_split_copyback 70 10.6 1.169 1.191 2.743 2.754 convert_to_new_pgrid 2421 14.1 0.037 0.038 2.515 2.518 qs_ks_build_kohn_sham_matrix 18 6.9 0.002 0.002 2.502 2.505 fft_wrap_pw1pw2 301 10.2 0.005 0.005 2.496 2.503 dbm_copy 1614 15.1 2.477 2.481 2.477 2.481 qs_ks_update_qs_env 17 5.0 0.000 0.000 2.471 2.473 fill_fm_L_from_L_loc_non_block 1 8.0 0.000 0.000 2.444 2.470 rebuild_ks_matrix 17 6.0 0.000 0.000 2.464 2.466 mp2_ri_gpw_compute_in_copy_3c 6 6.0 0.235 0.238 2.313 2.445 fill_fm_L_from_L_loc_non_block 1 9.0 2.344 2.370 2.344 2.370 build_3c_integrals 5 6.0 1.514 1.564 2.172 2.304 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=113.146, yerr=0.0 Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=51.185, yerr=0.0 PlotPoint: 
plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=27.281, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.848, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=8.874, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=7.722, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=7.236, yerr=0.0 Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.021 0.022 212.255 212.255 qs_forces 1 2.0 0.000 0.000 211.788 211.788 rebuild_ks_matrix 7 6.6 0.000 0.000 207.694 207.694 qs_ks_build_kohn_sham_matrix 7 7.6 0.002 0.002 207.694 207.694 hfx_ks_matrix 7 8.6 0.000 0.000 203.677 203.677 dbt_total 849 11.0 0.009 0.009 152.661 152.661 hfx_ri_update_ks 7 9.6 0.000 0.000 118.618 118.618 hfx_ri_update_ks_Pmat 7 10.6 23.277 23.377 118.612 118.613 qs_energies 1 3.0 0.000 0.000 113.839 113.839 scf_env_do_scf 1 4.0 0.000 0.000 111.881 111.881 qs_ks_update_qs_env 8 6.0 0.000 0.000 109.791 109.791 qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 97.910 97.910 dbt_contract 207 12.4 0.056 0.056 87.697 87.697 hfx_ri_update_forces 1 7.0 1.085 1.093 85.057 85.057 dbt_tas_total 369 13.4 0.083 0.083 71.163 71.163 dbt_tas_multiply 216 13.5 0.001 0.001 68.164 68.165 dbt_copy 423 11.8 0.047 0.047 59.940 60.597 scf_env_do_scf_inner_loop 6 5.0 0.000 0.001 57.797 57.797 init_scf_loop 2 5.0 0.000 0.000 54.082 54.082 dbt_tas_dbm 216 15.5 0.002 0.002 53.395 53.396 hfx_ri_forces_Pmat_3c 1 8.0 3.525 3.553 
50.562 50.563 dbm_multiply 216 17.5 50.076 50.125 50.076 50.125 dbt_reshape 175 13.2 20.680 21.085 46.394 46.875 hfx_ri_update_ks_Pmat_KS 63 11.6 0.001 0.001 33.591 33.591 precalc_derivatives 1 8.0 1.879 1.884 28.283 28.283 mp_waitall_2 1022 16.5 23.775 23.916 23.775 23.916 dbt_tas_mm_2 91 16.5 0.001 0.001 22.366 22.366 dbt_communicate_buffer 175 14.2 0.005 0.005 19.710 19.864 dbt_tas_reserve_blocks_index 1323 15.4 1.758 1.768 19.084 19.317 dbt_crop 372 13.7 14.755 14.766 19.073 19.136 hfx_ri_pre_scf_Pmat 1 12.0 0.000 0.000 18.475 18.475 dbm_reserve_blocks 1491 16.3 17.980 18.211 17.980 18.211 hfx_ri_update_ks_Pmat_copy_2 63 11.6 0.000 0.000 17.417 17.417 dbt_tas_mm_3T 77 17.1 0.001 0.001 16.867 17.309 hfx_ri_update_ks_Pmat_Px3C 63 11.6 0.000 0.000 16.100 16.100 dbt_reserve_blocks_index 889 14.5 0.633 0.640 15.618 15.823 build_3c_derivatives 3 9.0 2.633 2.648 15.635 15.635 dbt_reserve_blocks_index_array 859 13.5 0.007 0.007 15.331 15.531 dbt_tas_mm_3N 37 15.4 0.000 0.000 11.555 11.631 dbt_tas_copy 248 12.5 4.466 4.527 8.312 8.403 mp_sync 2901 12.8 7.149 7.961 7.149 7.961 hfx_ri_pre_scf_Pmat_copy_2 9 13.0 2.060 2.087 5.595 5.622 hfx_ri_pre_scf_Pmat_int 1 13.0 0.000 0.000 5.412 5.412 dbt_tas_replicate 168 15.1 2.428 2.450 5.320 5.344 hfx_ri_pre_scf_calc_tensors 1 14.0 0.003 0.004 4.670 4.670 hfx_ri_pre_scf_Pmat_RIx3C 9 13.0 0.000 0.000 4.429 4.443 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=212.255, yerr=0.0 Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=76.46699999999998, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=50.076, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", 
y=23.775, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=23.277, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=20.68, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=17.98, yerr=0.0 Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.010 0.011 105.416 105.416 qs_energies 1 2.0 0.000 0.000 105.235 105.235 mp2_main 1 3.0 0.000 0.000 98.116 98.116 mp2_gpw_main 1 4.0 0.001 0.001 97.716 97.716 mp2_ri_gpw_compute_in 1 5.0 0.585 0.586 52.767 52.773 mp2_ri_gpw_compute_en 1 5.0 0.099 0.100 44.885 44.891 mp2_ri_gpw_compute_in_loop 1 6.0 0.014 0.015 44.469 44.475 mp2_ri_gpw_compute_en_RI_loop 1 6.0 13.048 13.051 42.163 42.163 dbcsr_multiply_generic 2666 8.0 0.166 0.167 23.228 23.372 ao_to_mo_and_store_B_mult_1 1328 7.0 0.015 0.015 22.302 22.445 mp2_ri_gpw_compute_en_expansio 1040 7.0 0.754 0.757 16.641 16.666 mp2_eri_3c_integrate_gpw 1328 7.0 0.019 0.019 16.134 16.193 local_gemm 1040 8.0 15.887 15.916 15.887 15.916 make_m2s 5332 9.0 0.056 0.057 13.022 13.102 make_images 5332 10.0 2.323 2.335 12.832 12.913 multiply_cannon 2666 9.0 0.422 0.424 9.511 9.739 hybrid_alltoall_any 6683 11.6 8.588 8.650 8.874 8.939 make_images_data 5332 11.0 0.074 0.076 8.780 8.849 multiply_cannon_loop 2666 10.0 0.207 0.210 8.261 8.432 integrate_v_rspace 1338 8.0 1.079 1.086 7.770 7.791 fft_wrap_pw1pw2 26668 10.4 0.143 0.144 7.700 7.732 get_2c_integrals 1 6.0 0.005 0.005 7.712 7.713 collocate_function 1328 8.0 5.209 5.215 7.230 7.254 compute_2c_integrals 1 7.0 
0.007 0.008 7.109 7.110 compute_2c_integrals_loop_lm 1 8.0 0.014 0.023 6.929 6.931 mp2_eri_2c_integrate_gpw 1 9.0 2.110 2.116 6.914 6.925 scf_env_do_scf 1 3.0 0.000 0.000 6.191 6.192 scf_env_do_scf_inner_loop 10 4.0 0.001 0.001 6.191 6.192 mp2_ri_gpw_compute_en_comm 221 7.0 1.069 1.081 5.919 5.970 ao_to_mo_and_store_B_E_Ex_1 1328 7.0 3.903 3.937 5.790 5.860 mp2_ri_gpw_compute_en_ener 1040 7.0 5.353 5.383 5.353 5.383 grid_integrate_task_list 1338 9.0 5.363 5.382 5.363 5.382 qs_scf_new_mos 10 5.0 0.000 0.000 4.518 4.522 fft_wrap_pw1pw2_20 10647 11.4 0.023 0.024 4.395 4.421 multiply_cannon_multrec 2676 11.0 1.978 2.094 4.094 4.211 pw_gpu_r3dc1d_3d 13282 12.2 3.758 3.819 3.758 3.819 mp_sendrecv_dm3 442 8.0 3.772 3.807 3.772 3.807 eigensolver 11 5.8 0.002 0.002 3.172 3.174 potential_pw2rs 2666 10.0 0.105 0.106 2.717 2.782 pw_gpu_c1dr3d_3d 13280 12.7 2.687 2.715 2.687 2.715 cp_fm_diag_elpa 11 6.8 0.000 0.000 2.503 2.504 cp_fm_diag_elpa_base 11 7.8 2.413 2.431 2.502 2.502 copy_dbcsr_to_fm 1351 8.0 0.035 0.035 2.334 2.367 collocate_single_gaussian 1328 10.0 0.098 0.099 2.293 2.361 fft_wrap_pw1pw2_10 15957 11.5 0.021 0.021 2.334 2.342 replicate_iaK_2intgroup 1 6.0 2.181 2.182 2.325 2.325 fill_local_i_aL 884 7.5 2.280 2.285 2.280 2.285 mp2_eri_2c_integrate_gpw_pot_l 1328 10.0 0.004 0.004 2.181 2.241 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=105.416, yerr=0.0 Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=57.177, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.887, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=13.048, yerr=0.0 PlotPoint: 
plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=8.588, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=5.363, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_ener", label="mp2_ri_gpw_compute_en_ener", y=5.353, yerr=0.0 Running diag_cu144_broy.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.080 0.080 207.824 207.824 qs_energies 1 2.0 0.000 0.000 206.688 206.688 scf_env_do_scf 1 3.0 0.000 0.000 193.091 193.092 scf_env_do_scf_inner_loop 15 4.0 0.001 0.002 193.091 193.091 qs_ks_update_qs_env 15 5.0 0.000 0.000 105.118 105.138 rebuild_ks_matrix 15 6.0 0.000 0.000 104.908 104.929 qs_ks_build_kohn_sham_matrix 15 7.0 0.003 0.003 104.908 104.929 qs_vxc_create 15 8.0 0.060 0.120 61.486 61.513 qs_scf_new_mos 15 5.0 0.001 0.001 55.998 56.006 fft_wrap_pw1pw2 1086 10.0 0.028 0.028 54.043 54.045 calculate_dispersion_nonloc 15 9.0 11.120 11.171 52.794 52.825 eigensolver 15 6.0 0.002 0.002 45.799 45.961 sum_up_and_integrate 15 8.0 0.000 0.000 41.887 41.895 integrate_v_rspace 15 9.0 0.049 0.050 41.862 41.870 grid_integrate_task_list 15 10.0 34.310 34.331 34.310 34.331 qs_rho_update_rho_low 16 5.0 0.000 0.000 29.368 29.369 calculate_rho_elec 16 6.0 0.187 0.188 29.368 29.369 pw_gpu_c1dr3d_3d_ps 585 12.1 5.743 5.795 28.155 28.228 cp_fm_diag_elpa 15 7.0 0.000 0.000 28.027 28.032 cp_fm_diag_elpa_base 15 8.0 26.163 26.786 28.021 28.022 fft_wrap_pw1pw2_150 765 11.0 0.005 0.005 27.911 27.934 pw_gpu_r3dc1d_3d_ps 501 11.9 5.348 5.632 25.854 25.924 grid_collocate_task_list 16 7.0 17.431 17.458 17.431 
17.458 cp_fm_cholesky_restore 45 7.0 15.787 16.563 15.787 16.563 fft_wrap_pw1pw2_200 197 11.3 0.001 0.001 13.643 13.679 density_rs2pw 16 7.0 0.002 0.002 11.732 11.753 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 10.174 10.174 mp_alltoall_z22v 1086 14.0 9.577 9.911 9.577 9.911 vdW_energy 15 10.0 9.715 9.755 9.715 9.755 pw_gpu_ffc 585 13.1 9.171 9.191 9.171 9.191 build_core_hamiltonian_matrix 1 4.0 0.000 0.000 8.862 8.875 xc_vxc_pw_create 15 9.0 0.186 0.187 8.631 8.633 pw_gpu_cff 501 12.9 8.550 8.589 8.550 8.589 potential_pw2rs 15 10.0 0.007 0.007 7.502 7.531 pw_gpu_sf 585 13.1 7.041 7.053 7.041 7.053 copy_dbcsr_to_fm 16 5.9 0.001 0.001 6.968 7.022 pw_gpu_fg 501 12.9 6.577 6.602 6.577 6.602 x_to_yz 585 13.1 1.038 1.039 6.167 6.220 dbcsr_complete_redistribute 46 8.3 1.818 1.835 5.976 6.137 fft_wrap_pw1pw2_10 62 10.5 0.000 0.000 5.633 5.636 yz_to_x 501 12.9 0.873 0.875 5.322 5.601 xc_pw_derive 90 11.0 0.001 0.001 5.068 5.084 cp_fm_uplo_to_full 30 8.0 3.841 5.072 3.841 5.072 xc_rho_set_and_dset_create 15 10.0 0.132 0.134 5.001 5.016 build_core_ppnl 1 5.0 4.965 4.967 4.965 4.967 gspace_mixing 14 5.0 0.132 0.132 4.248 4.248 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=207.824, yerr=0.0 Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=103.013, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=34.31, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=26.163, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=17.431, yerr=0.0 PlotPoint: 
plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=15.787, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=11.12, yerr=0.0 Running bench_dftb.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/bench_dftb_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 2.161 2.283 164.633 164.633 qs_energies 1 2.0 0.000 0.000 162.273 162.274 ls_scf 1 3.0 0.000 0.000 154.934 154.934 ls_scf_main 1 4.0 0.000 0.001 143.359 143.359 density_matrix_trs4 5 5.0 0.004 0.004 114.366 114.377 dbcsr_multiply_generic 95 6.2 0.165 0.166 98.683 98.688 multiply_cannon 95 7.2 1.648 2.204 69.697 69.752 multiply_cannon_loop 95 8.2 0.178 0.179 58.070 58.104 multiply_cannon_multrec 190 9.2 44.047 44.105 49.367 49.436 ls_scf_dm_to_ks 5 5.0 0.000 0.000 27.036 27.037 make_m2s 190 7.2 0.015 0.015 24.696 24.710 make_images 190 8.2 5.448 5.555 24.125 24.139 matrix_ls_to_qs 5 6.0 0.000 0.000 17.949 17.968 dbcsr_complete_redistribute 11 7.5 10.918 10.972 15.432 15.481 matrix_decluster 5 7.0 0.000 0.000 14.015 14.048 arnoldi_extremal 6 6.2 0.000 0.000 11.766 11.769 arnoldi_normal_ev 6 7.2 0.005 0.005 11.766 11.769 build_subspace 12 8.2 0.033 0.033 11.527 11.527 qs_ks_update_qs_env 6 6.2 0.000 0.000 11.002 11.022 rebuild_ks_matrix 6 7.2 0.000 0.000 10.660 10.661 build_dftb_ks_matrix 6 8.2 0.001 0.001 10.660 10.661 dbcsr_matrix_vector_mult 310 9.0 0.078 0.079 10.431 10.447 build_dftb_coulomb 6 9.2 0.800 0.806 10.346 10.347 make_images_data 190 9.2 0.007 0.007 10.149 10.225 dbcsr_matrix_vector_mult_local 310 10.0 9.929 9.945 9.933 9.949 hybrid_alltoall_any 201 10.0 6.627 6.664 9.769 9.839 ls_scf_init_scf 1 4.0 0.000 
0.000 9.808 9.808 tb_ewald_overlap 6 10.2 9.231 9.264 9.231 9.264 calculate_norms 380 9.2 8.054 8.072 8.054 8.072 dbcsr_finalize 277 7.6 0.092 0.092 7.856 7.935 ls_scf_init_matrix_S 1 5.0 0.000 0.000 7.856 7.859 dbcsr_merge_all 247 8.6 1.596 1.683 7.235 7.317 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 7.278 7.279 matrix_sqrt_Newton_Schulz 1 6.0 0.000 0.000 7.126 7.128 build_qs_neighbor_lists 1 4.0 0.000 0.000 6.665 6.705 build_neighbor_lists_sab_tbe 1 5.0 6.473 6.513 6.473 6.513 dbcsr_data_new 3509 9.3 4.566 5.249 4.566 5.249 setup_rec_index_2d 190 8.2 5.220 5.236 5.220 5.236 dbcsr_special_finalize 285 9.2 0.005 0.005 4.873 4.878 dbcsr_copy 443 8.0 0.921 0.964 4.867 4.877 dbcsr_add_d 130 6.0 0.001 0.001 4.567 4.653 dbcsr_add_anytype 130 7.0 1.915 1.919 4.566 4.652 dbcsr_sort_indices 643 10.1 4.611 4.616 4.611 4.616 dbcsr_dot 66 6.3 3.934 3.954 4.187 4.241 dbcsr_mm_accdrv_process 8119 10.0 0.427 0.479 4.223 4.225 dbcsr_mm_multrec_init 95 8.2 0.000 0.000 3.682 4.185 dbcsr_mm_csr_init 95 9.2 0.006 0.006 3.681 4.185 dbcsr_mm_sched_init 95 10.2 0.000 0.000 3.649 4.152 dbcsr_mm_accdrv_init 95 11.2 0.303 0.481 3.649 4.152 dbcsr_copy_into_existing 5 8.0 3.934 3.986 3.934 3.986 dbcsr_mm_accdrv_process_sort 8119 11.0 3.741 3.742 3.741 3.742 tree_to_linear_d 11 10.5 3.650 3.657 3.650 3.657 mp_waitall_1 2666 10.6 3.542 3.558 3.542 3.558 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=164.633, yerr=0.0 Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=82.45400000000001, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=44.047, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", 
label="dbcsr_complete_redistribute", y=10.918, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=9.929, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="tb_ewald_overlap", label="tb_ewald_overlap", y=9.231, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=8.054, yerr=0.0

Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.004    0.005   50.134   50.134
lib_test                             1  2.0    0.000    0.000   50.127   50.128
dbcsr_run_tests                      3  3.0    0.000    0.001   50.127   50.127
test_multiplies_multiproc            3  4.0    0.001    0.001   38.779   38.805
dbcsr_multiply_generic               9  5.0    0.003    0.004   29.972   29.981
multiply_cannon                      9  6.0    0.202    0.387   19.687   20.131
multiply_cannon_loop                 9  7.0    0.003    0.003   18.226   18.555
multiply_cannon_multrec             18  8.0    9.613    9.904   16.932   17.263
dbcsr_make_random_matrix             9  4.0    7.865    7.888   11.203   11.230
dbcsr_finalize                      27  5.7    0.001    0.001    7.600    7.614
dbcsr_merge_all                     18  6.5    3.771    3.781    7.483    7.492
dbcsr_mm_accdrv_process           8199  9.0    1.166    1.244    7.102    7.135
dbcsr_redistribute                   9  5.0    3.661    3.679    6.151    6.160
make_m2s                            18  6.0    0.001    0.001    5.275    5.286
make_images                         18  7.0    0.381    0.383    5.239    5.250
dbcsr_mm_accdrv_process_sort      8199 10.0    4.845    4.869    4.845    4.869
make_images_data                    18  8.0    0.001    0.001    3.090    3.102
hybrid_alltoall_any                 18  9.0    2.546    2.551    3.038    3.048
mp_alltoall_d11v                    27  6.0    2.210    2.211    2.210    2.211
tree_to_linear_d                     9  7.0    1.916    1.918    1.916    1.918
dbcsr_data_copy_aa2                 18  7.5    1.652    1.653    1.652    1.653
dbcsr_data_release                 507  7.7    1.460    1.460    1.460    1.460
jit_kernel_multiply                  6 10.0    1.091    1.160    1.091    1.160
dbcsr_checksum                       6  5.0    1.085    1.093    1.098    1.098
-------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=50.134, yerr=0.0
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=20.379, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.613, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.865, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.845, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.771, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.661, yerr=0.0

Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.045 0.047 215.803 215.803 qs_mol_dyn_low 1 2.0 0.004 0.005 214.180 214.216 qs_forces 6 3.8 0.001 0.001 131.687 131.687 qs_energies 6 4.8 0.001 0.001 124.179 124.179 scf_env_do_scf 6 5.8 0.000 0.000 117.185 117.185 scf_env_do_scf_inner_loop 113 6.2 0.006 0.008 109.826 109.826 velocity_verlet 5 3.0 0.003 0.003 104.364 104.415 rebuild_ks_matrix 119 8.1 0.000 0.000 90.813 90.816 qs_ks_build_kohn_sham_matrix 119 9.1 0.020 0.020 90.813 90.816 qs_ks_update_qs_env 119 7.3 0.001 0.001 85.670 85.672 fft_wrap_pw1pw2 2059 12.4 0.044 0.045 71.428 71.499 fft_wrap_pw1pw2_150 1321 13.9 0.009 0.010 68.487 68.571 qs_vxc_create 119 10.1 0.004 0.004 57.475 57.477 xc_vxc_pw_create 119 11.1 1.599 1.603 57.471 57.473 qmmm_el_coupling 6 3.8 0.000 0.000 43.811 43.828 qmmm_elec_with_gaussian 6 4.8 0.023 0.023 43.805 43.822 qmmm_elec_with_gaussian_low 6 5.8 0.000 0.000 42.209 42.310 xc_pw_derive 714 13.1 0.010 0.010 40.022 40.024 pw_gpu_c1dr3d_3d_ps 1095 14.8 10.711 10.727 38.426 38.506 qmmm_elec_gaussian_low_G 6 6.8 37.245 37.330 37.245 37.330 qmmm_forces 6 3.8 0.002 0.002 35.637 35.637 qmmm_forces_with_gaussian 6 4.8 0.023 0.024 34.498 35.266 qmmm_force_with_gaussian_low 6 5.8 0.000 0.000 33.081 33.847 pw_gpu_r3dc1d_3d_ps 964 14.0 9.659 9.671 32.945 32.952 xc_rho_set_and_dset_create 119 12.1 2.522 2.527 28.720 28.737 qmmm_forces_gaussian_low_G 6 6.8 27.848 28.618 27.848 28.618 xc_pw_divergence 119 12.1 0.006 0.006 26.755 26.755 qs_rho_update_rho_low 119 7.3 0.001 0.001 23.976 24.208 calculate_rho_elec 119 8.3 1.158 1.158 23.975 24.207 mp_alltoall_z22v 2059 16.4 18.421 18.488 18.421 18.488 density_rs2pw 119 9.3 0.008 0.008 17.643 17.880 sum_up_and_integrate 
119 10.1 0.002 0.002 16.544 16.550 integrate_v_rspace 119 11.1 0.022 0.023 16.355 16.360 x_to_yz 1095 15.8 2.342 2.348 12.355 12.388 dbcsr_multiply_generic 2598 12.3 0.103 0.105 11.358 11.528 potential_pw2rs 119 12.1 0.035 0.035 10.629 10.631 yz_to_x 964 15.0 1.809 1.821 10.217 10.233 multiply_cannon 2598 13.3 0.233 0.234 9.693 9.940 qs_ks_ddapc 119 10.1 0.002 0.002 9.504 9.536 multiply_cannon_loop 2598 14.3 0.262 0.263 9.182 9.424 pw_gpu_sf 1095 15.8 8.658 8.669 8.658 8.669 pw_gpu_fg 964 15.0 8.002 8.023 8.002 8.023 init_scf_loop 6 6.8 0.000 0.000 7.356 7.356 qs_scf_new_mos 113 7.2 0.001 0.001 7.064 7.066 qs_scf_loop_do_ot 113 8.2 0.001 0.001 7.063 7.065 multiply_cannon_multrec 5196 15.3 3.172 3.256 6.783 6.864 ot_scf_mini 113 9.2 0.002 0.002 6.770 6.771 pw_gpu_ffc 1095 15.8 6.684 6.771 6.684 6.771 grid_integrate_task_list 119 12.1 5.704 5.712 5.704 5.712 xc_functional_eval 238 13.1 0.003 0.003 5.528 5.534 qmmm_forces_gaussian_low_R 6 6.8 0.000 0.000 5.233 5.236 qmmm_forces_with_gaussian_LG 6 7.8 5.233 5.236 5.233 5.236 qs_ks_update_qs_env_forces 6 4.8 0.000 0.000 5.175 5.175 grid_collocate_task_list 119 9.3 5.140 5.152 5.140 5.152 pw_gpu_cff 964 15.0 4.998 5.021 4.998 5.021 qmmm_elec_gaussian_low_R 6 6.8 0.000 0.000 4.964 4.980 qmmm_elec_with_gaussian_LG 6 7.8 4.964 4.980 4.964 4.980 pw_poisson_solve 125 9.9 0.003 0.003 4.841 4.848 ot_mini 113 10.2 0.001 0.001 4.726 4.728 init_scf_run 6 5.8 0.000 0.000 4.695 4.695 scf_env_initial_rho_setup 6 6.8 0.000 0.000 4.695 4.695 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=215.803, yerr=0.0 Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=111.919, yerr=0.0 PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", 
name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=37.245, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=27.848, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=18.421, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.711, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.659, yerr=0.0

Summary: Performance test took 41 minutes.
Status: OK
 ---> Removed intermediate container 912a26155513
 ---> 97fcb8e89004
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in 729c35f47985
 ---> Removed intermediate container 729c35f47985
 ---> c22f99180b74
Step 46/46 : ENTRYPOINT []
 ---> Running in 0226c207f2a0
 ---> Removed intermediate container 0226c207f2a0
 ---> 91cdda4156ce
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 91cdda4156ce
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master

Pushing new image... done.

#################### Running Image cp2k-perf-cuda-volta ####################

Uploading artifacts... done

EndDate: 2026-07-03 08:16:00+00:00