StartDate: 2026-07-01 07:12:50+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 0d88fc2a8ea56536466e8e4a348537a19445409b
CommitTime: 2026-06-30 13:32:13 +0200
CommitAuthor: SY Wang
CommitSubject: Toolchain: Download libxsmm from cp2k.org (#5493)

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=0d88fc2a8ea56536466e8e4a348537a19445409b SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.
Sending build context to Docker daemon 420.9MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
2cd52cbb1ebe: Waiting
6e8af4fd0a07: Waiting
02cb0e091e33: Waiting
15a17189b2df: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
6cb9b761b877: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
---> Using cache
---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
---> Using cache
---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
---> Using cache
---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
---> Using cache
---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
---> Using cache
---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
---> Using cache
---> ae920e0abda3
Step 8/46 : RUN ./install_requirements.sh ubuntu
---> Using cache
---> 94839a704e2d
Step 9/46 : RUN mkdir scripts
---> Using cache
---> 433a8b0a0499
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./tools/build_utils/fypp ./scripts/
---> Using cache
---> f0caf6fbc4ae
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
---> Using cache
---> cc9288039445
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --with-sirius=install --gpu-ver=V100 --dry-run
---> Using cache
---> 9dccd3fdd246
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
---> Using cache
---> 2d0ceec22e70
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
---> Using cache
---> 12677ea44ea4
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
---> Using cache
---> 40359a46f237
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
---> Using cache
---> d18178dd8dab
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
---> Using cache
---> b803b9884d53
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
---> Using cache
---> 9efdaa647616
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
---> Using cache
---> 5d101777f380
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
---> Using cache
---> 3921f8a71127
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
---> 7438ff89ba89
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
---> Running in 1117b89c641a
==================== Installing Libxsmm ====================
wget --quiet https://www.cp2k.org/static/downloads/libxsmm-2.0.0.tar.gz -O libxsmm-2.0.0.tar.gz
libxsmm-2.0.0.tar.gz: OK
Checksum of libxsmm-2.0.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libxsmm-2.0.0
Step libxsmm took 21.00 seconds.
==================== Installing LIBXS ====================
wget --quiet https://www.cp2k.org/static/downloads/libxs-1.0.0.tar.gz -O libxs-1.0.0.tar.gz
libxs-1.0.0.tar.gz: OK
Checksum of libxs-1.0.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libxs-1.0.0
Step libxs took 8.00 seconds.
Step libxstream took 0.00 seconds.
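Each toolchain package is fetched from cp2k.org with `wget` and checked before its "Installing from scratch" step, which is what the `libxsmm-2.0.0.tar.gz: OK` / `Checksum of libxsmm-2.0.0.tar.gz Ok` pairs record. The `<file>: OK` line has the same shape as the output of `sha256sum --check`, so the minimal Python sketch below assumes the expected value is a SHA-256 digest. It only illustrates the idea; it is not the toolchain's own code, and the file name and digest used in the demo are placeholders.

```python
# Illustrative sketch only; assumes SHA-256 digests (not taken from the CP2K toolchain).
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large tarballs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check(path: str, expected_hex: str) -> str:
    """Return a sha256sum-style status line ("<name>: OK" or "<name>: FAILED")."""
    status = "OK" if sha256_of(path) == expected_hex.lower() else "FAILED"
    return f"{os.path.basename(path)}: {status}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "demo-1.0.0.tar.gz")  # hypothetical placeholder file
        with open(archive, "wb") as f:
            f.write(b"not a real tarball")
        good = hashlib.sha256(b"not a real tarball").hexdigest()
        print(check(archive, good))      # demo-1.0.0.tar.gz: OK
        print(check(archive, "0" * 64))  # demo-1.0.0.tar.gz: FAILED
```

Hashing in fixed-size chunks keeps memory use constant regardless of archive size, which matters for multi-hundred-megabyte sources such as SIRIUS or ELPA.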
==================== Installing ScaLAPACK ====================
wget --quiet https://www.cp2k.org/static/downloads/scalapack-2.2.3.tar.gz -O scalapack-2.2.3.tar.gz
scalapack-2.2.3.tar.gz: OK
Checksum of scalapack-2.2.3.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/scalapack-2.2.3
Step scalapack took 37.00 seconds.
Step cusolvermp took 0.00 seconds.
==================== Installing COSMA ====================
wget --quiet https://www.cp2k.org/static/downloads/COSMA-v2.8.4.tar.gz -O COSMA-v2.8.4.tar.gz
COSMA-v2.8.4.tar.gz: OK
Checksum of COSMA-v2.8.4.tar.gz Ok
wget --quiet https://www.cp2k.org/static/downloads/COSTA-v2.3.2.tar.gz -O COSTA-v2.3.2.tar.gz
COSTA-v2.3.2.tar.gz: OK
Checksum of COSTA-v2.3.2.tar.gz Ok
wget --quiet https://www.cp2k.org/static/downloads/Tiled-MM-v2.3.2.tar.gz -O Tiled-MM-v2.3.2.tar.gz
Tiled-MM-v2.3.2.tar.gz: OK
Checksum of Tiled-MM-v2.3.2.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/COSMA-2.8.4
Step cosma took 65.00 seconds.
---> Removed intermediate container 1117b89c641a
---> 69168eba9484
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
---> 08d70e365478
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
---> Running in b1368cf01654
==================== Installing ELPA ====================
wget --quiet https://www.cp2k.org/static/downloads/elpa-2026.02.001.tar.gz -O elpa-2026.02.001.tar.gz
elpa-2026.02.001.tar.gz: OK
Checksum of elpa-2026.02.001.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001
Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001/cpu
Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia
Step elpa took 307.00 seconds.
---> Removed intermediate container b1368cf01654
---> e098e1240f33
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
---> b9002037a1a4
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
---> Running in 3be30465cc21
==================== Installing GSL ====================
wget --quiet https://www.cp2k.org/static/downloads/gsl-2.8.tar.gz -O gsl-2.8.tar.gz
gsl-2.8.tar.gz: OK
Checksum of gsl-2.8.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/gsl-2.8
Step gsl took 73.00 seconds.
Step plumed took 0.00 seconds.
Step libtorch took 0.00 seconds.
Step gauxc took 0.00 seconds.
Step deepmd took 0.00 seconds.
Step ace took 0.00 seconds.
---> Removed intermediate container 3be30465cc21
---> b4318797ecab
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
---> 24912d066e0a
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
---> Running in 4c85539490f5
==================== Installing HDF5 ====================
wget --quiet https://www.cp2k.org/static/downloads/hdf5-2.1.1.tar.gz -O hdf5-2.1.1.tar.gz
hdf5-2.1.1.tar.gz: OK
Checksum of hdf5-2.1.1.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/hdf5-2.1.1
Step hdf5 took 127.00 seconds.
==================== Installing libvdwxc ====================
wget --quiet https://www.cp2k.org/static/downloads/libvdwxc-0.5.0.tar.gz -O libvdwxc-0.5.0.tar.gz
libvdwxc-0.5.0.tar.gz: OK
Checksum of libvdwxc-0.5.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libvdwxc-0.5.0
Step libvdwxc took 14.00 seconds.
==================== Installing Spglib ====================
wget --quiet https://www.cp2k.org/static/downloads/spglib-2.7.0.tar.gz -O spglib-2.7.0.tar.gz
spglib-2.7.0.tar.gz: OK
Checksum of spglib-2.7.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/spglib-2.7.0
Step spglib took 5.00 seconds.
==================== Installing libvori ====================
wget --quiet https://www.cp2k.org/static/downloads/libvori-220621.tar.gz -O libvori-220621.tar.gz
libvori-220621.tar.gz: OK
Checksum of libvori-220621.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/libvori-220621
Step libvori took 12.00 seconds.
Step libsmeagol took 0.00 seconds.
Step libfci took 0.00 seconds.
==================== Installing fmt ====================
wget --quiet https://www.cp2k.org/static/downloads/fmt-12.1.0.zip -O fmt-12.1.0.zip
fmt-12.1.0.zip: OK
Checksum of fmt-12.1.0.zip Ok
Installing from scratch into /opt/cp2k-toolchain/install/fmt-12.1.0
Step fmt took 8.00 seconds.
---> Removed intermediate container 4c85539490f5
---> fb9aad5b7309
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
---> 4aafb4c3107e
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
---> Running in c6d2c22b170e
Step dftd4 took 0.00 seconds.
==================== Installing tblite ====================
wget --quiet https://www.cp2k.org/static/downloads/tblite-0.6.0.tar.xz -O tblite-0.6.0.tar.xz
tblite-0.6.0.tar.xz: OK
Checksum of tblite-0.6.0.tar.xz Ok
Step tblite took 41.00 seconds.
==================== Installing pugixml ====================
wget --quiet https://www.cp2k.org/static/downloads/pugixml-1.15.tar.gz -O pugixml-1.15.tar.gz
pugixml-1.15.tar.gz: OK
Checksum of pugixml-1.15.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/pugixml-1.15
Step pugixml took 8.00 seconds.
==================== Installing SpFFT ====================
wget --quiet https://www.cp2k.org/static/downloads/SpFFT-1.1.1.tar.gz -O SpFFT-1.1.1.tar.gz
SpFFT-1.1.1.tar.gz: OK
Checksum of SpFFT-1.1.1.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/SpFFT-1.1.1
Step spfft took 22.00 seconds.
==================== Installing SpLA ====================
wget --quiet https://www.cp2k.org/static/downloads/SpLA-1.6.1.tar.gz -O SpLA-1.6.1.tar.gz
SpLA-1.6.1.tar.gz: OK
Checksum of SpLA-1.6.1.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/SpLA-1.6.1
Step spla took 24.00 seconds.
==================== Installing SIRIUS ====================
wget --quiet https://www.cp2k.org/static/downloads/SIRIUS-7.11.1.tar.gz -O SIRIUS-7.11.1.tar.gz
SIRIUS-7.11.1.tar.gz: OK
Checksum of SIRIUS-7.11.1.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.11.1
Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.11.1/cuda
Step sirius took 434.00 seconds.
Step trexio took 0.00 seconds.
Step MCL took 0.00 seconds.
---> Removed intermediate container c6d2c22b170e
---> 0c59cd7325bb
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
---> d1a5f3f45a16
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
---> Running in 9e3b08cea482
==================== Installing DBCSR ====================
wget --quiet https://www.cp2k.org/static/downloads/dbcsr-2.10.0.tar.gz -O dbcsr-2.10.0.tar.gz
dbcsr-2.10.0.tar.gz: OK
Checksum of dbcsr-2.10.0.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/dbcsr-2.10.0
Installing from scratch into /opt/cp2k-toolchain/install/dbcsr-2.10.0-cuda
Step DBCSR took 124.00 seconds.
---> Removed intermediate container 9e3b08cea482
---> a28fcbeee182
Step 33/46 : WORKDIR /opt/cp2k
---> Running in 4ab5943ddc50
---> Removed intermediate container 4ab5943ddc50
---> 170d23e7c1d3
Step 34/46 : COPY ./src ./src
---> a1f7ec37c3eb
Step 35/46 : COPY ./data ./data
---> 49b449f7d6c2
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
---> 4e5a5df1d854
Step 37/46 : COPY ./cmake ./cmake
---> cdc81e035375
Step 38/46 : COPY ./CMakeLists.txt .
---> 9bfbc6aa1430
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh ./tools/docker/scripts/cmake_cp2k.sh ./
---> 98c57f3c9866
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
---> Running in bc6b16487174
==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so (found version "5.0")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so (found version "5.0")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpifort.so (found version "5.0")
-- Found MPI: TRUE (found version "5.0") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "5.0") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
-- Found openblas, version 0.3.33
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Checking for module 'scalapack'
-- Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
-- Using LIBXS + LIBXSMM for Small Matrix Multiplication
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
------------------------------------------------------------
-                          DBCSR                           -
------------------------------------------------------------
-- Found MPI: TRUE (found version "5.0")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-----------------------------------------------------------
-                          CUDA                           -
-----------------------------------------------------------
-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found
------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX
------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------
-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran
-- Found MPI: TRUE (found version "5.0") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.11/include
-- Boost detected. satisfied by headers bundled with Libint2 distribution
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- mctc-lib: Find installed package
-- multicharge: Find installed package
-- DFTD4: found version 4.2.0, using v4.2+ API
-- toml-f: Find installed package
-- s-dftd3: Find installed package
-- DFTD4: found version 4.2.0, using v4.2+ API
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "2.1.1") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.11/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for one of the modules 's-dftd3'
-- Checking for one of the modules 'mctc-lib'
-- Found DFTD3: /opt/cp2k-toolchain/install/tblite-0.6.0/lib/libs-dftd3.a
-- Checking for one of the modules 'dftd4'
-- Checking for one of the modules 'multicharge'
-- Found DFTD4: /opt/cp2k-toolchain/install/tblite-0.6.0/lib/libdftd4.a
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'scalapack'
-- Checking for one of the modules 'elpa;elpa_openmp;elpa-openmp-2019.05.001;elpa_openmp-2019.11.001;elpa_openmp-2020.05.001;elpa-2019.05.001;elpa-2019.11.001;elpa-2020.05.001'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so
-- Checking for module 'libvdwxc>=0.5.0'
-- Found libvdwxc, version 0.5.0
-- Checking for module 'fftw3'
-- Found fftw3, version 3.3.11
-- Found LibVDWXC: vdwxc;fftw3 (Required is at least version "0.5.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
-- Adding libxs_jit.F from dependency libxs for compilation
--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------
- BLAS
  - Vendor: OpenBLAS
  - Include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include
  - Libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
- LAPACK
  - Include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include
  - Libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
- MPI
  - Include directories: /opt/cp2k-toolchain/install/mpich-5.0.1/include
  - Libraries: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so
  - MPI_F08: Enabled
- ScaLAPACK
  - Vendor: auto
  - Include directories:
  - Libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
- Hardware acceleration
  - Backend: CUDA
  - GPU architectures: 70
  - GPU profiling enabled: OFF
  - GPU-accelerated modules
    - ELPA: ON
    - GRID: ON
    - DBM: ON
    - PW: ON
- LibXC
  - Include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
  - Libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
- HDF5
  - Include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include
  - Libraries: hdf5-shared
- FFTW3
  - Include directories: /opt/cp2k-toolchain/install/fftw-3.3.11/include
  - Libraries: /opt/cp2k-toolchain/install/fftw-3.3.11/lib/libfftw3.a
- LIBXS
  - Include directories:
  - Libraries:
- SpLA
  - Include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
  - Libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
  - SpLA GEMM offloading
- DFTD4
  - Enabled via TBLITE
  - Include directories: /opt/cp2k-toolchain/install/tblite-0.6.0/include;/opt/cp2k-toolchain/install/tblite-0.6.0/include/dftd4/GNU-13.3.0
  - Libraries:
- TBLITE
  - Include directories: /opt/cp2k-toolchain/install/tblite-0.6.0/include;/opt/cp2k-toolchain/install/tblite-0.6.0/include/tblite/GNU-13.3.0
  - Libraries:
- SIRIUS
  - Include directories:
  - Libraries:
- COSMA
  - Include directories: /opt/cp2k-toolchain/install/COSMA-2.8.4-cuda/include
  - Libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack>
- Libint2
  - Include directories:
  - Libraries:
- ELPA
  - Include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001
  - Libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
--------------------------------------------------------------------
-                                                                  -
-             Dependencies not included in this build              -
-                                                                  -
--------------------------------------------------------------------
- DeePMD
- PEXSI
- ACE (libpace)
- Spglib
- LibSMEAGOL
- MiMiC
- DLA-Future
- PLUMED
- LibFCI
- GauXC
- Libvori
- LibTorch
- TREXIO
- GreenX

After building and installing CP2K, run the regtests with:
/opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp

-- Configuring done (13.0s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
---> Removed intermediate container bc6b16487174
---> 5480759859af
Step 41/46 : COPY ./benchmarks ./benchmarks
---> cd6417135e16
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
---> 2edd3952803c
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
---> 5d96feca5c13
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
---> Running in dbd1eb247fb8
============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma libxs libxsmm dbcsr_acc libdftd4 dftd4_v4_2 s_dftd3 mctc-lib tblite sirius offload_cuda spla_gemm_offloading libvdwxc hdf5
========== Checking Benchmark Inputs =========
Found 83 input files and 0 errors.
========== Running Performance Test ==========
Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"
Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.028    0.028  100.589  100.589
qs_mol_dyn_low                       1  2.0    0.004    0.004  100.171  100.174
qs_forces                           11  3.9    0.002    0.002  100.122  100.122
qs_energies                         11  4.9    0.001    0.001   89.109   89.109
scf_env_do_scf                      11  5.9    0.001    0.001   74.008   74.008
scf_env_do_scf_inner_loop          108  6.5    0.006    0.009   63.225   63.225
velocity_verlet                     10  3.0    0.001    0.002   61.828   61.846
rebuild_ks_matrix                  119  8.3    0.001    0.001   27.204   27.205
qs_ks_build_kohn_sham_matrix       119  9.3    0.020    0.020   27.203   27.204
dbcsr_multiply_generic            2286 12.5    0.152    0.155   25.514   25.530
qs_ks_update_qs_env                119  7.6    0.001    0.001   25.235   25.236
qs_rho_update_rho_low              119  7.7    0.001    0.001   21.992   22.008
calculate_rho_elec                 119  8.7    0.873    0.878   21.991   22.008
qs_scf_new_mos                     108  7.5    0.001    0.001   20.964   20.971
qs_scf_loop_do_ot                  108  8.5    0.001    0.001   20.963   20.970
ot_scf_mini                        108  9.5    0.003    0.003   18.968   18.972
fft_wrap_pw1pw2                   1201 11.6    0.022    0.022   16.719   16.763
fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   14.390   14.449
sum_up_and_integrate               119 10.3    0.003    0.003   14.340   14.367
integrate_v_rspace                 119 11.3    0.346    0.347   14.244   14.271
multiply_cannon                   2286 13.5    0.346    0.351   12.790   12.792
multiply_cannon_loop              2286 14.5    0.268    0.271   11.698   11.706
density_rs2pw                      119  9.7    0.008    0.008   10.978   11.149
make_m2s                          4572 13.5    0.046    0.047   11.044   11.049
ot_mini                            108 10.5    0.001    0.001   11.043   11.043
make_images                       4572 14.5    1.135    1.141   10.867   10.873
init_scf_loop                       11  6.9    0.000    0.000   10.701   10.701
grid_collocate_task_list           119  9.7   10.110   10.245   10.110   10.245
pw_gpu_r3dc1d_3d_ps                606 13.1    2.378    2.389    8.564    8.573
pw_gpu_c1dr3d_3d_ps                595 14.2    2.253    2.273    8.126    8.161
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.801    7.937
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    7.619    7.619
grid_integrate_task_list           119 12.3    7.452    7.481    7.452    7.481
prepare_preconditioner              11  7.9    0.000    0.000    7.409    7.411
make_preconditioner                 11  8.9    0.000    0.000    7.409    7.411
init_scf_run                        11  5.9    0.000    0.000    6.808    6.808
scf_env_initial_rho_setup           11  6.9    0.000    0.001    6.808    6.808
qs_ot_get_derivative               108 11.5    0.002    0.002    6.722    6.723
hybrid_alltoall_any               4725 16.4    4.845    4.849    6.565    6.569
make_full_inverse_cholesky          11  9.9    0.000    0.000    6.292    6.551
make_images_data                  4572 15.5    0.058    0.060    6.448    6.451
potential_pw2rs                    119 12.3    0.036    0.037    6.445    6.445
multiply_cannon_multrec           4572 15.5    2.079    2.090    6.278    6.283
ot_diis_step                       108 11.5    0.006    0.006    4.297    4.298
mp_alltoall_z22v                  1201 15.6    4.221    4.284    4.221    4.284
build_core_ppl_forces               11  5.9    3.937    4.035    3.937    4.035
wfi_extrapolate                     11  7.9    0.001    0.001    4.001    4.001
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.869    3.915
dbcsr_mm_accdrv_process           9594 16.2    0.790    0.930    3.809    3.818
apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.733    3.733
apply_single                       119 13.6    0.001    0.001    3.732    3.733
mp_waitall_1                     64495 16.9    3.575    3.611    3.575    3.611
dbcsr_complete_redistribute        329 12.2    1.226    1.255    3.270    3.530
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.384    3.410
calculate_dm_sparse                119  9.5    0.001    0.001    3.396    3.407
qs_ot_get_p                        119 10.4    0.001    0.001    3.376    3.380
multiply_cannon_sync_h2d          4572 15.5    3.075    3.098    3.075    3.098
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.070    3.071
transfer_rs2pw                     487 10.6    0.008    0.008    2.717    2.916
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.718    2.719
pw_poisson_solve                   119 10.3    0.003    0.003    2.677    2.685
yz_to_x                            606 14.1    0.466    0.469    2.613    2.646
qs_create_task_list                 11  7.9    0.000    0.000    2.528    2.601
generate_qs_task_list               11  8.9    1.146    1.157    2.528    2.601
x_to_yz                            595 15.2    0.502    0.507    2.576    2.598
jit_kernel_multiply                 11 15.7    2.431    2.580    2.431    2.580
copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.578    2.579
transfer_rs2pw_140                 130 11.5    1.604    1.633    2.272    2.487
calculate_first_density_matrix       1  7.0    0.000    0.000    2.353    2.354
cp_fm_cholesky_invert               11 10.9    2.326    2.326    2.326    2.326
qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    2.323    2.325
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.204    2.205
qs_ot_p2m_diag                      50 11.0    0.085    0.087    2.189    2.191
pw_gpu_fg                          606 14.1    2.164    2.184    2.164    2.184
dbcsr_special_finalize            6858 15.5    0.041    0.042    2.077    2.081
build_core_ppl                      11  7.9    1.997    2.037    1.997    2.037
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=100.589, yerr=0.0
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=70.024, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=10.11, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.452, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.845, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.221, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.937, yerr=0.0
Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
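The H2O-64 PlotPoint values can be reproduced from the TIMING table: the five routines with the largest average self time sum to 10.110 + 7.452 + 4.845 + 4.221 + 3.937 = 30.565 s, and subtracting that from the CP2K total of 100.589 s leaves exactly the "rest" value of 70.024 s. The sketch below reproduces this arithmetic under the assumption that this is the selection rule; the rule is inferred from the numbers in this log, not taken from test_performance.sh or plot_performance.py, and `parse_timing_rows` / `plot_points` are hypothetical helper names. The sample rows are copied from the table above.

```python
# Hypothetical helpers, not part of CP2K's test_performance.sh or plot_performance.py.
# Assumption inferred from this log: the per-benchmark plot shows the top-5 routines
# by average SELF TIME, and "rest" = CP2K TOTAL TIME (maximum) minus their sum.
from typing import List, Tuple

SAMPLE = """\
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.028 0.028 100.589 100.589
dbcsr_multiply_generic 2286 12.5 0.152 0.155 25.514 25.530
grid_collocate_task_list 119 9.7 10.110 10.245 10.110 10.245
grid_integrate_task_list 119 12.3 7.452 7.481 7.452 7.481
hybrid_alltoall_any 4725 16.4 4.845 4.849 6.565 6.569
mp_alltoall_z22v 1201 15.6 4.221 4.284 4.221 4.284
build_core_ppl_forces 11 5.9 3.937 4.035 3.937 4.035
mp_waitall_1 64495 16.9 3.575 3.611 3.575 3.611
multiply_cannon_sync_h2d 4572 15.5 3.075 3.098 3.075 3.098
"""


def parse_timing_rows(text: str) -> List[Tuple[str, float, float]]:
    """Return (routine, self_time_avg, total_time_max) for each data row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 7 fields: name, calls, ASD, self avg/max, total avg/max.
        if len(parts) != 7:
            continue
        try:
            int(parts[1])  # CALLS must be an integer; this rejects the header line
            rows.append((parts[0], float(parts[3]), float(parts[6])))
        except ValueError:
            continue
    return rows


def plot_points(text: str, top_n: int = 5) -> List[Tuple[str, float]]:
    """Return [("rest", y), (routine, self_time_avg), ...] for one benchmark."""
    rows = parse_timing_rows(text)
    total = next(t_max for name, _, t_max in rows if name == "CP2K")
    top = sorted(rows, key=lambda r: r[1], reverse=True)[:top_n]
    rest = round(total - sum(self_avg for _, self_avg, _ in top), 3)
    return [("rest", rest)] + [(name, self_avg) for name, self_avg, _ in top]


for name, y in plot_points(SAMPLE):
    print(f"{name}: {y}")
```

Run as-is, it prints `rest: 70.024` followed by the same five routines and values as the PlotPoint lines above. Using the average rather than the maximum self time is what makes `grid_collocate_task_list` come out as 10.11 instead of 10.245, matching the logged value.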
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K 1 1.0 0.028 0.028 93.553 93.553
qs_mol_dyn_low 1 2.0 0.004 0.004 93.120 93.123
qs_forces 11 3.9 0.002 0.002 93.072 93.072
qs_energies 11 4.9 0.001 0.001 81.938 81.938
scf_env_do_scf 11 5.9 0.001 0.001 65.972 65.973
velocity_verlet 10 3.0 0.001 0.002 59.063 59.081
scf_env_do_scf_inner_loop 96 6.5 0.005 0.008 54.973 54.973
rebuild_ks_matrix 107 8.3 0.001 0.001 25.189 25.190
qs_ks_build_kohn_sham_matrix 107 9.3 0.018 0.018 25.188 25.190
dbcsr_multiply_generic 1966 12.4 0.134 0.134 23.672 23.683
qs_ks_update_qs_env 107 7.6 0.001 0.001 23.055 23.056
qs_scf_new_mos 96 7.5 0.001 0.001 18.995 18.996
qs_scf_loop_do_ot 96 8.5 0.001 0.001 18.994 18.996
qs_rho_update_rho_low 107 7.7 0.001 0.001 17.727 17.745
calculate_rho_elec 107 8.7 0.789 0.792 17.726 17.744
ot_scf_mini 96 9.5 0.003 0.003 17.231 17.231
fft_wrap_pw1pw2 1081 11.6 0.020 0.021 15.285 15.301
sum_up_and_integrate 107 10.3 0.002 0.002 13.426 13.434
integrate_v_rspace 107 11.3 0.318 0.321 13.339 13.346
fft_wrap_pw1pw2_140 439 12.2 0.003 0.003 13.154 13.174
multiply_cannon 1966 13.4 0.299 0.300 11.815 11.925
init_scf_loop 11 6.9 0.000 0.000 10.913 10.913
multiply_cannon_loop 1966 14.4 0.236 0.240 10.779 10.801
make_m2s 3932 13.4 0.041 0.041 10.352 10.454
make_images 3932 14.4 1.067 1.124 10.194 10.295
density_rs2pw 107 9.7 0.007 0.008 9.983 10.081
ot_mini 96 10.5 0.001 0.001 10.061 10.062
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 8.704 8.704
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 7.844 7.984
pw_gpu_r3dc1d_3d_ps 546 13.1 2.180 2.208 7.853 7.877
prepare_preconditioner 11 7.9 0.000 0.000 7.540 7.552
make_preconditioner 11 8.9 0.000 0.000 7.540 7.552
pw_gpu_c1dr3d_3d_ps 535 14.2 2.048 2.068 7.405 7.444
grid_integrate_task_list 107 12.3 7.145 7.154 7.145 7.154
grid_collocate_task_list 107 9.7 6.929 7.005 6.929 7.005
make_full_inverse_cholesky 11 9.9 0.000 0.000 6.399 6.670
init_scf_run 11 5.9 0.000 0.000 6.589 6.590
scf_env_initial_rho_setup 11 6.9 0.000 0.001 6.589 6.589
hybrid_alltoall_any 4079 16.3 4.482 4.560 6.243 6.277
qs_ot_get_derivative 96 11.5 0.002 0.002 6.110 6.110
make_images_data 3932 15.4 0.050 0.051 6.080 6.096
multiply_cannon_multrec 3932 15.4 1.810 1.848 5.933 5.940
potential_pw2rs 107 12.3 0.033 0.033 5.875 5.877
qs_env_update_s_mstruct 11 6.9 0.000 0.000 4.326 4.490
build_core_ppl_forces 11 5.9 3.949 4.049 3.949 4.049
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 3.950 4.012
ot_diis_step 96 11.5 0.005 0.005 3.932 3.932
mp_alltoall_z22v 1081 15.6 3.854 3.875 3.854 3.875
wfi_extrapolate 11 7.9 0.001 0.001 3.806 3.806
dbcsr_mm_accdrv_process 8450 16.1 1.232 1.520 3.775 3.794
dbcsr_complete_redistribute 317 12.2 1.229 1.235 3.449 3.715
qs_create_task_list 11 7.9 0.000 0.000 3.444 3.548
generate_qs_task_list 11 8.9 1.433 1.452 3.444 3.548
apply_preconditioner_dbcsr 107 12.6 0.000 0.000 3.533 3.533
apply_single 107 13.6 0.001 0.001 3.533 3.533
mp_waitall_1 55487 16.8 3.354 3.447 3.354 3.447
calculate_dm_sparse 107 9.5 0.001 0.001 3.163 3.165
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.144 3.145
qs_ot_get_p 107 10.4 0.001 0.001 2.970 2.970
multiply_cannon_sync_h2d 3932 15.4 2.766 2.780 2.766 2.780
copy_dbcsr_to_fm 147 11.2 0.004 0.004 2.743 2.760
cp_dbcsr_sm_fm_multiply 37 9.5 0.001 0.002 2.703 2.704
transfer_rs2pw 439 10.6 0.007 0.008 2.473 2.631
pw_poisson_solve 107 10.3 0.003 0.003 2.449 2.451
yz_to_x 546 14.1 0.424 0.428 2.392 2.414
calculate_first_density_matrix 1 7.0 0.000 0.000 2.350 2.350
x_to_yz 535 15.2 0.456 0.456 2.341 2.347
cp_fm_cholesky_invert 11 10.9 2.306 2.306 2.306 2.306
jit_kernel_multiply 10 15.3 2.006 2.277 2.006 2.277
transfer_rs2pw_140 118 11.5 1.478 1.497 2.071 2.241
cp_dbcsr_sm_fm_multiply_core 37 10.5 0.000 0.000 2.185 2.185
transfer_dbcsr_to_fm 11 10.9 0.001 0.001 2.162 2.177
build_core_ppl 11 7.9 2.027 2.069 2.027 2.069
pw_gpu_fg 546 14.1 1.994 2.031 1.994 2.031
qs_ot_get_derivative_taylor 53 13.0 0.002 0.002 2.019 2.020
copy_fm_to_dbcsr 170 11.1 0.002 0.002 1.706 1.979
build_kinetic_matrix_low 22 6.9 1.835 1.843 1.932 1.941
qs_ot_p2m_diag 44 11.0 0.076 0.077 1.932 1.933
dbcsr_special_finalize 5898 15.4 0.037 0.037 1.893 1.903
build_overlap_matrix_low 22 6.9 1.777 1.793 1.868 1.884
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=93.553, yerr=0.0
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=67.19399999999999, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.145, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=6.929, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.482, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.949, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=3.854, yerr=0.0
Running w64PBE.inp with 3 threads and 2 ranks... done.
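Each timer row in the CP2K timing tables of this log has seven whitespace-separated fields: routine name, CALLS (maximum), ASD, SELF TIME average and maximum, and TOTAL TIME average and maximum, with times in seconds. Below is a minimal Python sketch for pulling those rows out of a table, assuming routine names contain no spaces and times are printed with three decimals as they are here. `TimerRow` and `parse_rows` are hypothetical helpers, not part of CP2K's own tooling.

```python
import re
from typing import NamedTuple


class TimerRow(NamedTuple):
    name: str
    calls: int        # CALLS, MAXIMUM
    asd: float        # ASD column
    self_avg: float   # SELF TIME, AVERAGE [s]
    self_max: float   # SELF TIME, MAXIMUM [s]
    total_avg: float  # TOTAL TIME, AVERAGE [s]
    total_max: float  # TOTAL TIME, MAXIMUM [s]


# Name, integer call count, one-decimal ASD, then four three-decimal times.
# \s+ lets the pattern work on aligned columns and on one-row-per-line text.
ROW_RE = re.compile(
    r"(\S+)\s+(\d+)\s+(\d+\.\d)\s+(\d+\.\d{3})\s+(\d+\.\d{3})\s+(\d+\.\d{3})\s+(\d+\.\d{3})(?!\S)"
)


def parse_rows(text):
    """Extract every timer row from a CP2K timing table, in file order."""
    rows = []
    for m in ROW_RE.finditer(text):
        name, calls, asd, *times = m.groups()
        rows.append(TimerRow(name, int(calls), float(asd), *map(float, times)))
    return rows


# A few rows from the H2O-64_nonortho table above; the header lines are skipped
# because they contain no numeric fields.
SAMPLE = """\
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K 1 1.0 0.028 0.028 93.553 93.553
qs_mol_dyn_low 1 2.0 0.004 0.004 93.120 93.123
grid_integrate_task_list 107 12.3 7.145 7.154 7.145 7.154
grid_collocate_task_list 107 9.7 6.929 7.005 6.929 7.005
hybrid_alltoall_any 4079 16.3 4.482 4.560 6.243 6.277
"""

rows = parse_rows(SAMPLE)
hottest = sorted(rows, key=lambda r: r.self_avg, reverse=True)[:3]
for r in hottest:
    print(f"{r.name:26s} self {r.self_avg:7.3f} s  total {r.total_avg:7.3f} s")
```

Sorting by `self_avg` rather than `total_avg` mirrors how the per-benchmark plots in this log pick their highlighted routines; sorting by total time would instead rank the top-level drivers such as `CP2K` and `qs_mol_dyn_low` first.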
From /workspace/artifacts/w64PBE_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K 1 1.0 0.044 0.046 242.565 242.565
qs_mol_dyn_low 1 2.0 0.004 0.004 241.851 241.854
qs_forces 11 3.9 0.002 0.002 241.802 241.802
qs_energies 11 4.9 0.001 0.001 210.457 210.458
velocity_verlet 10 3.0 0.001 0.002 190.587 190.605
scf_env_do_scf 11 5.9 0.001 0.002 189.586 189.587
scf_env_do_scf_inner_loop 106 6.8 0.006 0.009 165.619 165.619
rebuild_ks_matrix 117 8.5 0.001 0.001 125.159 125.167
qs_ks_build_kohn_sham_matrix 117 9.5 0.020 0.020 125.158 125.166
qs_ks_update_qs_env 120 7.8 0.001 0.002 111.371 111.377
fft_wrap_pw1pw2 2000 12.9 0.044 0.045 69.849 69.854
qs_vxc_create 117 10.5 0.004 0.004 66.341 66.372
xc_vxc_pw_create 117 11.5 1.447 1.455 66.337 66.368
fft_wrap_pw1pw2_200 1298 14.3 0.008 0.008 66.205 66.223
qs_rho_update_rho_low 117 7.9 0.001 0.001 61.794 61.806
calculate_rho_elec 117 8.9 1.221 1.221 61.794 61.805
sum_up_and_integrate 117 10.5 0.003 0.003 44.511 44.552
integrate_v_rspace 117 11.5 0.212 0.212 44.321 44.361
grid_collocate_task_list 117 9.9 42.067 42.178 42.067 42.178
xc_pw_derive 702 13.5 0.010 0.010 39.018 39.034
xc_rho_set_and_dset_create 117 12.5 0.947 0.949 38.501 38.547
pw_gpu_c1dr3d_3d_ps 1053 15.2 10.707 10.763 37.360 37.362
grid_integrate_task_list 117 12.5 33.261 33.309 33.261 33.309
pw_gpu_r3dc1d_3d_ps 947 14.5 9.605 9.617 32.433 32.435
xc_pw_divergence 117 12.5 0.005 0.005 26.017 26.038
init_scf_loop 14 6.8 0.001 0.001 23.903 23.904
density_rs2pw 117 9.9 0.009 0.009 18.480 18.596
mp_alltoall_z22v 2000 16.9 18.493 18.543 18.493 18.543
dbcsr_multiply_generic 2035 12.5 0.143 0.144 18.147 18.200
xc_functional_eval 117 13.5 0.002 0.002 17.356 17.372
pbe_lda_eval 117 14.5 17.355 17.370 17.355 17.370
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 16.618 16.785
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 14.554 14.554
qs_scf_new_mos 106 7.8 0.001 0.001 13.411 13.425
qs_scf_loop_do_ot 106 8.8 0.001 0.001 13.410 13.424
x_to_yz 1053 16.2 2.510 2.513 12.284 12.337
ot_scf_mini 106 9.8 0.003 0.003 11.997 12.006
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 11.368 11.368
potential_pw2rs 117 12.5 0.058 0.059 10.848 10.856
yz_to_x 947 15.5 1.836 1.837 10.555 10.561
multiply_cannon 2035 13.5 0.310 0.314 9.095 9.126
init_scf_run 11 5.9 0.000 0.000 9.006 9.006
scf_env_initial_rho_setup 11 6.9 0.000 0.001 9.005 9.005
build_core_ppl_forces 11 5.9 8.439 8.611 8.439 8.611
pw_gpu_sf 1053 16.2 8.415 8.423 8.415 8.423
prepare_preconditioner 14 7.8 0.000 0.000 8.340 8.343
make_preconditioner 14 8.8 0.000 0.000 8.340 8.343
multiply_cannon_loop 2035 14.5 0.243 0.244 8.096 8.121
pw_gpu_fg 947 15.5 7.550 7.569 7.550 7.569
make_m2s 4070 13.5 0.045 0.045 7.552 7.558
make_images 4070 14.5 0.996 1.006 7.372 7.379
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 7.250 7.311
ot_mini 106 10.8 0.001 0.001 7.299 7.307
wfi_extrapolate 11 7.9 0.001 0.001 7.020 7.020
pw_gpu_ffc 1053 16.2 5.935 5.940 5.935 5.940
build_kinetic_matrix_low 22 6.9 5.186 5.194 5.276 5.285
build_overlap_matrix_low 22 6.9 5.125 5.138 5.201 5.215
pw_poisson_solve 117 10.5 0.003 0.003 4.704 4.717
pw_gpu_cff 947 15.5 4.659 4.662 4.659 4.662
transfer_rs2pw 479 10.8 0.009 0.009 4.350 4.536
qs_ot_get_derivative 106 11.8 0.002 0.002 4.529 4.535
multiply_cannon_multrec 4070 15.5 1.763 1.767 4.273 4.282
make_full_single_inverse 14 9.8 0.002 0.002 4.211 4.211
pw_derive 1053 13.8 4.093 4.099 4.093 4.099
make_images_data 4070 15.5 0.053 0.054 3.970 3.984
hybrid_alltoall_any 4213 16.4 2.779 2.797 3.960 3.972
transfer_rs2pw_200 128 11.7 2.648 2.689 3.631 3.821
qs_env_update_s_mstruct 11 6.9 0.000 0.000 3.737 3.797
make_full_inverse_cholesky 14 9.8 0.000 0.000 3.380 3.534
mp_waitall_1 57459 16.9 3.361 3.383 3.361 3.383
build_core_ppl 11 7.9 3.282 3.350 3.282 3.350
transfer_pw2rs 479 13.4 0.006 0.006 3.067 3.069
ot_diis_step 106 11.8 0.005 0.005 2.750 2.750
pw_copy 1755 13.0 2.699 2.706 2.699 2.706
fft_wrap_pw1pw2_70 234 13.2 0.001 0.002 2.664 2.699
arnoldi_generalized_ev 14 10.8 0.000 0.000 2.632 2.634
dbcsr_sym_matrix_vector_mult 1269 12.5 0.035 0.035 2.598 2.599
transfer_pw2rs_200 128 14.1 1.585 1.588 2.451 2.452
gev_build_subspace 23 11.5 0.010 0.010 2.428 2.428
qs_create_task_list 11 7.9 0.000 0.000 2.410 2.413
generate_qs_task_list 11 8.9 1.343 1.356 2.409 2.413
apply_preconditioner_dbcsr 120 12.8 0.000 0.000 2.368 2.373
apply_single 120 13.8 0.001 0.001 2.368 2.372
dbcsr_complete_redistribute 323 11.8 0.874 0.898 2.110 2.296
dbcsr_sym_matrix_vector_mult_l 1269 13.5 2.244 2.267 2.250 2.274
dbcsr_mm_accdrv_process 9388 16.2 0.901 1.021 2.245 2.247
pw_poisson_set 118 11.5 0.005 0.005 2.137 2.150
calculate_dm_sparse 117 9.7 0.001 0.001 2.118 2.124
qs_ot_get_derivative_taylor 89 12.9 0.004 0.004 1.999 2.004
cp_dbcsr_sm_fm_multiply 46 9.3 0.002 0.002 1.946 1.948
multiply_cannon_sync_h2d 4070 15.5 1.779 1.821 1.779 1.821
pw_integral_ab_c1d_c1d_gs 117 11.5 1.793 1.795 1.810 1.812
qs_ot_get_p 120 10.5 0.001 0.001 1.671 1.677
pw_axpy 1170 12.0 1.606 1.611 1.606 1.611
dbcsr_special_finalize 6105 15.5 0.035 0.035 1.498 1.498
copy_fm_to_dbcsr 180 10.8 0.002 0.002 1.314 1.469
cp_dbcsr_sm_fm_multiply_core 46 10.3 0.000 0.000 1.461 1.463
copy_dbcsr_to_fm 143 10.8 0.004 0.004 1.398 1.444
dbcsr_merge_single_wm 4070 16.5 0.134 0.135 1.383 1.384
mp_sendrecv_dv 479 12.8 1.202 1.347 1.202 1.347
cp_fm_cholesky_invert 14 10.8 1.338 1.338 1.338 1.338
calculate_rho_core 11 7.9 0.164 0.164 1.272 1.333
multiply_cannon_metrocomm1 4070 15.5 0.012 0.012 1.227 1.238
dbcsr_dot 1125 12.2 1.152 1.153 1.213 1.223
calculate_first_density_matrix 1 7.0 0.000 0.000 1.180 1.181
dbcsr_sort_data 4070 17.5 0.969 0.970 0.969 0.970
jit_kernel_multiply 10 15.0 0.844 0.969 0.844 0.969
transfer_dbcsr_to_fm 14 10.8 0.001 0.001 0.912 0.952
cp_dbcsr_plus_fm_fm_t 22 8.9 0.001 0.001 0.943 0.943
dbcsr_finalize 4628 13.9 0.061 0.061 0.881 0.917
transfer_fm_to_dbcsr 14 9.8 0.000 0.000 0.749 0.907
qs_ot_get_orbitals 106 10.8 0.001 0.001 0.819 0.820
dbcsr_copy 7812 13.3 0.202 0.203 0.813 0.813
dbcsr_merge_all 4098 15.1 0.182 0.182 0.772 0.807
build_core_ppnl_forces 11 5.9 0.771 0.781 0.771 0.781
qs_ot_p2m_diag 19 11.0 0.035 0.035 0.780 0.780
mp_alltoall_d11v 1899 13.8 0.759 0.777 0.759 0.777
grid_create_task_list 11 9.9 0.749 0.776 0.749 0.776
evaluate_core_matrix_traces 117 8.5 0.001 0.001 0.765 0.766
calculate_ptrace_kp 234 9.5 0.001 0.001 0.764 0.765
cp_fm_cholesky_decompose 28 10.5 0.689 0.729 0.689 0.729
fft_wrap_pw1pw2_30 234 13.2 0.001 0.001 0.677 0.682
cp_dbcsr_syevd 19 12.0 0.002 0.002 0.658 0.658
make_images_pack 4070 15.5 0.631 0.639 0.645 0.653
cp_fm_uplo_to_full 47 13.4 0.472 0.632 0.472 0.632
qs_init_subsys 1 2.0 0.001 0.001 0.630 0.630
cp_fm_diag_elpa 19 13.0 0.000 0.000 0.624 0.625
cp_fm_diag_elpa_base 19 14.0 0.614 0.617 0.624 0.624
qs_env_setup 1 3.0 0.000 0.000 0.623 0.624
qs_env_rebuild_pw_env 23 5.3 0.000 0.000 0.622 0.623
pw_env_rebuild 1 5.0 0.000 0.000 0.622 0.623
pw_grid_setup 4 6.0 0.000 0.000 0.597 0.598
pw_grid_setup_internal 4 7.0 0.007 0.007 0.587 0.588
transfer_rs2pw_70 117 11.9 0.387 0.388 0.558 0.563
make_basis_sm 14 9.3 0.001 0.001 0.558 0.559
dbcsr_copy_into_existing 22 7.9 0.556 0.556 0.557 0.557
qs_ot_get_derivative_diag 17 12.0 0.001 0.001 0.547 0.549
mp_sum_d 3821 11.6 0.384 0.547 0.384 0.547
pw_zero 585 13.0 0.530 0.535 0.530 0.535
acc_transpose_blocks 4070 15.5 0.023 0.023 0.522 0.524
dbcsr_mm_accdrv_process_sort 9388 17.2 0.500 0.503 0.500 0.503
transfer_pw2rs_70 117 14.5 0.312 0.312 0.476 0.477
pw_grid_sort 4 8.0 0.345 0.347 0.468 0.470
dbcsr_sort_indices 10929 16.5 0.435 0.435 0.435 0.435
parallel_gemm_fm_cosma 96 8.9 0.410 0.412 0.410 0.412
compute_matrix_w 11 5.9 0.000 0.000 0.410 0.410
calculate_w_matrix_ot 11 6.9 0.003 0.003 0.410 0.410
ot_scf_init 14 7.8 0.002 0.002 0.406 0.409
calculate_ecore_overlap 22 5.9 0.001 0.001 0.228 0.396
reorthogonalize_vectors 10 9.0 0.000 0.000 0.394 0.394
dbcsr_data_copy_aa2 2343 15.5 0.370 0.391 0.370 0.391
mp_alltoall_i22 633 13.6 0.207 0.352 0.207 0.352
cp_dbcsr_alloc_block_from_nbl 88 7.7 0.223 0.226 0.343 0.347
mp_sum_l 6134 13.5 0.320 0.347 0.320 0.347
dbcsr_desymmetrize_deep 143 11.8 0.092 0.092 0.332 0.333
dbcsr_add_d 1795 13.1 0.003 0.003 0.319 0.321
build_qs_neighbor_lists 11 6.9 0.001 0.001 0.320 0.321
dbcsr_add_anytype 1795 14.1 0.171 0.172 0.316 0.318
distribute_tasks 11 9.9 0.302 0.305 0.302 0.305
pw_scale 468 12.0 0.298 0.300 0.298 0.300
setup_rec_index_2d 4070 14.5 0.285 0.287 0.285 0.287
integrate_v_core_rspace 11 7.9 0.068 0.068 0.279 0.279
multiply_cannon_multrec_finali 2035 16.5 0.005 0.005 0.265 0.268
fft_wrap_pw1pw2_10 234 13.2 0.001 0.001 0.258 0.266
dbcsr_mm_multrec_finalize 2035 17.5 0.021 0.022 0.260 0.263
pw_multiply_with 117 11.5 0.254 0.255 0.254 0.255
dbcsr_make_untransposed_blocks 2481 13.4 0.241 0.241 0.253 0.253
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="w64PBE", label="w64PBE", y=242.565, yerr=0.0
Plot: name="w64PBE_timings_6cpu_1gpu", title="Timings of w64PBE with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="rest", label="rest", y=120.68199999999999, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=42.067, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=33.261, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=18.493, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="pbe_lda_eval", label="pbe_lda_eval", y=17.355, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.707, yerr=0.0
Running w64SCAN.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/w64SCAN_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K 1 1.0 0.200 0.205 911.041 911.041
qs_mol_dyn_low 1 2.0 0.004 0.004 908.827 908.830
qs_forces 11 3.9 0.002 0.002 908.779 908.780
qs_energies 11 4.9 0.001 0.001 816.806 816.806
scf_env_do_scf 11 5.9 0.001 0.002 778.175 778.176
velocity_verlet 10 3.0 0.001 0.002 725.814 725.831
scf_env_do_scf_inner_loop 106 6.8 0.006 0.009 700.583 700.584
rebuild_ks_matrix 117 8.5 0.001 0.001 628.900 628.904
qs_ks_build_kohn_sham_matrix 117 9.5 0.021 0.021 628.899 628.904
qs_ks_update_qs_env 119 7.8 0.001 0.001 551.256 551.258
fft_wrap_pw1pw2 3053 12.6 0.069 0.070 431.278 431.486
fft_wrap_pw1pw2_400 1649 13.9 0.010 0.010 413.563 413.744
qs_vxc_create 117 10.5 0.004 0.004 387.724 387.737
xc_vxc_pw_create 117 11.5 4.647 4.647 387.720 387.733
xc_rho_set_and_dset_create 117 12.5 6.049 6.083 260.468 260.583
qs_rho_update_rho_low 117 7.9 0.001 0.001 226.234 226.235
calculate_rho_elec 234 8.9 6.760 6.768 226.232 226.233
pw_gpu_c1dr3d_3d_ps 1521 15.1 120.917 120.971 216.705 216.742
pw_gpu_r3dc1d_3d_ps 1532 14.1 122.359 122.369 214.485 214.657
sum_up_and_integrate 117 10.5 0.005 0.005 187.422 187.840
integrate_v_rspace 234 11.5 0.427 0.430 186.558 186.972
xc_pw_derive 702 13.5 0.012 0.012 184.995 185.076
density_rs2pw 234 9.9 0.021 0.022 164.967 165.314
xc_functional_eval 234 13.5 0.003 0.003 157.637 157.736
libxc_lda_eval 234 14.5 157.627 157.726 157.633 157.732
xc_pw_divergence 117 12.5 0.007 0.007 121.181 121.250
potential_pw2rs 234 12.5 0.288 0.292 97.414 97.546
grid_integrate_task_list 234 12.5 88.716 89.264 88.716 89.264
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 78.421 78.422
init_scf_loop 13 6.8 0.000 0.000 77.526 77.526
mp_alltoall_z22v 3053 16.6 72.925 73.115 72.925 73.115
grid_collocate_task_list 234 9.9 54.355 54.719 54.355 54.719
x_to_yz 1521 16.1 9.102 9.109 44.882 44.956
yz_to_x 1532 15.1 7.623 7.624 44.769 44.879
transfer_rs2pw 947 10.9 0.020 0.020 36.059 36.446
transfer_rs2pw_400 245 11.8 25.829 25.881 31.471 31.848
pw_gpu_sf 1521 16.1 30.909 30.929 30.909 30.929
pw_gpu_fg 1532 15.1 30.039 30.086 30.039 30.086
transfer_pw2rs 947 13.5 0.016 0.016 29.581 29.584
transfer_pw2rs_400 245 14.3 21.040 21.103 26.300 26.310
init_scf_run 11 5.9 0.000 0.000 24.784 24.784
scf_env_initial_rho_setup 11 6.9 0.000 0.001 24.784 24.784
wfi_extrapolate 11 7.9 0.002 0.002 21.202 21.202
pw_gpu_ffc 1521 16.1 19.970 19.974 19.970 19.974
dbcsr_multiply_generic 2100 12.6 0.147 0.151 18.660 19.090
pw_poisson_solve 117 10.5 0.003 0.003 17.536 17.544
pw_gpu_cff 1532 15.1 17.169 17.174 17.169 17.174
fft_wrap_pw1pw2_140 468 13.2 0.003 0.003 13.919 14.005
qs_scf_new_mos 106 7.8 0.001 0.001 13.591 13.596
qs_scf_loop_do_ot 106 8.8 0.001 0.001 13.591 13.595
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 13.418 13.547
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 13.342 13.342
ot_scf_mini 106 9.8 0.003 0.003 12.183 12.187
pw_derive 1053 13.8 12.117 12.126 12.117 12.126
multiply_cannon 2100 13.6 0.314 0.321 9.140 9.161
pw_copy 2223 13.1 9.140 9.148 9.140 9.148
mp_waitall_1 59747 17.0 8.679 8.700 8.679 8.700
pw_integral_ab_c1d_c1d_gs 117 11.5 8.168 8.208 8.464 8.466
prepare_preconditioner 13 7.8 0.000 0.000 8.251 8.254
make_preconditioner 13 8.8 0.000 0.000 8.251 8.254
multiply_cannon_loop 2100 14.6 0.249 0.250 8.121 8.124
make_m2s 4200 13.6 0.044 0.044 7.534 7.539
ot_mini 106 10.8 0.001 0.001 7.399 7.403
make_images 4200 14.6 1.005 1.007 7.353 7.360
mp_sendrecv_dv 947 12.9 6.832 7.255 6.832 7.255
qs_env_update_s_mstruct 11 6.9 0.000 0.000 6.963 6.989
pw_poisson_set 118 11.5 0.006 0.006 6.732 6.740
build_core_ppl_forces 11 5.9 6.232 6.369 6.232 6.369
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 6.026 6.053
pw_axpy 1638 11.7 5.906 5.925 5.906 5.925
calculate_rho_core 11 7.9 0.441 0.442 4.969 4.998
qs_ot_get_derivative 106 11.8 0.002 0.002 4.623 4.625
build_kinetic_matrix_low 22 6.9 4.507 4.517 4.591 4.602
build_overlap_matrix_low 22 6.9 4.467 4.492 4.538 4.563
multiply_cannon_multrec 4200 15.6 1.788 1.797 4.273 4.277
hybrid_alltoall_any 4338 16.5 2.747 2.765 3.940 3.972
make_full_single_inverse 13 9.8 0.002 0.002 3.969 3.969
make_images_data 4200 15.6 0.055 0.055 3.928 3.948
transfer_rs2pw_140 234 11.9 2.915 2.929 3.864 3.889
make_full_inverse_cholesky 13 9.8 0.000 0.000 3.509 3.650
fft_wrap_pw1pw2_50 468 13.2 0.003 0.003 2.836 2.885
ot_diis_step 106 11.8 0.006 0.006 2.756 2.756
transfer_pw2rs_140 234 14.5 1.704 1.706 2.644 2.650
build_core_ppl 11 7.9 2.570 2.613 2.570 2.613
arnoldi_generalized_ev 13 10.8 0.000 0.000 2.505 2.505
dbcsr_sym_matrix_vector_mult 1206 12.5 0.034 0.034 2.471 2.471
dbcsr_complete_redistribute 312 11.8 0.980 0.990 2.267 2.419
apply_preconditioner_dbcsr 119 12.8 0.000 0.000 2.342 2.342
apply_single 119 13.8 0.001 0.001 2.341 2.342
gev_build_subspace 22 11.5 0.010 0.010 2.309 2.309
dbcsr_mm_accdrv_process 9484 16.3 0.671 1.018 2.220 2.223
dbcsr_sym_matrix_vector_mult_l 1206 13.5 2.139 2.143 2.145 2.148
pw_zero 702 12.6 2.138 2.146 2.138 2.146
qs_ot_get_derivative_taylor 89 12.9 0.004 0.004 2.138 2.139
calculate_dm_sparse 117 9.7 0.001 0.001 2.107 2.107
qs_init_subsys 1 2.0 0.001 0.001 1.935 1.935
cp_dbcsr_sm_fm_multiply 45 9.4 0.002 0.002 1.933 1.935
qs_env_setup 1 3.0 0.000 0.000 1.927 1.928
qs_env_rebuild_pw_env 23 5.3 0.000 0.000 1.927 1.928
pw_env_rebuild 1 5.0 0.000 0.000 1.927 1.928
multiply_cannon_sync_h2d 4200 15.6 1.793 1.876 1.793 1.876
pw_grid_setup 4 6.0 0.000 0.000 1.864 1.865
pw_grid_setup_internal 4 7.0 0.019 0.020 1.832 1.833
qs_create_task_list 11 7.9 0.000 0.000 1.755 1.809
generate_qs_task_list 11 8.9 0.903 0.911 1.755 1.809
qs_ot_get_p 119 10.6 0.001 0.001 1.716 1.720
copy_dbcsr_to_fm 138 10.8 0.004 0.004 1.674 1.694
pw_grid_sort 4 8.0 1.121 1.124 1.509 1.512
dbcsr_special_finalize 6300 15.6 0.036 0.036 1.496 1.503
mp_sum_d 3885 11.5 1.192 1.487 1.192 1.487
copy_fm_to_dbcsr 174 10.8 0.002 0.002 1.328 1.470
cp_dbcsr_sm_fm_multiply_core 45 10.4 0.000 0.000 1.446 1.447
dbcsr_merge_single_wm 4200 16.6 0.133 0.134 1.380 1.386
jit_kernel_multiply 13 15.1 1.034 1.377 1.034 1.377
integrate_v_core_rspace 11 7.9 0.152 0.153 1.346 1.348
multiply_cannon_metrocomm1 4200 15.6 0.013 0.013 1.226 1.300
cp_fm_cholesky_invert 13 10.8 1.263 1.263 1.263 1.263
dbcsr_dot 1134 12.2 1.161 1.168 1.231 1.238
mp_sum_l 6329 13.5 0.816 1.225 0.816 1.225
transfer_dbcsr_to_fm 13 10.8 0.001 0.001 1.186 1.201
calculate_first_density_matrix 1 7.0 0.000 0.000 1.158 1.158
pw_scale 585 11.9 1.094 1.095 1.094 1.095
dbcsr_sort_data 4200 17.6 0.964 0.970 0.964 0.970
dbcsr_finalize 4788 14.0 0.063 0.063 0.941 0.954
cp_dbcsr_plus_fm_fm_t 22 8.9 0.001 0.001 0.936 0.937
transfer_fm_to_dbcsr 13 9.8 0.000 0.000 0.773 0.912
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="w64SCAN", label="w64SCAN", y=911.041, yerr=0.0
Plot: name="w64SCAN_timings_6cpu_1gpu", title="Timings of w64SCAN with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="rest", label="rest", y=348.49700000000007, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="libxc_lda_eval", label="libxc_lda_eval", y=157.627, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=122.359, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=120.917, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=88.716, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=72.925, yerr=0.0
Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K 1 1.0 0.018 0.020 109.080 109.080
qs_energies 1 2.0 0.000 0.000 108.752 108.753
mp2_main 1 3.0 0.000 0.000 102.118 102.119
mp2_gpw_main 1 4.0 0.000 0.000 100.469 100.471
rpa_ri_compute_en 1 5.0 0.000 0.000 92.083 92.084
rpa_num_int 1 6.0 0.001 0.001 92.075 92.076
dbt_total 2336 9.6 0.022 0.022 74.070 74.071
compute_mat_P_omega 1 7.0 0.001 0.002 70.572 70.573
compute_mat_P_omega_contract 10 8.0 5.324 5.346 70.259 70.270
dbt_contract 787 11.0 0.050 0.050 48.185 48.188
dbt_tas_total 1149 12.2 0.147 0.151 37.209 37.209
dbt_tas_multiply 807 12.1 0.003 0.003 36.514 36.514
dbt_tas_dbm 807 14.1 0.006 0.006 28.148 28.148
dbm_multiply 807 16.1 26.731 27.064 26.731 27.064
dbt_copy 1107 10.7 0.069 0.070 26.517 26.819
compute_mat_P_omega_calc_M_occ 250 9.0 5.303 5.369 24.622 24.622
dbt_tas_mm_1N 524 15.1 0.003 0.003 18.175 18.537
dbt_reshape 594 11.8 7.341 7.475 17.691 17.807
compute_QP_energies 1 7.0 0.000 0.000 15.583 15.584
compute_self_energy_cubic_gw 1 8.0 0.115 0.116 15.583 15.583
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 15.427 15.427
dbt_tas_reserve_blocks_index 3266 14.3 0.655 0.659 10.917 11.196
dbm_reserve_blocks 3634 15.3 10.573 10.842 10.573 10.842
dbt_reserve_blocks_index 2347 13.0 0.318 0.320 9.030 9.305
dbt_crop 1042 12.0 6.814 6.916 9.104 9.248
dbt_reserve_blocks_index_array 2289 12.1 0.011 0.012 8.814 9.115
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 8.974 8.974
mp_waitall_2 2656 15.9 8.496 8.528 8.496 8.528
mp2_ri_gpw_compute_in 1 5.0 0.001 0.002 8.376 8.376
dbt_communicate_buffer 594 12.8 0.012 0.012 7.666 7.699
dbt_tas_mm_2 251 15.0 0.003 0.003 7.640 7.640
contract_cubic_gw 21 9.0 0.000 0.000 7.287 7.287
scf_env_do_scf 1 3.0 0.000 0.000 6.089 6.089
scf_env_do_scf_inner_loop 17 4.0 0.001 0.001 6.089 6.089
compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 5.633 5.633
compute_mat_P_omega_copy_M_occ 250 9.0 0.001 0.002 5.529 5.552
dbt_tas_copy 511 11.5 2.568 2.617 4.581 4.638
dbcsr_multiply_generic 30 8.1 0.003 0.003 4.399 4.427
multiply_cannon 30 9.1 0.009 0.010 4.208 4.234
multiply_cannon_loop 30 10.1 0.004 0.004 4.154 4.181
mp_sync 8688 11.6 3.085 3.839 3.085 3.839
get_2c_integrals 1 6.0 0.000 0.000 3.719 3.719
multiply_cannon_multrec 60 11.1 0.234 0.239 3.621 3.622
trace_sigma_gw 21 9.0 0.547 0.553 3.443 3.444
qs_scf_new_mos 17 5.0 0.000 0.000 3.094 3.122
dbcsr_mm_accdrv_process 328 12.3 0.022 0.023 3.118 3.119
jit_kernel_multiply 17 11.6 3.090 3.091 3.090 3.091
compute_2c_integrals 1 7.0 0.000 0.000 2.934 2.934
dbt_split_copyback 70 10.6 1.212 1.271 2.732 2.833
mp2_ri_gpw_compute_in_copy_3c 6 6.0 0.221 0.224 2.485 2.675
convert_to_new_pgrid 2421 14.1 0.036 0.037 2.482 2.492
fft_wrap_pw1pw2 301 10.2 0.005 0.005 2.466 2.467
qs_ks_build_kohn_sham_matrix 18 6.9 0.003 0.003 2.464 2.466
dbm_copy 1614 15.1 2.445 2.454 2.445 2.454
qs_ks_update_qs_env 17 5.0 0.000 0.000 2.434 2.436
rebuild_ks_matrix 17 6.0 0.000 0.000 2.427 2.428
fill_fm_L_from_L_loc_non_block 1 8.0 0.000 0.000 2.365 2.377
build_3c_integrals 5 6.0 1.507 1.509 2.162 2.353
fill_fm_L_from_L_loc_non_block 1 9.0 2.270 2.283 2.270 2.283
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=109.08, yerr=0.0
Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=49.12499999999999, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=26.731, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.573, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=8.496, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=7.341, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.814, yerr=0.0
Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K 1 1.0 0.026 0.030 209.111 209.112
qs_forces 1 2.0 0.000 0.000 208.633 208.633
rebuild_ks_matrix 7 6.6 0.000 0.000 204.608 204.608
qs_ks_build_kohn_sham_matrix 7 7.6 0.002 0.002 204.608 204.608
hfx_ks_matrix 7 8.6 0.000 0.000 200.640 200.641
dbt_total 849 11.0 0.010 0.010 152.138 152.139
hfx_ri_update_ks 7 9.6 0.000 0.000 114.460 114.460
hfx_ri_update_ks_Pmat 7 10.6 21.886 21.936 114.455 114.455
qs_energies 1 3.0 0.000 0.000 109.769 109.769
scf_env_do_scf 1 4.0 0.000 0.000 107.858 107.858
qs_ks_update_qs_env 8 6.0 0.000 0.000 105.788 105.788
qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 98.827 98.827
dbt_contract 207 12.4 0.055 0.056 88.784 88.785
hfx_ri_update_forces 1 7.0 1.052 1.057 86.178 86.179
dbt_tas_total 369 13.4 0.084 0.086 72.190 72.190
dbt_tas_multiply 216 13.5 0.001 0.001 69.192 69.192
dbt_copy 423 11.8 0.046 0.047 58.066 58.802
scf_env_do_scf_inner_loop 6 5.0 0.000 0.001 55.918 55.918
dbt_tas_dbm 216 15.5 0.002 0.002 54.129 54.129
init_scf_loop 2 5.0 0.000 0.000 51.939 51.939
hfx_ri_forces_Pmat_3c 1 8.0 3.460 3.473 51.667 51.694
dbm_multiply 216 17.5 50.713 50.823 50.713 50.823
dbt_reshape 175 13.2 20.215 20.258 45.013 45.157
hfx_ri_update_ks_Pmat_KS 63 11.6 0.001 0.001 33.249 33.249
precalc_derivatives 1 8.0 1.840 1.842 28.131 28.133
mp_waitall_2 1022 16.5 23.078 23.125 23.078 23.125
dbt_tas_mm_2 91 16.5 0.001 0.001 22.632 22.632
dbt_crop 372 13.7 14.940 15.133 19.304 19.551
dbt_communicate_buffer 175 14.2 0.005 0.005 18.949 19.010
dbt_tas_reserve_blocks_index 1323 15.4 1.709 1.714 18.243 18.754
hfx_ri_pre_scf_Pmat 1 12.0 0.000 0.000 17.974 17.974
dbm_reserve_blocks 1491 16.3 17.201 17.724 17.201 17.724
hfx_ri_update_ks_Pmat_copy_2 63 11.6 0.000 0.000 16.771 16.772
dbt_tas_mm_3T 77 17.1 0.001 0.001 16.446 16.769
hfx_ri_update_ks_Pmat_Px3C 63 11.6 0.000 0.000 15.793 15.793
dbt_reserve_blocks_index 889 14.5 0.615 0.618 15.129 15.318
build_3c_derivatives 3 9.0 2.388 2.405 15.256 15.259
dbt_reserve_blocks_index_array 859 13.5 0.008 0.008 14.845 15.027
dbt_tas_mm_3N 37 15.4 0.000 0.000 12.379 12.593
dbt_tas_copy 248 12.5 4.265 4.447 7.735 8.241
mp_sync 2901 12.8 7.429 7.987 7.429 7.987
hfx_ri_pre_scf_Pmat_copy_2 9 13.0 2.010 2.014 5.509 5.513
dbt_tas_replicate 168 15.1 2.486 2.493 5.452 5.503
hfx_ri_pre_scf_Pmat_int 1 13.0 0.000 0.000 5.384 5.384
hfx_ri_pre_scf_calc_tensors 1 14.0 0.003 0.003 4.653 4.658
hfx_ri_pre_scf_Pmat_RIx3C 9 13.0 0.000 0.000 4.444 4.467
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=209.111, yerr=0.0
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=76.018, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=50.713, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=23.078, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=21.886, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=20.215, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=17.201, yerr=0.0
Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K 1 1.0 0.010 0.011 103.237 103.237
qs_energies 1 2.0 0.000 0.000 103.054 103.054
mp2_main 1 3.0 0.000 0.000 96.225 96.225
mp2_gpw_main 1 4.0 0.001 0.001 95.836 95.836
mp2_ri_gpw_compute_in 1 5.0 0.541 0.541 51.464 51.475
mp2_ri_gpw_compute_en 1 5.0 0.099 0.099 44.313 44.324
mp2_ri_gpw_compute_in_loop 1 6.0 0.014 0.016 43.395 43.407
mp2_ri_gpw_compute_en_RI_loop 1 6.0 12.845 12.853 41.678 41.689
dbcsr_multiply_generic 2666 8.0 0.167 0.169 22.667 23.044
ao_to_mo_and_store_B_mult_1 1328 7.0 0.014 0.015 21.775 22.152
mp2_ri_gpw_compute_en_expansio 1040 7.0 0.739 0.746 16.656 16.866
mp2_eri_3c_integrate_gpw 1328 7.0 0.019 0.019 16.067 16.503
local_gemm 1040 8.0 15.917 16.120 15.917 16.120
make_m2s 5332 9.0 0.057 0.058 12.734 12.774
make_images 5332 10.0 2.174 2.177 12.553 12.591
multiply_cannon 2666 9.0 0.413 0.414 9.253 9.585
hybrid_alltoall_any 6683 11.6 8.515 8.535 8.798 8.822
make_images_data 5332 11.0 0.071 0.072 8.702 8.724
multiply_cannon_loop 2666 10.0 0.199 0.202 8.137 8.456
fft_wrap_pw1pw2 26668 10.4 0.140 0.147 7.775 8.253
integrate_v_rspace 1338 8.0 1.027 1.037 7.713 7.733
collocate_function 1328 8.0 4.972 4.977 7.184 7.647
get_2c_integrals 1 6.0 0.005 0.005 7.517 7.527
compute_2c_integrals 1 7.0 0.007 0.008 6.936 6.937
compute_2c_integrals_loop_lm         1  8.0    0.013    0.021    6.758    6.769
mp2_eri_2c_integrate_gpw             1  9.0    2.009    2.030    6.746    6.765
mp2_ri_gpw_compute_en_comm         221  7.0    1.036    1.050    5.961    6.247
scf_env_do_scf                       1  3.0    0.000    0.000    5.991    5.992
scf_env_do_scf_inner_loop           10  4.0    0.001    0.001    5.991    5.992
grid_integrate_task_list          1338  9.0    5.379    5.385    5.379    5.385
ao_to_mo_and_store_B_E_Ex_1       1328  7.0    3.468    3.499    5.300    5.373
mp2_ri_gpw_compute_en_ener        1040  7.0    5.063    5.142    5.063    5.142
fft_wrap_pw1pw2_20               10647 11.4    0.022    0.022    4.559    5.033
pw_gpu_r3dc1d_3d                 13282 12.2    3.922    4.445    3.922    4.445
qs_scf_new_mos                      10  5.0    0.000    0.000    4.328    4.333
mp_sendrecv_dm3                    442  8.0    3.900    4.172    3.900    4.172
multiply_cannon_multrec           2676 11.0    1.773    1.996    3.790    4.048
eigensolver                         11  5.8    0.001    0.002    3.035    3.037
potential_pw2rs                   2666 10.0    0.102    0.103    2.679    2.765
pw_gpu_c1dr3d_3d                 13280 12.7    2.662    2.698    2.662    2.698
cp_fm_diag_elpa                     11  6.8    0.000    0.000    2.393    2.393
cp_fm_diag_elpa_base                11  7.8    2.309    2.326    2.391    2.392
collocate_single_gaussian         1328 10.0    0.094    0.094    2.252    2.329
fft_wrap_pw1pw2_10               15957 11.5    0.020    0.020    2.301    2.313
copy_dbcsr_to_fm                  1351  8.0    0.034    0.036    2.261    2.297
replicate_iaK_2intgroup              1  6.0    2.089    2.095    2.229    2.235
mp2_eri_2c_integrate_gpw_pot_l    1328 10.0    0.004    0.004    2.164    2.231
fill_local_i_aL                    884  7.5    2.178    2.179    2.178    2.179
multiply_cannon_sync_h2d          2676 11.0    2.168    2.176    2.168    2.176
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=103.237, yerr=0.0
Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=55.517999999999994, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.917, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.845, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=8.515, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=5.379, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_ener", label="mp2_ri_gpw_compute_en_ener", y=5.063, yerr=0.0
Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.077    0.080  201.832  201.832
qs_energies                          1  2.0    0.000    0.000  200.748  200.749
scf_env_do_scf                       1  3.0    0.000    0.000  187.787  187.787
scf_env_do_scf_inner_loop           15  4.0    0.001    0.002  187.787  187.787
qs_ks_update_qs_env                 15  5.0    0.000    0.000  103.411  103.442
rebuild_ks_matrix                   15  6.0    0.000    0.000  103.212  103.242
qs_ks_build_kohn_sham_matrix        15  7.0    0.003    0.003  103.211  103.242
qs_vxc_create                       15  8.0    0.029    0.058   60.344   60.408
qs_scf_new_mos                      15  5.0    0.000    0.000   53.049   53.054
fft_wrap_pw1pw2                   1086 10.0    0.027    0.027   52.680   52.704
calculate_dispersion_nonloc         15  9.0   10.918   10.959   51.790   51.827
eigensolver                         15  6.0    0.002    0.002   43.666   43.702
sum_up_and_integrate                15  8.0    0.000    0.000   41.339   41.374
integrate_v_rspace                  15  9.0    0.046    0.047   41.314   41.350
grid_integrate_task_list            15 10.0   34.110   34.139   34.110   34.139
qs_rho_update_rho_low               16  5.0    0.000    0.000   28.721   28.722
calculate_rho_elec                  16  6.0    0.178    0.179   28.721   28.722
pw_gpu_c1dr3d_3d_ps                585 12.1    5.567    5.635   27.678   27.715
fft_wrap_pw1pw2_150                765 11.0    0.005    0.005   27.534   27.571
cp_fm_diag_elpa                     15  7.0    0.000    0.000   26.213   26.218
cp_fm_diag_elpa_base                15  8.0   24.364   24.976   26.207   26.209
pw_gpu_r3dc1d_3d_ps                501 11.9    4.940    5.230   24.968   24.981
grid_collocate_task_list            16  7.0   17.391   17.393   17.391   17.393
cp_fm_cholesky_restore              45  7.0   15.530   16.289   15.530   16.289
fft_wrap_pw1pw2_200                197 11.3    0.001    0.001   13.149   13.223
density_rs2pw                       16  7.0    0.002    0.002   11.144   11.144
mp_alltoall_z22v                  1086 14.0    9.413    9.683    9.413    9.683
qs_energies_init_hamiltonians        1  3.0    0.000    0.000    9.658    9.658
vdW_energy                          15 10.0    9.470    9.478    9.470    9.478
pw_gpu_ffc                         585 13.1    8.966    8.985    8.966    8.985
xc_vxc_pw_create                    15  9.0    0.182    0.183    8.524    8.526
build_core_hamiltonian_matrix        1  4.0    0.000    0.000    8.392    8.421
pw_gpu_cff                         501 12.9    8.323    8.346    8.323    8.346
potential_pw2rs                     15 10.0    0.007    0.007    7.159    7.225
pw_gpu_sf                          585 13.1    6.981    6.983    6.981    6.983
copy_dbcsr_to_fm                    16  5.9    0.001    0.001    6.582    6.604
pw_gpu_fg                          501 12.9    6.475    6.513    6.475    6.513
x_to_yz                            585 13.1    1.043    1.053    6.132    6.147
dbcsr_complete_redistribute         46  8.3    1.653    1.664    5.499    5.527
yz_to_x                            501 12.9    0.855    0.859    5.178    5.420
fft_wrap_pw1pw2_10                  62 10.5    0.000    0.000    5.406    5.411
cp_fm_uplo_to_full                  30  8.0    3.764    5.095    3.764    5.095
xc_pw_derive                        90 11.0    0.001    0.001    5.011    5.021
xc_rho_set_and_dset_create          15 10.0    0.132    0.134    4.940    4.954
build_core_ppnl                      1  5.0    4.702    4.719    4.702    4.719
gspace_mixing                       14  5.0    0.125    0.125    4.166    4.166
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=201.832, yerr=0.0
Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=99.51899999999999, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=34.11, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=24.364, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=17.391, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=15.53, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.918, yerr=0.0
Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    2.030    2.070  160.213  160.213
qs_energies                          1  2.0    0.000    0.000  158.070  158.070
ls_scf                               1  3.0    0.000    0.000  151.001  151.002
ls_scf_main                          1  4.0    0.000    0.001  139.745  139.745
density_matrix_trs4                  5  5.0    0.004    0.004  111.280  111.283
dbcsr_multiply_generic              95  6.2    0.158    0.159   96.072   96.083
multiply_cannon                     95  7.2    1.842    2.218   67.516   67.772
multiply_cannon_loop                95  8.2    0.170    0.171   56.588   56.811
multiply_cannon_multrec            190  9.2   42.741   43.014   47.930   48.151
ls_scf_dm_to_ks                      5  5.0    0.000    0.000   26.615   26.621
make_m2s                           190  7.2    0.015    0.015   24.231   24.236
make_images                        190  8.2    5.404    5.606   23.692   23.695
matrix_ls_to_qs                      5  6.0    0.000    0.000   17.456   17.509
dbcsr_complete_redistribute         11  7.5   10.467   10.536   14.837   14.907
matrix_decluster                     5  7.0    0.000    0.000   13.526   13.592
arnoldi_extremal                     6  6.2    0.000    0.000   11.458   11.460
arnoldi_normal_ev                    6  7.2    0.005    0.005   11.458   11.460
build_subspace                      12  8.2    0.032    0.032   11.227   11.227
qs_ks_update_qs_env                  6  6.2    0.000    0.000   11.133   11.192
rebuild_ks_matrix                    6  7.2    0.000    0.000   10.808   10.813
build_dftb_ks_matrix                 6  8.2    0.001    0.001   10.808   10.813
build_dftb_coulomb                   6  9.2    0.782    0.791   10.504   10.509
dbcsr_matrix_vector_mult           310  9.0    0.074    0.074   10.168   10.310
make_images_data                   190  9.2    0.006    0.006    9.906   10.036
dbcsr_matrix_vector_mult_local     310 10.0    9.677    9.819    9.681    9.823
hybrid_alltoall_any                201 10.0    6.604    6.750    9.509    9.636
ls_scf_init_scf                      1  4.0    0.000    0.000    9.568    9.569
tb_ewald_overlap                     6 10.2    9.337    9.480    9.337    9.480
calculate_norms                    380  9.2    7.955    7.972    7.955    7.972
dbcsr_finalize                     277  7.6    0.103    0.104    7.535    7.585
ls_scf_init_matrix_S                 1  5.0    0.000    0.000    7.561    7.562
qs_energies_init_hamiltonians        1  3.0    0.000    0.000    7.009    7.009
dbcsr_merge_all                    247  8.6    1.434    1.491    6.909    6.952
matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.000    6.859    6.861
build_qs_neighbor_lists              1  4.0    0.000    0.000    6.433    6.458
build_neighbor_lists_sab_tbe         1  5.0    6.248    6.275    6.248    6.275
dbcsr_copy                         443  8.0    0.924    0.964    4.864    4.891
dbcsr_special_finalize             285  9.2    0.005    0.005    4.794    4.806
setup_rec_index_2d                 190  8.2    4.767    4.788    4.767    4.788
dbcsr_sort_indices                 643 10.1    4.522    4.535    4.522    4.535
dbcsr_data_new                    3509  9.3    4.154    4.487    4.154    4.487
dbcsr_add_d                        130  6.0    0.001    0.001    4.342    4.395
dbcsr_add_anytype                  130  7.0    1.849    1.856    4.341    4.394
dbcsr_dot                           66  6.3    3.861    3.864    4.138    4.326
dbcsr_mm_accdrv_process           8119 10.0    0.430    0.504    4.106    4.149
dbcsr_copy_into_existing             5  8.0    3.929    3.942    3.929    3.942
dbcsr_mm_multrec_init               95  8.2    0.000    0.000    3.348    3.724
dbcsr_mm_csr_init                   95  9.2    0.006    0.006    3.348    3.723
dbcsr_mm_sched_init                 95 10.2    0.000    0.000    3.318    3.694
dbcsr_mm_accdrv_init                95 11.2    0.357    0.404    3.318    3.693
mp_waitall_1                      2666 10.6    3.395    3.682    3.395    3.682
dbcsr_mm_accdrv_process_sort      8119 11.0    3.619    3.645    3.619    3.645
tree_to_linear_d                    11 10.5    3.549    3.552    3.549    3.552
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=160.213, yerr=0.0
Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=80.036, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=42.741, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=10.467, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=9.677, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="tb_ewald_overlap", label="tb_ewald_overlap", y=9.337, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=7.955, yerr=0.0
Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.004    0.005   49.445   49.445
lib_test                             1  2.0    0.000    0.000   49.437   49.440
dbcsr_run_tests                      3  3.0    0.000    0.000   49.436   49.439
test_multiplies_multiproc            3  4.0    0.001    0.001   38.267   38.297
dbcsr_multiply_generic               9  5.0    0.002    0.002   29.412   29.414
multiply_cannon                      9  6.0    0.220    0.254   19.171   19.830
multiply_cannon_loop                 9  7.0    0.003    0.003   17.657   18.075
multiply_cannon_multrec             18  8.0    9.293    9.784   16.369   16.787
dbcsr_make_random_matrix             9  4.0    7.717    7.736   11.032   11.063
dbcsr_finalize                      27  5.7    0.001    0.001    7.503    7.519
dbcsr_merge_all                     18  6.5    3.708    3.715    7.387    7.409
dbcsr_mm_accdrv_process           8199  9.0    1.087    1.189    6.845    6.914
dbcsr_redistribute                   9  5.0    3.671    3.719    6.161    6.182
make_m2s                            18  6.0    0.001    0.001    5.141    5.144
make_images                         18  7.0    0.355    0.355    5.107    5.111
dbcsr_mm_accdrv_process_sort      8199 10.0    4.646    4.684    4.646    4.684
make_images_data                    18  8.0    0.001    0.001    3.038    3.038
hybrid_alltoall_any                 18  9.0    2.508    2.509    2.996    2.997
mp_alltoall_d11v                    27  6.0    2.187    2.202    2.187    2.202
tree_to_linear_d                     9  7.0    1.903    1.911    1.903    1.911
dbcsr_data_copy_aa2                 18  7.5    1.640    1.645    1.640    1.645
dbcsr_data_release                 507  7.7    1.427    1.429    1.427    1.429
mp_sum_l                            61  4.9    0.668    1.319    0.668    1.319
dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.667    1.318
jit_kernel_multiply                  6 10.0    1.112    1.182    1.112    1.182
dbcsr_data_new                     354  7.4    0.986    1.109    0.986    1.109
dbcsr_checksum                       6  5.0    1.081    1.086    1.092    1.092
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=49.445, yerr=0.0
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=20.41, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.293, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.717, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.646, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.708, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.671, yerr=0.0
Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.045    0.047  206.543  206.544
qs_mol_dyn_low                       1  2.0    0.004    0.005  204.974  205.008
qs_forces                            6  3.8    0.001    0.001  128.736  128.736
qs_energies                          6  4.8    0.001    0.001  121.425  121.425
scf_env_do_scf                       6  5.8    0.000    0.000  114.768  114.768
scf_env_do_scf_inner_loop          113  6.2    0.006    0.008  107.603  107.604
velocity_verlet                      5  3.0    0.003    0.003   98.192   98.239
rebuild_ks_matrix                  119  8.1    0.000    0.001   88.817   88.817
qs_ks_build_kohn_sham_matrix       119  9.1    0.020    0.020   88.816   88.816
qs_ks_update_qs_env                119  7.3    0.001    0.001   83.795   83.795
fft_wrap_pw1pw2                   2059 12.4    0.042    0.044   69.953   69.975
fft_wrap_pw1pw2_150               1321 13.9    0.009    0.009   67.054   67.120
qs_vxc_create                      119 10.1    0.003    0.004   56.146   56.150
xc_vxc_pw_create                   119 11.1    1.530    1.535   56.143   56.146
qmmm_el_coupling                     6  3.8    0.000    0.000   40.478   40.481
qmmm_elec_with_gaussian              6  4.8    0.023    0.023   40.471   40.474
xc_pw_derive                       714 13.1    0.010    0.010   39.259   39.282
qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   39.016   39.051
pw_gpu_c1dr3d_3d_ps               1095 14.8   10.652   10.748   37.684   37.698
qmmm_elec_gaussian_low_G             6  6.8   34.265   34.302   34.265   34.302
qmmm_forces                          6  3.8    0.001    0.001   33.068   33.069
qmmm_forces_with_gaussian            6  4.8    0.023    0.023   31.822   32.705
pw_gpu_r3dc1d_3d_ps                964 14.0    9.556    9.626   32.215   32.253
qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   30.459   31.343
xc_rho_set_and_dset_create         119 12.1    2.468    2.478   28.018   28.024
qmmm_forces_gaussian_low_G           6  6.8   25.539   26.429   25.539   26.429
xc_pw_divergence                   119 12.1    0.006    0.006   26.201   26.208
qs_rho_update_rho_low              119  7.3    0.001    0.001   23.471   23.693
calculate_rho_elec                 119  8.3    1.106    1.106   23.470   23.693
mp_alltoall_z22v                  2059 16.4   17.665   17.781   17.665   17.781
density_rs2pw                      119  9.3    0.008    0.008   17.224   17.407
sum_up_and_integrate               119 10.1    0.002    0.003   16.293   16.303
integrate_v_rspace                 119 11.1    0.022    0.023   16.102   16.112
x_to_yz                           1095 15.8    2.345    2.359   11.969   12.036
dbcsr_multiply_generic            2598 12.3    0.102    0.104   11.169   11.350
potential_pw2rs                    119 12.1    0.034    0.035   10.404   10.406
yz_to_x                            964 15.0    1.817    1.819    9.858    9.919
multiply_cannon                   2598 13.3    0.226    0.229    9.527    9.781
multiply_cannon_loop              2598 14.3    0.261    0.265    9.027    9.278
qs_ks_ddapc                        119 10.1    0.002    0.003    9.240    9.253
pw_gpu_sf                         1095 15.8    8.608    8.613    8.608    8.613
pw_gpu_fg                          964 15.0    7.755    7.818    7.755    7.818
init_scf_loop                        6  6.8    0.000    0.000    7.162    7.162
qs_scf_new_mos                     113  7.2    0.001    0.001    7.039    7.040
qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.038    7.039
ot_scf_mini                        113  9.2    0.002    0.002    6.750    6.751
multiply_cannon_multrec           5196 15.3    3.191    3.232    6.634    6.687
pw_gpu_ffc                        1095 15.8    6.438    6.451    6.438    6.451
grid_integrate_task_list           119 12.1    5.675    5.685    5.675    5.685
xc_functional_eval                 238 13.1    0.003    0.003    5.204    5.205
grid_collocate_task_list           119  9.3    5.111    5.142    5.111    5.142
qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    5.052    5.052
pw_gpu_cff                         964 15.0    4.980    4.995    4.980    4.995
qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    4.920    4.927
qmmm_forces_with_gaussian_LG         6  7.8    4.920    4.927    4.920    4.927
pw_poisson_solve                   125  9.9    0.003    0.003    4.762    4.763
qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    4.750    4.752
qmmm_elec_with_gaussian_LG           6  7.8    4.750    4.752    4.750    4.752
ot_mini                            113 10.2    0.001    0.001    4.712    4.712
init_scf_run                         6  5.8    0.000    0.000    4.500    4.500
scf_env_initial_rho_setup            6  6.8    0.000    0.000    4.500    4.500
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=206.543, yerr=0.0
Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=108.86600000000001, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=34.265, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=25.539, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=17.665, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.652, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.556, yerr=0.0
Summary: Performance test took 40 minutes.
Status: OK
---> Removed intermediate container dbd1eb247fb8
---> 63eb98f6faa2
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
---> Running in 4b9dda5a47b2
---> Removed intermediate container 4b9dda5a47b2
---> 1331742d6e9d
Step 46/46 : ENTRYPOINT []
---> Running in 448464f5e3c9
---> Removed intermediate container 448464f5e3c9
---> 66f37f86a758
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 66f37f86a758
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.
#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done
EndDate: 2026-07-01 08:36:46+00:00