StartDate: 2026-07-02 06:07:24+00:00 CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm GpuId: 1x Tesla V100-SXM2-16GB CommitSHA: 5976d25a45174d28df9c77845b753e0a0585d6d8 CommitTime: 2026-07-01 18:37:51 +0200 CommitAuthor: SY Wang CommitSubject: Spack: Remove `+dlaf` from SIRIUS requirement (#5499) #################### Building Image cp2k-perf-cuda-volta #################### Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100 Build-Path: / Build-Args: GIT_COMMIT_SHA=5976d25a45174d28df9c77845b753e0a0585d6d8 SPACK_CACHE=gs://cp2k-spack-cache Build-Cache: Yes Populating docker build cache... done. DEPRECATED: The legacy builder is deprecated and will be removed in a future release. BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable. Sending build context to Docker daemon 420.9MB Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04 12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda 32f112e3802c: Pulling fs layer 644e9b203583: Pulling fs layer 02559cd4bc8d: Pulling fs layer 2cd52cbb1ebe: Pulling fs layer 6e8af4fd0a07: Pulling fs layer 15a17189b2df: Pulling fs layer 02cb0e091e33: Pulling fs layer 9c3d619183d2: Pulling fs layer 7f7602a82106: Pulling fs layer 5a2aba542b08: Pulling fs layer 6cb9b761b877: Pulling fs layer 5a2aba542b08: Waiting 6cb9b761b877: Waiting 02cb0e091e33: Waiting 9c3d619183d2: Waiting 7f7602a82106: Waiting 2cd52cbb1ebe: Waiting 6e8af4fd0a07: Waiting 15a17189b2df: Waiting 644e9b203583: Verifying Checksum 644e9b203583: Download complete 32f112e3802c: Verifying Checksum 32f112e3802c: Download complete 6e8af4fd0a07: Verifying Checksum 6e8af4fd0a07: Download complete 2cd52cbb1ebe: Download complete 02cb0e091e33: Verifying Checksum 02cb0e091e33: Download complete 02559cd4bc8d: Verifying Checksum 02559cd4bc8d: Download complete 9c3d619183d2: Verifying Checksum 9c3d619183d2: Download complete 7f7602a82106: Verifying Checksum 7f7602a82106: Download complete 6cb9b761b877: 
Verifying Checksum 6cb9b761b877: Download complete 32f112e3802c: Pull complete 644e9b203583: Pull complete 02559cd4bc8d: Pull complete 2cd52cbb1ebe: Pull complete 6e8af4fd0a07: Pull complete 15a17189b2df: Verifying Checksum 15a17189b2df: Download complete 5a2aba542b08: Verifying Checksum 5a2aba542b08: Download complete 15a17189b2df: Pull complete 02cb0e091e33: Pull complete 9c3d619183d2: Pull complete 7f7602a82106: Pull complete 5a2aba542b08: Pull complete 6cb9b761b877: Pull complete Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04 ---> eecafe98c3e1 Step 2/46 : ENV CUDA_PATH /usr/local/cuda ---> Using cache ---> 780681fb1fee Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64 ---> Using cache ---> ba98a15dc225 Step 4/46 : ENV CUDA_CACHE_DISABLE 1 ---> Using cache ---> 3932740340f7 Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/* ---> Using cache ---> a06eb14abc29 Step 6/46 : WORKDIR /opt/cp2k-toolchain ---> Using cache ---> 082681bac850 Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./ ---> Using cache ---> ae920e0abda3 Step 8/46 : RUN ./install_requirements.sh ubuntu ---> Using cache ---> 94839a704e2d Step 9/46 : RUN mkdir scripts ---> Using cache ---> 433a8b0a0499 Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/ ---> eaba2cf3554a Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh . ---> 8f7240ec1de1 Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --with-sirius=install --gpu-ver=V100 --dry-run ---> Running in 24c55a770d32 No MPI installation detected. (Ignore this message if a fresh MPI installation is requested.) 
Toolchain script received the following options:
  --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --with-sirius=install --gpu-ver=V100 --dry-run
Parsing options and resolving conflicts...
WARNING: (./install_cp2k_toolchain.sh, line 1163) Installing dependencies and CP2K requires CMake but CMake is not enabled, so a new copy of CMake will be installed first.

Toolchain configuration summary
-------------------------------
System specifications:
  -j = 12
  --target-cpu = native
  --gpu-ver = V100
  --mpi-mode = mpich
  --math-mode = openblas
Enabled features:
  --enable-tsan = no
  --enable-cuda = yes
  --enable-gauxc-cutlass = no
  --enable-hip = no
  --enable-opencl = no
  --enable-cray = no
Packages to be installed:
  - cmake
  - mpich
  - openblas
  - fftw
  - eigen
  - libint
  - libxc
  - libxsmm
  - libxs
  - cosma
  - scalapack
  - elpa
  - dbcsr
  - spfft
  - spla
  - gsl
  - spglib
  - hdf5
  - libvdwxc
  - sirius
  - libvori
  - tblite
  - pugixml
  - fmt
Packages to be detected from system:
  - gcc
Packages not used:
  - intel
  - amd
  - ninja
  - openmpi
  - intelmpi
  - mkl
  - acml
  - gauxc
  - libxstream
  - cusolvermp
  - plumed
  - libtorch
  - deepmd
  - ace
  - dftd4
  - libsmeagol
  - trexio
  - libfci
  - greenx
  - gmp
  - mcl
With --dry-run option, this script concludes with above report.
The setup, toolchain env and conf files are written to /opt/cp2k-toolchain/install.
 ---> Removed intermediate container 24c55a770d32
 ---> a2e1ccf1a19f
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> f0b7d4fcc4bd
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Running in 893ca65a9211
==================== Finding GCC from system paths ====================
path to gcc is /usr/bin/gcc
path to g++ is /usr/bin/g++
path to gfortran is /usr/bin/gfortran
GCC compiler version 13.3.0 found
Step gcc took 0.00 seconds.
Step intel took 0.00 seconds.
Step amd took 0.00 seconds.
==================== Getting proc arch info using OpenBLAS tools ==================== wget --quiet https://www.cp2k.org/static/downloads/OpenBLAS-0.3.33.tar.gz -O OpenBLAS-0.3.33.tar.gz OpenBLAS-0.3.33.tar.gz: OK Checksum of OpenBLAS-0.3.33.tar.gz Ok OpenBLAS detected LIBCORE = skylakex OpenBLAS detected ARCH = x86_64 ==================== Installing CMake ==================== wget --quiet https://www.cp2k.org/static/downloads/cmake-4.3.0-linux-x86_64.tar.gz -O cmake-4.3.0-linux-x86_64.tar.gz cmake-4.3.0-linux-x86_64.tar.gz: OK Checksum of cmake-4.3.0-linux-x86_64.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/cmake-4.3.0 Step cmake took 5.00 seconds. Step ninja took 0.00 seconds. ---> Removed intermediate container 893ca65a9211 ---> 5e0ae7ff9f4d Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/ ---> 14fb568c2fa8 Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build ---> Running in e2b13667bcd7 ==================== Installing MPICH ==================== wget --quiet https://www.cp2k.org/static/downloads/mpich-5.0.1.tar.gz -O mpich-5.0.1.tar.gz mpich-5.0.1.tar.gz: OK Checksum of mpich-5.0.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/mpich-5.0.1 for MPICH device ch4 Found directory /opt/cp2k-toolchain/install/mpich-5.0.1/bin Found directory /opt/cp2k-toolchain/install/mpich-5.0.1/lib Found directory /opt/cp2k-toolchain/install/mpich-5.0.1/include mpiexec is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpiexec mpicc is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpicc mpicxx is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpicxx mpifort is installed as /opt/cp2k-toolchain/install/mpich-5.0.1/bin/mpifort Step mpich took 611.00 seconds. 
---> Removed intermediate container e2b13667bcd7 ---> 660d42919b43 Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/ ---> af3f47626934 Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build ---> Running in 5fbc33da7f9e ==================== Installing OpenBLAS ==================== wget --quiet https://www.cp2k.org/static/downloads/OpenBLAS-0.3.33.tar.gz -O OpenBLAS-0.3.33.tar.gz OpenBLAS-0.3.33.tar.gz: OK Checksum of OpenBLAS-0.3.33.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/openblas-0.3.33 Installing OpenBLAS library for target SKYLAKEX Step openblas took 310.00 seconds. Step gmp took 0.00 seconds. ---> Removed intermediate container 5fbc33da7f9e ---> b838f4570b28 Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/ ---> 1bece8100c85 Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build ---> Running in 431112a1e724 ==================== Installing FFTW ==================== wget --quiet https://www.cp2k.org/static/downloads/fftw-3.3.11.tar.gz -O fftw-3.3.11.tar.gz fftw-3.3.11.tar.gz: OK Checksum of fftw-3.3.11.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/fftw-3.3.11 Step fftw took 171.00 seconds. ==================== Installing Eigen ==================== wget --quiet https://www.cp2k.org/static/downloads/eigen-5.0.1.tar.gz -O eigen-5.0.1.tar.gz eigen-5.0.1.tar.gz: OK Checksum of eigen-5.0.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/eigen-5.0.1 Step eigen took 4.00 seconds. ==================== Installing LIBINT ==================== wget --quiet https://www.cp2k.org/static/downloads/libint-v2.13.1-cp2k-lmax-5.tar.xz -O libint-v2.13.1-cp2k-lmax-5.tar.xz libint-v2.13.1-cp2k-lmax-5.tar.xz: OK Checksum of libint-v2.13.1-cp2k-lmax-5.tar.xz Ok Installing from scratch into /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5 Step libint took 543.00 seconds. 
==================== Installing LIBXC ==================== wget --quiet https://www.cp2k.org/static/downloads/libxc-7.0.0.tar.bz2 -O libxc-7.0.0.tar.bz2 libxc-7.0.0.tar.bz2: OK Checksum of libxc-7.0.0.tar.bz2 Ok Installing from scratch into /opt/cp2k-toolchain/install/libxc-7.0.0 Step libxc took 417.00 seconds. Step greenx took 0.00 seconds. ---> Removed intermediate container 431112a1e724 ---> 6eaad9fb98c8 Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/ ---> 8b7a39cbd691 Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build ---> Running in 5e6dfd7992b4 ==================== Installing Libxsmm ==================== wget --quiet https://www.cp2k.org/static/downloads/libxsmm-2.0.0.tar.gz -O libxsmm-2.0.0.tar.gz libxsmm-2.0.0.tar.gz: OK Checksum of libxsmm-2.0.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libxsmm-2.0.0 Step libxsmm took 21.00 seconds. ==================== Installing LIBXS ==================== wget --quiet https://www.cp2k.org/static/downloads/libxs-1.0.0.tar.gz -O libxs-1.0.0.tar.gz libxs-1.0.0.tar.gz: OK Checksum of libxs-1.0.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libxs-1.0.0 Step libxs took 8.00 seconds. Step libxstream took 0.00 seconds. ==================== Installing ScaLAPACK ==================== wget --quiet https://www.cp2k.org/static/downloads/scalapack-2.2.3.tar.gz -O scalapack-2.2.3.tar.gz scalapack-2.2.3.tar.gz: OK Checksum of scalapack-2.2.3.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/scalapack-2.2.3 Step scalapack took 35.00 seconds. Step cusolvermp took 0.00 seconds. 
==================== Installing COSMA ==================== wget --quiet https://www.cp2k.org/static/downloads/COSMA-v2.8.4.tar.gz -O COSMA-v2.8.4.tar.gz COSMA-v2.8.4.tar.gz: OK Checksum of COSMA-v2.8.4.tar.gz Ok wget --quiet https://www.cp2k.org/static/downloads/COSTA-v2.3.2.tar.gz -O COSTA-v2.3.2.tar.gz COSTA-v2.3.2.tar.gz: OK Checksum of COSTA-v2.3.2.tar.gz Ok wget --quiet https://www.cp2k.org/static/downloads/Tiled-MM-v2.3.2.tar.gz -O Tiled-MM-v2.3.2.tar.gz Tiled-MM-v2.3.2.tar.gz: OK Checksum of Tiled-MM-v2.3.2.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/COSMA-2.8.4 Step cosma took 67.00 seconds. ---> Removed intermediate container 5e6dfd7992b4 ---> ca179760b3ee Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/ ---> a7a29f629e49 Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build ---> Running in 436bd7036663 ==================== Installing ELPA ==================== wget --quiet https://www.cp2k.org/static/downloads/elpa-2026.02.001.tar.gz -O elpa-2026.02.001.tar.gz elpa-2026.02.001.tar.gz: OK Checksum of elpa-2026.02.001.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001 Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001/cpu Installing from scratch into /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia Step elpa took 304.00 seconds. ---> Removed intermediate container 436bd7036663 ---> 3320f2fe2fa7 Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/ ---> 957da9de6a04 Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build ---> Running in d41857e0fe31 ==================== Installing GSL ==================== wget --quiet https://www.cp2k.org/static/downloads/gsl-2.8.tar.gz -O gsl-2.8.tar.gz gsl-2.8.tar.gz: OK Checksum of gsl-2.8.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/gsl-2.8 Step gsl took 70.00 seconds. Step plumed took 0.00 seconds. Step libtorch took 0.00 seconds. 
Step gauxc took 0.00 seconds. Step deepmd took 0.00 seconds. Step ace took 0.00 seconds. ---> Removed intermediate container d41857e0fe31 ---> 1faa58991044 Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/ ---> 079c66b75332 Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build ---> Running in 9b7bbcb1436c ==================== Installing HDF5 ==================== wget --quiet https://www.cp2k.org/static/downloads/hdf5-2.1.1.tar.gz -O hdf5-2.1.1.tar.gz hdf5-2.1.1.tar.gz: OK Checksum of hdf5-2.1.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/hdf5-2.1.1 Step hdf5 took 121.00 seconds. ==================== Installing libvdwxc ==================== wget --quiet https://www.cp2k.org/static/downloads/libvdwxc-0.5.0.tar.gz -O libvdwxc-0.5.0.tar.gz libvdwxc-0.5.0.tar.gz: OK Checksum of libvdwxc-0.5.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libvdwxc-0.5.0 Step libvdwxc took 14.00 seconds. ==================== Installing Spglib ==================== wget --quiet https://www.cp2k.org/static/downloads/spglib-2.7.0.tar.gz -O spglib-2.7.0.tar.gz spglib-2.7.0.tar.gz: OK Checksum of spglib-2.7.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/spglib-2.7.0 Step spglib took 4.00 seconds. ==================== Installing libvori ==================== wget --quiet https://www.cp2k.org/static/downloads/libvori-220621.tar.gz -O libvori-220621.tar.gz libvori-220621.tar.gz: OK Checksum of libvori-220621.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/libvori-220621 Step libvori took 13.00 seconds. Step libsmeagol took 0.00 seconds. Step libfci took 0.00 seconds. ==================== Installing fmt ==================== wget --quiet https://www.cp2k.org/static/downloads/fmt-12.1.0.zip -O fmt-12.1.0.zip fmt-12.1.0.zip: OK Checksum of fmt-12.1.0.zip Ok Installing from scratch into /opt/cp2k-toolchain/install/fmt-12.1.0 Step fmt took 8.00 seconds. 
---> Removed intermediate container 9b7bbcb1436c ---> 5bae3a09b266 Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/ ---> f09556c118a7 Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build ---> Running in befbe0598838 Step dftd4 took 0.00 seconds. ==================== Installing tblite ==================== wget --quiet https://www.cp2k.org/static/downloads/tblite-0.6.0.tar.xz -O tblite-0.6.0.tar.xz tblite-0.6.0.tar.xz: OK Checksum of tblite-0.6.0.tar.xz Ok Step tblite took 39.00 seconds. ==================== Installing pugixml ==================== wget --quiet https://www.cp2k.org/static/downloads/pugixml-1.15.tar.gz -O pugixml-1.15.tar.gz pugixml-1.15.tar.gz: OK Checksum of pugixml-1.15.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/pugixml-1.15 Step pugixml took 9.00 seconds. ==================== Installing SpFFT ==================== wget --quiet https://www.cp2k.org/static/downloads/SpFFT-1.1.1.tar.gz -O SpFFT-1.1.1.tar.gz SpFFT-1.1.1.tar.gz: OK Checksum of SpFFT-1.1.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/SpFFT-1.1.1 Step spfft took 20.00 seconds. ==================== Installing SpLA ==================== wget --quiet https://www.cp2k.org/static/downloads/SpLA-1.6.1.tar.gz -O SpLA-1.6.1.tar.gz SpLA-1.6.1.tar.gz: OK Checksum of SpLA-1.6.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/SpLA-1.6.1 Step spla took 23.00 seconds. ==================== Installing SIRIUS ==================== wget --quiet https://www.cp2k.org/static/downloads/SIRIUS-7.11.1.tar.gz -O SIRIUS-7.11.1.tar.gz SIRIUS-7.11.1.tar.gz: OK Checksum of SIRIUS-7.11.1.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.11.1 Installing from scratch into /opt/cp2k-toolchain/install/sirius-7.11.1/cuda Step sirius took 428.00 seconds. Step trexio took 0.00 seconds. Step MCL took 0.00 seconds. 
---> Removed intermediate container befbe0598838 ---> cfdc670ad1c8 Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/ ---> a836cce73f97 Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build ---> Running in 808715fba64d ==================== Installing DBCSR ==================== wget --quiet https://www.cp2k.org/static/downloads/dbcsr-2.10.0.tar.gz -O dbcsr-2.10.0.tar.gz dbcsr-2.10.0.tar.gz: OK Checksum of dbcsr-2.10.0.tar.gz Ok Installing from scratch into /opt/cp2k-toolchain/install/dbcsr-2.10.0 Installing from scratch into /opt/cp2k-toolchain/install/dbcsr-2.10.0-cuda Step DBCSR took 120.00 seconds. ---> Removed intermediate container 808715fba64d ---> d9474852e839 Step 33/46 : WORKDIR /opt/cp2k ---> Running in 2de3e46cea8a ---> Removed intermediate container 2de3e46cea8a ---> 1927405e02e1 Step 34/46 : COPY ./src ./src ---> 056c9be6d159 Step 35/46 : COPY ./data ./data ---> 76ce0cc5db81 Step 36/46 : COPY ./tools/build_utils ./tools/build_utils ---> b6b29df56301 Step 37/46 : COPY ./cmake ./cmake ---> db3c51754897 Step 38/46 : COPY ./CMakeLists.txt . 
---> c982394a81d9 Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh ./tools/docker/scripts/cmake_cp2k.sh ./ ---> ab8d20f0e86e Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp ---> Running in 56dc5e39b054 ==================== Building CP2K ==================== -- The Fortran compiler identification is GNU 13.3.0 -- The C compiler identification is GNU 13.3.0 -- The CXX compiler identification is GNU 13.3.0 -- Detecting Fortran compiler ABI info -- Detecting Fortran compiler ABI info - done -- Check for working Fortran compiler: /usr/bin/gfortran - skipped -- Detecting C compiler ABI info -- Detecting C compiler ABI info - done -- Check for working C compiler: /usr/bin/gcc - skipped -- Detecting C compile features -- Detecting C compile features - done -- Detecting CXX compiler ABI info -- Detecting CXX compiler ABI info - done -- Check for working CXX compiler: /usr/bin/g++ - skipped -- Detecting CXX compile features -- Detecting CXX compile features - done -- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1") -- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter -- Found MPI_C: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so (found version "5.0") -- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so (found version "5.0") -- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpifort.so (found version "5.0") -- Found MPI: TRUE (found version "5.0") found components: C CXX Fortran -- Performing Test CMAKE_HAVE_LIBC_PTHREAD -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success -- Found Threads: TRUE -- Found MPI: TRUE (found version "5.0") found components: CXX C Fortran -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran -- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS) -- Checking 
for module 'openblas' -- Found openblas, version 0.3.33 -- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/include -- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Checking for module 'scalapack' -- Package 'mpi', required by 'scalapack', not found Package 'lapack', required by 'scalapack', not found Package 'blas', required by 'scalapack', not found -- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a -- Using LIBXS + LIBXSMM for Small Matrix Multiplication -- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES ------------------------------------------------------------ - DBCSR - ------------------------------------------------------------ -- Found MPI: TRUE (found version "5.0") -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0 -- Detecting CUDA compiler ABI info -- Detecting CUDA compiler ABI info - done -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped -- Detecting CUDA compile features -- Detecting CUDA compile features - done -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86") ----------------------------------------------------------- - CUDA - ----------------------------------------------------------- -- GPU architecture number: 70 -- GPU profiling enabled: OFF -- CUDA compiler and libraries found ------------------------------------------------------------ - OPENMP - ------------------------------------------------------------ -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found 
version "4.5") -- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX ------------------------------------------------------------ - Other dependencies - ------------------------------------------------------------ -- Checking for one of the modules 'elpa_openmp' -- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a -- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran -- Found MPI: TRUE (found version "5.0") found components: CXX -- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Checking for one of the modules 'fftw3' -- Checking for one of the modules 'fftw3f' -- Checking for one of the modules 'fftw3l' -- Checking for one of the modules 'fftw3q' -- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.11/include -- Boost detected. 
satisfied by headers bundled with Libint2 distribution -- Looking for Fortran sgemm -- Looking for Fortran sgemm - found -- mctc-lib: Find installed package -- multicharge: Find installed package -- DFTD4: found version 4.2.0, using v4.2+ API -- toml-f: Find installed package -- s-dftd3: Find installed package -- DFTD4: found version 4.2.0, using v4.2+ API -- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8") -- Checking for one of the modules 'libxc>=3.0.0' -- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0") -- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a -- Found HDF5: hdf5-shared (found version "2.1.1") found components: C -- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.11/include -- Looking for Fortran sgemm -- Looking for Fortran sgemm - not found -- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so -- Found OpenMP_C: -fopenmp (found version "4.5") -- Found OpenMP_CXX: -fopenmp (found version "4.5") -- Found OpenMP_CUDA: -fopenmp (found version "4.5") -- Found OpenMP_Fortran: -fopenmp (found version "4.5") -- Found OpenMP: TRUE (found version "4.5") -- Checking for one of the modules 's-dftd3' -- Checking for one of the modules 'mctc-lib' -- Found DFTD3: /opt/cp2k-toolchain/install/tblite-0.6.0/lib/libs-dftd3.a -- Checking for one of the modules 'dftd4' -- Checking for one of the modules 'multicharge' -- Found DFTD4: /opt/cp2k-toolchain/install/tblite-0.6.0/lib/libdftd4.a -- Looking for Fortran cheev -- Looking for Fortran cheev - found -- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so;-lm;-ldl -- Checking for one of the modules 'scalapack' -- Checking for one of the modules 'elpa;elpa_openmp;elpa-openmp-2019.05.001;elpa_openmp-2019.11.001;elpa_openmp-2020.05.001;elpa-2019.05.001;elpa-2019.11.001;elpa-2020.05.001' -- Found Elpa: 
/opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so -- Checking for module 'libvdwxc>=0.5.0' -- Found libvdwxc, version 0.5.0 -- Checking for module 'fftw3' -- Found fftw3, version 3.3.11 -- Found LibVDWXC: vdwxc;fftw3 (Required is at least version "0.5.0") -- Setting build type to 'Release' as none was specified. -- Performing Test f2008-norm2 -- Performing Test f2008-norm2 - Success -- Performing Test f2008-block_construct -- Performing Test f2008-block_construct - Success -- Performing Test f2008-contiguous -- Performing Test f2008-contiguous - Success -- Performing Test f95-reshape-order-allocatable -- Performing Test f95-reshape-order-allocatable - Success -- FYPP preprocessor found. -- Adding libxs_jit.F from dependency libxs for compilation -------------------------------------------------------------------- - - - Summary of enabled dependencies - - - -------------------------------------------------------------------- - BLAS - Vendor: OpenBLAS - Include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include - Libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so - LAPACK - Include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include - Libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so - MPI - Include directories: /opt/cp2k-toolchain/install/mpich-5.0.1/include - Libraries: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so - MPI_F08: Enabled - ScaLAPACK - Vendor: auto - Include directories: - Libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a - Hardware acceleration - Backend: CUDA - GPU architectures: 70 - GPU profiling enabled: OFF - GPU-accelerated modules - ELPA: ON - GRID: ON - DBM: ON - PW: ON - LibXC - Include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/ - Libraries: 
/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a - HDF5 - Include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include - Libraries: hdf5-shared - FFTW3 - Include directories: /opt/cp2k-toolchain/install/fftw-3.3.11/include - Libraries: /opt/cp2k-toolchain/install/fftw-3.3.11/lib/libfftw3.a - LIBXS - Include directories: - Libraries: - SpLA - Include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla - Libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran - SpLA GEMM offloading - DFTD4 - Enabled via TBLITE - Include directories: /opt/cp2k-toolchain/install/tblite-0.6.0/include;/opt/cp2k-toolchain/install/tblite-0.6.0/include/dftd4/GNU-13.3.0 - Libraries: - TBLITE - Include directories: /opt/cp2k-toolchain/install/tblite-0.6.0/include;/opt/cp2k-toolchain/install/tblite-0.6.0/include/tblite/GNU-13.3.0 - Libraries: - SIRIUS - Include directories: - Libraries: - COSMA - Include directories: /opt/cp2k-toolchain/install/COSMA-2.8.4-cuda/include - Libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack> - Libint2 - Include directories: - Libraries: - ELPA - Include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001 - Libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a -------------------------------------------------------------------- - - - Dependencies not included in this build - - - -------------------------------------------------------------------- - DeePMD - PEXSI - ACE (libpace) - Spglib - LibSMEAGOL - MiMiC - DLA-Future - PLUMED - LibFCI - GauXC - Libvori - LibTorch - TREXIO - OpenPMD - GreenX After building and installing CP2K, run the 
regtests with: /opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp
-- Configuring done (12.8s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container 56dc5e39b054
 ---> ec112c160635
Step 41/46 : COPY ./benchmarks ./benchmarks
 ---> c66f24efdbfc
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
 ---> ad79704df7e8
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> e345897b3ce0
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in dcfa85c2ede3

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma libxs libxsmm dbcsr_acc libdftd4 dftd4_v4_2 s_dftd3 mctc-lib tblite sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 83 input files and 0 errors.

========== Running Performance Test ==========

Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"
Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.027    0.027   99.405   99.405
 qs_mol_dyn_low                       1  2.0    0.004    0.005   99.003   99.005
 qs_forces                           11  3.9    0.002    0.002   98.953   98.953
 qs_energies                         11  4.9    0.001    0.001   88.270   88.272
 scf_env_do_scf                      11  5.9    0.001    0.001   73.403   73.404
 scf_env_do_scf_inner_loop          108  6.5    0.006    0.008   62.886   62.886
 velocity_verlet                     10  3.0    0.001    0.002   61.046   61.064
 rebuild_ks_matrix                  119  8.3    0.001    0.001   27.221   27.222
 qs_ks_build_kohn_sham_matrix       119  9.3    0.020    0.020   27.220   27.221
 qs_ks_update_qs_env                119  7.6    0.001    0.001   25.223   25.225
 dbcsr_multiply_generic            2286 12.5    0.146    0.149   25.168   25.209
 qs_rho_update_rho_low              119  7.7    0.001    0.001   21.981   21.996
 calculate_rho_elec                 119  8.7    0.855    0.864   21.980   21.995
 qs_scf_new_mos                     108  7.5    0.001    0.001   20.649   20.663
 qs_scf_loop_do_ot                  108  8.5    0.001    0.001   20.648   20.662
 ot_scf_mini                        108  9.5    0.003    0.003   18.679   18.680
 fft_wrap_pw1pw2                   1201 11.6    0.023    0.023   16.645   16.672
 sum_up_and_integrate               119 10.3    0.002    0.003   14.435   14.489
 integrate_v_rspace                 119 11.3    0.345    0.347   14.345   14.401
 fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   14.338   14.382
 multiply_cannon                   2286 13.5    0.338    0.341   12.602   12.638
 multiply_cannon_loop              2286 14.5    0.262    0.262   11.533   11.547
 make_m2s                          4572 13.5    0.045    0.046   10.905   10.931
 ot_mini                            108 10.5    0.001    0.001   10.889   10.889
 density_rs2pw                      119  9.7    0.008    0.008   10.782   10.881
 make_images                       4572 14.5    1.118    1.122   10.731   10.756
 init_scf_loop                       11  6.9    0.000    0.000   10.432   10.432
 grid_collocate_task_list           119  9.7   10.312   10.380   10.312   10.380
 pw_gpu_r3dc1d_3d_ps                606 13.1    2.353    2.372    8.540    8.551
 pw_gpu_c1dr3d_3d_ps                595 14.2    2.228    2.255    8.075    8.091
 grid_integrate_task_list           119 12.3    7.557    7.614    7.557    7.614
 build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.483    7.592
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000    7.479    7.479
 prepare_preconditioner              11  7.9    0.000    0.000    7.162    7.165
 make_preconditioner                 11  8.9    0.000    0.000    7.162    7.165
 init_scf_run                        11  5.9    0.000    0.000    6.723    6.723
 scf_env_initial_rho_setup           11  6.9    0.000    0.001    6.722    6.722
 qs_ot_get_derivative               108 11.5    0.002    0.002    6.591    6.592
 hybrid_alltoall_any               4725 16.4    4.791    4.828    6.486    6.528
 potential_pw2rs                    119 12.3    0.037    0.038    6.442    6.443
 make_images_data                  4572 15.5    0.055    0.056    6.376    6.402
 make_full_inverse_cholesky          11  9.9    0.000    0.000    6.037    6.301
 multiply_cannon_multrec           4572 15.5    2.048    2.112    6.162    6.203
 mp_alltoall_z22v                  1201 15.6    4.217    4.300    4.217    4.300
 ot_diis_step                       108 11.5    0.006    0.006    4.273    4.273
 wfi_extrapolate                     11  7.9    0.001    0.001    4.005    4.005
 build_core_ppl_forces               11  5.9    3.756    3.852    3.756    3.852
 build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.789    3.824
 apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.718    3.721
 apply_single                       119 13.6    0.001    0.001    3.718    3.721
 dbcsr_mm_accdrv_process           9594 16.2    0.759    0.906    3.707    3.718
 mp_waitall_1                     64495 16.9    3.532    3.543    3.532    3.543
 dbcsr_complete_redistribute        329 12.2    1.215    1.218    3.225    3.494
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.332    3.364
 calculate_dm_sparse                119  9.5    0.001    0.001    3.320    3.335
 qs_ot_get_p                        119 10.4    0.001    0.001    3.292    3.295
 multiply_cannon_sync_h2d          4572 15.5    3.072    3.101    3.072    3.101
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.084    3.085
 transfer_rs2pw                     487 10.6    0.008    0.008    2.565    2.683
 cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.681    2.681
 pw_poisson_solve                   119 10.3    0.003    0.003    2.674    2.680
 yz_to_x                            606 14.1    0.464    0.469    2.621    2.661
 x_to_yz                            595 15.2    0.514    0.519    2.574    2.607
 qs_create_task_list                 11  7.9    0.000    0.000    2.479    2.546
 generate_qs_task_list               11  8.9    1.122    1.129    2.479    2.546
 copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.518    2.523
 jit_kernel_multiply                 12 15.7    2.369    2.522    2.369    2.522
 qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    2.284    2.285
 calculate_first_density_matrix       1  7.0    0.000    0.000    2.262    2.262
 transfer_rs2pw_140                 130 11.5    1.523    1.543    2.125
2.251 cp_fm_cholesky_invert 11 10.9 2.198 2.198 2.198 2.198 pw_gpu_fg 606 14.1 2.159 2.182 2.159 2.182 cp_dbcsr_sm_fm_multiply_core 37 10.5 0.000 0.000 2.165 2.165 qs_ot_p2m_diag 50 11.0 0.084 0.085 2.125 2.127 dbcsr_special_finalize 6858 15.5 0.039 0.040 2.046 2.047 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=99.405, yerr=0.0 Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=68.772, yerr=0.0 PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=10.312, yerr=0.0 PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.557, yerr=0.0 PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.791, yerr=0.0 PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.217, yerr=0.0 PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.756, yerr=0.0 Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done. 
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.026 0.026 90.742 90.742 qs_mol_dyn_low 1 2.0 0.004 0.004 90.315 90.317 qs_forces 11 3.9 0.002 0.002 90.267 90.267 qs_energies 11 4.9 0.001 0.001 79.589 79.589 scf_env_do_scf 11 5.9 0.001 0.001 64.162 64.162 velocity_verlet 10 3.0 0.001 0.002 57.227 57.244 scf_env_do_scf_inner_loop 96 6.5 0.006 0.009 53.548 53.548 rebuild_ks_matrix 107 8.3 0.001 0.001 24.804 24.807 qs_ks_build_kohn_sham_matrix 107 9.3 0.017 0.017 24.804 24.807 qs_ks_update_qs_env 107 7.6 0.001 0.001 22.673 22.677 dbcsr_multiply_generic 1966 12.4 0.127 0.128 22.610 22.640 qs_scf_new_mos 96 7.5 0.001 0.001 18.131 18.131 qs_scf_loop_do_ot 96 8.5 0.001 0.001 18.130 18.131 qs_rho_update_rho_low 107 7.7 0.001 0.001 17.488 17.505 calculate_rho_elec 107 8.7 0.759 0.764 17.487 17.505 ot_scf_mini 96 9.5 0.003 0.003 16.407 16.412 fft_wrap_pw1pw2 1081 11.6 0.020 0.020 15.085 15.126 sum_up_and_integrate 107 10.3 0.002 0.002 13.384 13.415 integrate_v_rspace 107 11.3 0.315 0.316 13.304 13.334 fft_wrap_pw1pw2_140 439 12.2 0.003 0.003 12.992 13.029 multiply_cannon 1966 13.4 0.293 0.295 11.360 11.375 init_scf_loop 11 6.9 0.000 0.000 10.534 10.534 multiply_cannon_loop 1966 14.4 0.225 0.227 10.454 10.456 density_rs2pw 107 9.7 0.007 0.007 9.771 9.895 make_m2s 3932 13.4 0.039 0.040 9.777 9.801 make_images 3932 14.4 0.997 1.004 9.624 9.648 ot_mini 96 10.5 0.001 0.001 9.599 9.601 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 8.329 8.329 pw_gpu_r3dc1d_3d_ps 546 13.1 2.109 2.134 7.755 7.759 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 7.430 7.563 pw_gpu_c1dr3d_3d_ps 535 14.2 2.000 2.021 7.304 7.342 prepare_preconditioner 11 7.9 0.000 0.000 7.244 7.245 make_preconditioner 11 8.9 0.000 
0.000 7.244 7.245 grid_integrate_task_list 107 12.3 7.170 7.201 7.170 7.201 grid_collocate_task_list 107 9.7 6.927 7.014 6.927 7.014 init_scf_run 11 5.9 0.000 0.000 6.441 6.441 scf_env_initial_rho_setup 11 6.9 0.000 0.001 6.440 6.440 make_full_inverse_cholesky 11 9.9 0.000 0.000 6.082 6.340 qs_ot_get_derivative 96 11.5 0.001 0.001 5.859 5.864 hybrid_alltoall_any 4079 16.3 4.297 4.315 5.846 5.851 potential_pw2rs 107 12.3 0.033 0.033 5.818 5.818 make_images_data 3932 15.4 0.050 0.051 5.719 5.720 multiply_cannon_multrec 3932 15.4 1.774 1.812 5.672 5.717 qs_env_update_s_mstruct 11 6.9 0.000 0.000 4.151 4.286 build_core_ppl_forces 11 5.9 3.755 3.857 3.755 3.857 mp_alltoall_z22v 1081 15.6 3.759 3.834 3.759 3.834 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 3.762 3.799 wfi_extrapolate 11 7.9 0.001 0.001 3.756 3.756 ot_diis_step 96 11.5 0.005 0.005 3.720 3.720 dbcsr_complete_redistribute 317 12.2 1.218 1.220 3.396 3.656 dbcsr_mm_accdrv_process 8450 16.1 1.186 1.276 3.557 3.567 qs_create_task_list 11 7.9 0.000 0.000 3.293 3.391 generate_qs_task_list 11 8.9 1.385 1.388 3.293 3.391 apply_preconditioner_dbcsr 107 12.6 0.000 0.000 3.315 3.317 apply_single 107 13.6 0.001 0.001 3.315 3.317 mp_waitall_1 55487 16.8 3.133 3.148 3.133 3.148 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.109 3.109 calculate_dm_sparse 107 9.5 0.001 0.001 3.043 3.049 qs_ot_get_p 107 10.4 0.001 0.001 2.838 2.842 multiply_cannon_sync_h2d 3932 15.4 2.762 2.807 2.762 2.807 copy_dbcsr_to_fm 147 11.2 0.004 0.004 2.669 2.683 cp_dbcsr_sm_fm_multiply 37 9.5 0.001 0.001 2.664 2.666 transfer_rs2pw 439 10.6 0.007 0.007 2.368 2.511 pw_poisson_solve 107 10.3 0.003 0.003 2.400 2.401 yz_to_x 546 14.1 0.418 0.423 2.337 2.382 x_to_yz 535 15.2 0.462 0.465 2.302 2.324 calculate_first_density_matrix 1 7.0 0.000 0.000 2.256 2.256 cp_fm_cholesky_invert 11 10.9 2.167 2.167 2.167 2.167 cp_dbcsr_sm_fm_multiply_core 37 10.5 0.000 0.000 2.153 2.154 transfer_rs2pw_140 118 11.5 1.391 1.416 1.966 2.121 transfer_dbcsr_to_fm 11 
10.9 0.001 0.001 2.104 2.115 pw_gpu_fg 546 14.1 2.015 2.025 2.015 2.025 build_core_ppl 11 7.9 1.972 2.003 1.972 2.003 copy_fm_to_dbcsr 170 11.1 0.002 0.002 1.700 1.960 qs_ot_get_derivative_taylor 53 13.0 0.002 0.002 1.955 1.958 jit_kernel_multiply 10 15.4 1.854 1.928 1.854 1.928 dbcsr_special_finalize 5898 15.4 0.034 0.034 1.840 1.849 qs_ot_p2m_diag 44 11.0 0.072 0.073 1.837 1.838 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=90.742, yerr=0.0 Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=64.834, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.17, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=6.927, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.297, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=3.759, yerr=0.0 PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.755, yerr=0.0 Running w64PBE.inp with 3 threads and 2 ranks... done. 
From /workspace/artifacts/w64PBE_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.043 0.043 239.741 239.741 qs_mol_dyn_low 1 2.0 0.004 0.004 239.048 239.051 qs_forces 11 3.9 0.002 0.002 238.998 238.998 qs_energies 11 4.9 0.001 0.001 208.290 208.291 velocity_verlet 10 3.0 0.001 0.002 188.637 188.655 scf_env_do_scf 11 5.9 0.001 0.002 188.128 188.129 scf_env_do_scf_inner_loop 106 6.8 0.006 0.009 164.416 164.416 rebuild_ks_matrix 117 8.5 0.001 0.001 124.243 124.251 qs_ks_build_kohn_sham_matrix 117 9.5 0.020 0.020 124.242 124.250 qs_ks_update_qs_env 120 7.8 0.001 0.001 110.423 110.429 fft_wrap_pw1pw2 2000 12.9 0.046 0.047 69.184 69.203 fft_wrap_pw1pw2_200 1298 14.3 0.008 0.009 65.608 65.623 qs_vxc_create 117 10.5 0.004 0.004 65.081 65.125 xc_vxc_pw_create 117 11.5 1.433 1.433 65.077 65.121 qs_rho_update_rho_low 117 7.9 0.001 0.001 61.544 61.551 calculate_rho_elec 117 8.9 1.199 1.204 61.543 61.550 sum_up_and_integrate 117 10.5 0.003 0.003 44.875 44.926 integrate_v_rspace 117 11.5 0.213 0.216 44.693 44.745 grid_collocate_task_list 117 9.9 42.084 42.174 42.084 42.174 xc_pw_derive 702 13.5 0.010 0.010 38.450 38.497 xc_rho_set_and_dset_create 117 12.5 0.928 0.934 37.783 37.789 pw_gpu_c1dr3d_3d_ps 1053 15.2 10.564 10.651 37.064 37.076 grid_integrate_task_list 117 12.5 33.469 33.529 33.469 33.529 pw_gpu_r3dc1d_3d_ps 947 14.5 9.613 9.644 32.061 32.093 xc_pw_divergence 117 12.5 0.005 0.005 25.444 25.466 init_scf_loop 14 6.8 0.001 0.001 23.648 23.649 mp_alltoall_z22v 2000 16.9 18.244 18.442 18.244 18.442 density_rs2pw 117 9.9 0.009 0.009 18.225 18.324 dbcsr_multiply_generic 2035 12.5 0.138 0.139 17.758 17.821 xc_functional_eval 117 13.5 0.002 0.002 16.759 16.779 pbe_lda_eval 117 14.5 16.757 16.778 16.757 16.778 
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 15.969 16.116 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 14.586 14.586 qs_scf_new_mos 106 7.8 0.001 0.001 13.217 13.225 qs_scf_loop_do_ot 106 8.8 0.001 0.001 13.216 13.224 x_to_yz 1053 16.2 2.554 2.562 12.223 12.343 ot_scf_mini 106 9.8 0.003 0.003 11.834 11.839 potential_pw2rs 117 12.5 0.058 0.058 11.011 11.015 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 10.823 10.823 yz_to_x 947 15.5 1.817 1.829 10.392 10.451 multiply_cannon 2035 13.5 0.291 0.293 8.876 8.878 init_scf_run 11 5.9 0.000 0.000 8.857 8.857 scf_env_initial_rho_setup 11 6.9 0.000 0.001 8.856 8.856 pw_gpu_sf 1053 16.2 8.287 8.301 8.287 8.301 build_core_ppl_forces 11 5.9 8.124 8.282 8.124 8.282 prepare_preconditioner 14 7.8 0.000 0.000 8.265 8.266 make_preconditioner 14 8.8 0.000 0.000 8.265 8.265 multiply_cannon_loop 2035 14.5 0.237 0.241 7.882 7.898 pw_gpu_fg 947 15.5 7.441 7.458 7.441 7.458 make_m2s 4070 13.5 0.044 0.044 7.415 7.431 make_images 4070 14.5 0.973 0.974 7.240 7.253 ot_mini 106 10.8 0.001 0.001 7.225 7.232 wfi_extrapolate 11 7.9 0.001 0.002 6.967 6.967 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 6.834 6.872 pw_gpu_ffc 1053 16.2 5.972 6.003 5.972 6.003 build_kinetic_matrix_low 22 6.9 4.932 4.946 5.015 5.026 build_overlap_matrix_low 22 6.9 4.842 4.876 4.915 4.950 pw_poisson_solve 117 10.5 0.003 0.003 4.737 4.745 pw_gpu_cff 947 15.5 4.554 4.574 4.554 4.574 qs_ot_get_derivative 106 11.8 0.002 0.002 4.448 4.455 transfer_rs2pw 479 10.8 0.009 0.009 4.187 4.318 make_full_single_inverse 14 9.8 0.002 0.002 4.131 4.132 multiply_cannon_multrec 4070 15.5 1.729 1.740 4.114 4.116 pw_derive 1053 13.8 4.098 4.104 4.098 4.104 make_images_data 4070 15.5 0.052 0.053 3.899 3.905 hybrid_alltoall_any 4213 16.4 2.741 2.745 3.887 3.890 qs_env_update_s_mstruct 11 6.9 0.000 0.000 3.645 3.673 transfer_rs2pw_200 128 11.7 2.535 2.577 3.463 3.607 make_full_inverse_cholesky 14 9.8 0.000 0.000 3.377 3.525 mp_waitall_1 57459 16.9 3.283 3.324 3.283 3.324 
build_core_ppl 11 7.9 3.144 3.188 3.144 3.188 transfer_pw2rs 479 13.4 0.006 0.006 3.068 3.073 ot_diis_step 106 11.8 0.005 0.006 2.757 2.757 pw_copy 1755 13.0 2.740 2.749 2.740 2.749 arnoldi_generalized_ev 14 10.8 0.000 0.000 2.600 2.602 fft_wrap_pw1pw2_70 234 13.2 0.002 0.002 2.590 2.593 dbcsr_sym_matrix_vector_mult 1269 12.5 0.035 0.035 2.563 2.564 transfer_pw2rs_200 128 14.1 1.602 1.614 2.462 2.470 gev_build_subspace 23 11.5 0.010 0.010 2.398 2.398 qs_create_task_list 11 7.9 0.000 0.000 2.369 2.380 generate_qs_task_list 11 8.9 1.308 1.309 2.368 2.380 apply_preconditioner_dbcsr 120 12.8 0.000 0.000 2.355 2.365 apply_single 120 13.8 0.001 0.001 2.355 2.365 dbcsr_complete_redistribute 323 11.8 0.878 0.910 2.156 2.326 dbcsr_sym_matrix_vector_mult_l 1269 13.5 2.219 2.244 2.225 2.250 pw_poisson_set 118 11.5 0.005 0.005 2.168 2.177 dbcsr_mm_accdrv_process 9388 16.2 0.591 0.592 2.132 2.140 calculate_dm_sparse 117 9.7 0.001 0.001 2.051 2.053 qs_ot_get_derivative_taylor 89 12.9 0.003 0.003 1.981 1.985 cp_dbcsr_sm_fm_multiply 46 9.3 0.002 0.002 1.880 1.882 multiply_cannon_sync_h2d 4070 15.5 1.786 1.847 1.786 1.847 pw_integral_ab_c1d_c1d_gs 117 11.5 1.771 1.776 1.797 1.802 qs_ot_get_p 120 10.5 0.001 0.001 1.632 1.636 pw_axpy 1170 12.0 1.579 1.580 1.579 1.580 copy_dbcsr_to_fm 143 10.8 0.004 0.004 1.454 1.487 dbcsr_special_finalize 6105 15.5 0.034 0.035 1.458 1.470 copy_fm_to_dbcsr 180 10.8 0.002 0.002 1.306 1.454 cp_dbcsr_sm_fm_multiply_core 46 10.3 0.000 0.000 1.398 1.402 dbcsr_merge_single_wm 4070 16.5 0.130 0.131 1.347 1.357 cp_fm_cholesky_invert 14 10.8 1.289 1.289 1.289 1.289 calculate_rho_core 11 7.9 0.161 0.162 1.224 1.261 multiply_cannon_metrocomm1 4070 15.5 0.012 0.012 1.191 1.248 dbcsr_dot 1125 12.2 1.145 1.159 1.223 1.239 mp_sendrecv_dv 479 12.8 1.150 1.238 1.150 1.238 calculate_first_density_matrix 1 7.0 0.000 0.000 1.098 1.098 jit_kernel_multiply 12 15.0 1.046 1.046 1.046 1.046 transfer_dbcsr_to_fm 14 10.8 0.001 0.001 0.969 0.999 dbcsr_sort_data 4070 17.5 0.940 
0.948 0.940 0.948 dbcsr_finalize 4628 13.9 0.060 0.060 0.897 0.925 transfer_fm_to_dbcsr 14 9.8 0.000 0.000 0.757 0.907 cp_dbcsr_plus_fm_fm_t 22 8.9 0.001 0.001 0.898 0.898 dbcsr_merge_all 4098 15.1 0.178 0.180 0.790 0.816 qs_ot_get_orbitals 106 10.8 0.001 0.001 0.798 0.799 dbcsr_copy 7812 13.3 0.201 0.203 0.782 0.791 mp_alltoall_d11v 1899 13.8 0.770 0.785 0.770 0.785 evaluate_core_matrix_traces 117 8.5 0.001 0.001 0.764 0.766 calculate_ptrace_kp 234 9.5 0.001 0.001 0.763 0.765 qs_ot_p2m_diag 19 11.0 0.033 0.034 0.763 0.764 build_core_ppnl_forces 11 5.9 0.755 0.763 0.755 0.763 grid_create_task_list 11 9.9 0.739 0.746 0.739 0.746 cp_fm_cholesky_decompose 28 10.5 0.667 0.697 0.667 0.697 fft_wrap_pw1pw2_30 234 13.2 0.001 0.001 0.685 0.687 cp_dbcsr_syevd 19 12.0 0.002 0.002 0.647 0.647 make_images_pack 4070 15.5 0.623 0.632 0.636 0.646 cp_fm_uplo_to_full 47 13.4 0.480 0.633 0.480 0.633 cp_fm_diag_elpa 19 13.0 0.000 0.000 0.614 0.614 cp_fm_diag_elpa_base 19 14.0 0.604 0.607 0.614 0.614 qs_init_subsys 1 2.0 0.001 0.001 0.609 0.609 qs_env_setup 1 3.0 0.000 0.000 0.602 0.603 qs_env_rebuild_pw_env 23 5.3 0.000 0.000 0.602 0.602 pw_env_rebuild 1 5.0 0.000 0.000 0.601 0.602 pw_grid_setup 4 6.0 0.000 0.000 0.577 0.578 transfer_rs2pw_70 117 11.9 0.388 0.389 0.562 0.573 pw_grid_setup_internal 4 7.0 0.006 0.007 0.567 0.567 pw_zero 585 13.0 0.543 0.548 0.543 0.548 make_basis_sm 14 9.3 0.001 0.001 0.545 0.546 qs_ot_get_derivative_diag 17 12.0 0.001 0.001 0.536 0.538 dbcsr_copy_into_existing 22 7.9 0.526 0.532 0.526 0.532 acc_transpose_blocks 4070 15.5 0.023 0.023 0.503 0.509 dbcsr_mm_accdrv_process_sort 9388 17.2 0.495 0.505 0.495 0.505 mp_sum_d 3821 11.6 0.415 0.493 0.415 0.493 transfer_pw2rs_70 117 14.5 0.309 0.309 0.469 0.472 pw_grid_sort 4 8.0 0.337 0.338 0.456 0.457 dbcsr_sort_indices 10929 16.5 0.428 0.433 0.428 0.433 parallel_gemm_fm_cosma 96 8.9 0.411 0.414 0.411 0.414 compute_matrix_w 11 5.9 0.000 0.000 0.396 0.398 calculate_w_matrix_ot 11 6.9 0.003 0.003 0.396 0.398 
reorthogonalize_vectors 10 9.0 0.000 0.000 0.387 0.388 dbcsr_data_copy_aa2 2343 15.5 0.377 0.385 0.377 0.385 ot_scf_init 14 7.8 0.002 0.002 0.372 0.374 mp_sum_l 6134 13.5 0.318 0.347 0.318 0.347 dbcsr_desymmetrize_deep 143 11.8 0.093 0.095 0.332 0.337 cp_dbcsr_alloc_block_from_nbl 88 7.7 0.207 0.211 0.325 0.331 mp_alltoall_i22 633 13.6 0.198 0.330 0.198 0.330 calculate_ecore_overlap 22 5.9 0.001 0.002 0.174 0.330 build_qs_neighbor_lists 11 6.9 0.001 0.001 0.317 0.319 dbcsr_add_d 1795 13.1 0.003 0.003 0.314 0.319 dbcsr_add_anytype 1795 14.1 0.168 0.172 0.311 0.316 integrate_v_core_rspace 11 7.9 0.070 0.071 0.306 0.312 distribute_tasks 11 9.9 0.304 0.305 0.304 0.305 pw_scale 468 12.0 0.294 0.299 0.294 0.299 setup_rec_index_2d 4070 14.5 0.280 0.286 0.280 0.286 fft_wrap_pw1pw2_10 234 13.2 0.001 0.001 0.255 0.259 pw_multiply_with 117 11.5 0.257 0.259 0.257 0.259 multiply_cannon_multrec_finali 2035 16.5 0.005 0.005 0.254 0.258 dbcsr_mm_multrec_finalize 2035 17.5 0.021 0.022 0.249 0.253 dbcsr_make_untransposed_blocks 2481 13.4 0.234 0.238 0.245 0.250 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="w64PBE", label="w64PBE", y=239.741, yerr=0.0 Plot: name="w64PBE_timings_6cpu_1gpu", title="Timings of w64PBE with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="rest", label="rest", y=118.623, yerr=0.0 PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=42.084, yerr=0.0 PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=33.469, yerr=0.0 PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=18.244, yerr=0.0 PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="pbe_lda_eval", label="pbe_lda_eval", y=16.757, yerr=0.0 PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", 
label="pw_gpu_c1dr3d_3d_ps", y=10.564, yerr=0.0 Running w64SCAN.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/w64SCAN_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.185 0.187 898.446 898.446 qs_mol_dyn_low 1 2.0 0.004 0.004 896.243 896.245 qs_forces 11 3.9 0.002 0.002 896.195 896.195 qs_energies 11 4.9 0.001 0.001 805.696 805.696 scf_env_do_scf 11 5.9 0.001 0.002 768.008 768.009 velocity_verlet 10 3.0 0.001 0.002 715.504 715.521 scf_env_do_scf_inner_loop 106 6.8 0.006 0.009 691.771 691.771 rebuild_ks_matrix 117 8.5 0.001 0.001 620.499 620.506 qs_ks_build_kohn_sham_matrix 117 9.5 0.021 0.021 620.498 620.505 qs_ks_update_qs_env 119 7.8 0.001 0.001 543.785 543.793 fft_wrap_pw1pw2 3053 12.6 0.071 0.072 425.100 425.135 fft_wrap_pw1pw2_400 1649 13.9 0.011 0.011 407.492 407.500 qs_vxc_create 117 10.5 0.004 0.004 380.661 380.666 xc_vxc_pw_create 117 11.5 4.532 4.549 380.657 380.662 xc_rho_set_and_dset_create 117 12.5 5.905 5.906 255.310 255.365 qs_rho_update_rho_low 117 7.9 0.001 0.002 223.687 223.696 calculate_rho_elec 234 8.9 6.602 6.608 223.685 223.695 pw_gpu_c1dr3d_3d_ps 1521 15.1 118.790 119.305 214.272 214.313 pw_gpu_r3dc1d_3d_ps 1532 14.1 119.672 119.693 210.737 210.812 sum_up_and_integrate 117 10.5 0.005 0.005 186.334 186.732 integrate_v_rspace 234 11.5 0.423 0.427 185.464 185.860 xc_pw_derive 702 13.5 0.011 0.012 182.388 182.431 density_rs2pw 234 9.9 0.021 0.021 162.623 163.044 xc_functional_eval 234 13.5 0.003 0.004 153.696 153.765 libxc_lda_eval 234 14.5 153.687 153.755 153.693 153.761 xc_pw_divergence 117 12.5 0.007 0.007 119.502 119.615 potential_pw2rs 234 12.5 0.287 0.291 95.717 95.861 grid_integrate_task_list 234 12.5 89.323 89.858 89.323 89.858 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 
77.469 77.470 init_scf_loop 13 6.8 0.000 0.001 76.175 76.175 mp_alltoall_z22v 3053 16.6 70.888 71.631 70.888 71.631 grid_collocate_task_list 234 9.9 54.319 54.751 54.319 54.751 x_to_yz 1521 16.1 9.383 9.391 44.275 44.829 yz_to_x 1532 15.1 7.713 7.737 43.709 43.867 transfer_rs2pw 947 10.9 0.020 0.020 35.249 35.769 transfer_rs2pw_400 245 11.8 25.283 25.320 30.832 31.338 pw_gpu_sf 1521 16.1 31.130 31.162 31.130 31.162 pw_gpu_fg 1532 15.1 30.030 30.067 30.030 30.067 transfer_pw2rs 947 13.5 0.016 0.016 29.056 29.059 transfer_pw2rs_400 245 14.3 20.728 20.795 25.812 25.816 init_scf_run 11 5.9 0.000 0.000 24.214 24.214 scf_env_initial_rho_setup 11 6.9 0.000 0.001 24.213 24.213 wfi_extrapolate 11 7.9 0.002 0.002 20.729 20.729 pw_gpu_ffc 1521 16.1 20.050 20.096 20.050 20.096 dbcsr_multiply_generic 2100 12.6 0.142 0.144 18.192 18.580 pw_poisson_solve 117 10.5 0.003 0.003 17.565 17.577 pw_gpu_cff 1532 15.1 17.174 17.200 17.174 17.200 fft_wrap_pw1pw2_140 468 13.2 0.003 0.003 13.868 13.914 qs_scf_new_mos 106 7.8 0.001 0.001 13.294 13.294 qs_scf_loop_do_ot 106 8.8 0.001 0.001 13.293 13.293 build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 12.872 13.021 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 12.994 12.994 pw_derive 1053 13.8 12.346 12.363 12.346 12.363 ot_scf_mini 106 9.8 0.003 0.003 11.905 11.910 pw_copy 2223 13.1 9.370 9.383 9.370 9.383 multiply_cannon 2100 13.6 0.301 0.304 8.918 8.939 mp_waitall_1 59747 17.0 8.441 8.526 8.441 8.526 pw_integral_ab_c1d_c1d_gs 117 11.5 7.958 7.994 8.277 8.284 prepare_preconditioner 13 7.8 0.000 0.000 7.974 7.979 make_preconditioner 13 8.8 0.000 0.000 7.973 7.979 multiply_cannon_loop 2100 14.6 0.243 0.243 7.928 7.965 make_m2s 4200 13.6 0.043 0.043 7.349 7.365 ot_mini 106 10.8 0.001 0.001 7.226 7.230 mp_sendrecv_dv 947 12.9 6.738 7.216 6.738 7.216 make_images 4200 14.6 0.978 0.985 7.174 7.191 qs_env_update_s_mstruct 11 6.9 0.000 0.000 6.854 6.889 pw_poisson_set 118 11.5 0.006 0.006 6.852 6.864 build_core_ppl_forces 11 5.9 5.993 6.163 5.993 
6.163 pw_axpy 1638 11.7 6.026 6.045 6.026 6.045 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 5.796 5.871 calculate_rho_core 11 7.9 0.438 0.439 4.914 4.986 qs_ot_get_derivative 106 11.8 0.002 0.002 4.485 4.491 build_overlap_matrix_low 22 6.9 4.320 4.338 4.392 4.408 build_kinetic_matrix_low 22 6.9 4.280 4.300 4.360 4.379 multiply_cannon_multrec 4200 15.6 1.754 1.760 4.157 4.164 hybrid_alltoall_any 4338 16.5 2.693 2.701 3.881 3.893 make_images_data 4200 15.6 0.052 0.053 3.850 3.863 make_full_single_inverse 13 9.8 0.002 0.002 3.809 3.809 transfer_rs2pw_140 234 11.9 2.754 2.756 3.713 3.732 make_full_inverse_cholesky 13 9.8 0.000 0.000 3.416 3.557 fft_wrap_pw1pw2_50 468 13.2 0.003 0.003 2.802 2.816 ot_diis_step 106 11.8 0.005 0.005 2.720 2.720 transfer_pw2rs_140 234 14.5 1.691 1.698 2.614 2.621 build_core_ppl 11 7.9 2.483 2.553 2.483 2.553 arnoldi_generalized_ev 13 10.8 0.000 0.000 2.394 2.394 dbcsr_sym_matrix_vector_mult 1206 12.5 0.033 0.033 2.359 2.361 dbcsr_complete_redistribute 312 11.8 0.950 0.955 2.200 2.356 apply_preconditioner_dbcsr 119 12.8 0.000 0.000 2.309 2.318 apply_single 119 13.8 0.001 0.001 2.309 2.317 gev_build_subspace 22 11.5 0.009 0.009 2.209 2.209 pw_zero 702 12.6 2.207 2.209 2.207 2.209 dbcsr_mm_accdrv_process 9484 16.3 0.798 0.996 2.148 2.158 calculate_dm_sparse 117 9.7 0.001 0.001 2.068 2.074 dbcsr_sym_matrix_vector_mult_l 1206 13.5 2.049 2.064 2.055 2.070 qs_ot_get_derivative_taylor 89 12.9 0.004 0.004 2.060 2.063 qs_init_subsys 1 2.0 0.001 0.001 1.938 1.938 qs_env_setup 1 3.0 0.000 0.000 1.930 1.931 qs_env_rebuild_pw_env 23 5.3 0.000 0.000 1.930 1.931 pw_env_rebuild 1 5.0 0.000 0.000 1.930 1.931 pw_grid_setup 4 6.0 0.000 0.000 1.867 1.868 cp_dbcsr_sm_fm_multiply 45 9.4 0.002 0.002 1.849 1.849 pw_grid_setup_internal 4 7.0 0.019 0.019 1.835 1.836 multiply_cannon_sync_h2d 4200 15.6 1.761 1.821 1.761 1.821 qs_create_task_list 11 7.9 0.000 0.000 1.703 1.743 generate_qs_task_list 11 8.9 0.878 0.886 1.703 1.743 qs_ot_get_p 119 10.6 0.001 0.001 
1.680 1.689 copy_dbcsr_to_fm 138 10.8 0.004 0.004 1.636 1.656 pw_grid_sort 4 8.0 1.117 1.120 1.512 1.516 dbcsr_special_finalize 6300 15.6 0.034 0.034 1.447 1.453 mp_sum_d 3885 11.5 1.111 1.431 1.111 1.431 copy_fm_to_dbcsr 174 10.8 0.002 0.002 1.283 1.423 cp_dbcsr_sm_fm_multiply_core 45 10.4 0.000 0.000 1.385 1.385 dbcsr_merge_single_wm 4200 16.6 0.130 0.133 1.335 1.342 integrate_v_core_rspace 11 7.9 0.151 0.152 1.317 1.318 multiply_cannon_metrocomm1 4200 15.6 0.012 0.012 1.209 1.236 dbcsr_dot 1134 12.2 1.139 1.142 1.206 1.211 mp_sum_l 6329 13.5 0.778 1.196 0.778 1.196 cp_fm_cholesky_invert 13 10.8 1.187 1.187 1.187 1.187 transfer_dbcsr_to_fm 13 10.8 0.001 0.001 1.172 1.187 calculate_first_density_matrix 1 7.0 0.000 0.000 1.105 1.105 pw_scale 585 11.9 1.092 1.092 1.092 1.092 jit_kernel_multiply 12 15.0 0.847 1.038 0.847 1.038 dbcsr_sort_data 4200 17.6 0.930 0.933 0.930 0.933 dbcsr_finalize 4788 14.0 0.060 0.061 0.919 0.932 cp_dbcsr_plus_fm_fm_t 22 8.9 0.001 0.001 0.910 0.912 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="w64SCAN", label="w64SCAN", y=898.446, yerr=0.0 Plot: name="w64SCAN_timings_6cpu_1gpu", title="Timings of w64SCAN with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="rest", label="rest", y=346.086, yerr=0.0 PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="libxc_lda_eval", label="libxc_lda_eval", y=153.687, yerr=0.0 PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=119.672, yerr=0.0 PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=118.79, yerr=0.0 PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=89.323, yerr=0.0 PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=70.888, yerr=0.0 Running GW_PBE_4benzene.inp 
with 3 threads and 2 ranks... done. From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.018 0.020 107.080 107.080 qs_energies 1 2.0 0.000 0.000 106.776 106.779 mp2_main 1 3.0 0.000 0.000 100.474 100.477 mp2_gpw_main 1 4.0 0.000 0.000 98.874 98.876 rpa_ri_compute_en 1 5.0 0.000 0.000 91.902 91.904 rpa_num_int 1 6.0 0.001 0.001 91.891 91.896 dbt_total 2336 9.6 0.021 0.021 73.459 73.460 compute_mat_P_omega 1 7.0 0.001 0.002 69.834 69.836 compute_mat_P_omega_contract 10 8.0 5.220 5.263 69.516 69.524 dbt_contract 787 11.0 0.049 0.050 48.032 48.033 dbt_tas_total 1149 12.2 0.144 0.144 36.989 36.989 dbt_tas_multiply 807 12.1 0.003 0.004 36.304 36.304 dbt_tas_dbm 807 14.1 0.006 0.006 27.885 27.885 dbm_multiply 807 16.1 26.403 26.406 26.403 26.406 dbt_copy 1107 10.7 0.069 0.069 26.187 26.201 compute_mat_P_omega_calc_M_occ 250 9.0 5.213 5.218 24.481 24.481 dbt_tas_mm_1N 524 15.1 0.003 0.003 17.935 18.040 dbt_reshape 594 11.8 7.290 7.331 17.564 17.597 compute_QP_energies 1 7.0 0.000 0.000 15.526 15.526 compute_self_energy_cubic_gw 1 8.0 0.114 0.115 15.526 15.526 compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 15.274 15.274 dbt_tas_reserve_blocks_index 3266 14.3 0.640 0.649 10.979 10.980 dbm_reserve_blocks 3634 15.3 10.645 10.650 10.645 10.650 dbt_reserve_blocks_index 2347 13.0 0.307 0.311 9.148 9.198 dbt_crop 1042 12.0 6.818 6.885 9.073 9.137 dbt_reserve_blocks_index_array 2289 12.1 0.012 0.013 8.950 8.978 compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 8.766 8.766 mp_waitall_2 2656 15.9 8.481 8.484 8.481 8.484 dbt_communicate_buffer 594 12.8 0.012 0.012 7.660 7.664 dbt_tas_mm_2 251 15.0 0.003 0.003 7.430 7.430 contract_cubic_gw 21 9.0 0.000 0.000 7.228 7.228 mp2_ri_gpw_compute_in 1 5.0 0.001 
0.002 6.960 6.960 scf_env_do_scf 1 3.0 0.000 0.000 5.791 5.791 scf_env_do_scf_inner_loop 17 4.0 0.001 0.001 5.791 5.791 compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 5.567 5.581 compute_mat_P_omega_copy_M_occ 250 9.0 0.002 0.002 5.468 5.470 dbcsr_multiply_generic 30 8.1 0.003 0.003 4.498 4.551 dbt_tas_copy 511 11.5 2.536 2.554 4.478 4.544 multiply_cannon 30 9.1 0.011 0.013 4.301 4.353 multiply_cannon_loop 30 10.1 0.004 0.005 4.238 4.290 multiply_cannon_multrec 60 11.1 0.276 0.282 3.658 3.675 trace_sigma_gw 21 9.0 0.534 0.626 3.492 3.492 mp_sync 8688 11.6 2.905 3.128 2.905 3.128 dbcsr_mm_accdrv_process 328 12.3 0.363 0.705 3.111 3.115 jit_kernel_multiply 17 11.4 2.741 3.079 2.741 3.079 qs_scf_new_mos 17 5.0 0.000 0.000 2.930 2.962 dbt_split_copyback 70 10.6 1.124 1.128 2.658 2.679 get_2c_integrals 1 6.0 0.000 0.000 2.645 2.645 convert_to_new_pgrid 2421 14.1 0.036 0.037 2.504 2.535 dbm_copy 1614 15.1 2.468 2.498 2.468 2.498 mp2_ri_gpw_compute_in_copy_3c 6 6.0 0.239 0.240 2.318 2.475 fft_wrap_pw1pw2 301 10.2 0.005 0.005 2.359 2.360 qs_ks_build_kohn_sham_matrix 18 6.9 0.002 0.002 2.355 2.356 qs_ks_update_qs_env 17 5.0 0.000 0.000 2.325 2.326 rebuild_ks_matrix 17 6.0 0.000 0.000 2.319 2.319 build_3c_integrals 5 6.0 1.378 1.403 1.988 2.145 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=107.08, yerr=0.0 Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=47.443, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=26.403, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.645, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="mp_waitall_2", 
label="mp_waitall_2", y=8.481, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=7.29, yerr=0.0 PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.818, yerr=0.0 Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.025 0.027 206.608 206.609 qs_forces 1 2.0 0.000 0.000 206.162 206.162 rebuild_ks_matrix 7 6.6 0.000 0.000 202.235 202.235 qs_ks_build_kohn_sham_matrix 7 7.6 0.002 0.002 202.235 202.235 hfx_ks_matrix 7 8.6 0.000 0.000 198.171 198.175 dbt_total 849 11.0 0.009 0.009 149.025 149.025 hfx_ri_update_ks 7 9.6 0.000 0.000 111.500 111.500 hfx_ri_update_ks_Pmat 7 10.6 21.569 21.590 111.495 111.495 qs_energies 1 3.0 0.000 0.000 106.888 106.888 scf_env_do_scf 1 4.0 0.000 0.000 105.028 105.028 qs_ks_update_qs_env 8 6.0 0.000 0.000 103.005 103.005 qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 99.237 99.237 hfx_ri_update_forces 1 7.0 1.062 1.086 86.669 86.672 dbt_contract 207 12.4 0.053 0.055 85.979 85.980 dbt_tas_total 369 13.4 0.082 0.083 70.435 70.435 dbt_tas_multiply 216 13.5 0.001 0.001 67.450 67.450 dbt_copy 423 11.8 0.045 0.046 58.138 58.465 init_scf_loop 2 5.0 0.000 0.000 52.701 52.701 hfx_ri_forces_Pmat_3c 1 8.0 3.729 3.737 52.473 52.565 scf_env_do_scf_inner_loop 6 5.0 0.000 0.001 52.326 52.326 dbt_tas_dbm 216 15.5 0.002 0.002 52.238 52.239 dbm_multiply 216 17.5 48.842 49.064 48.842 49.064 dbt_reshape 175 13.2 19.971 20.121 44.543 44.943 hfx_ri_update_ks_Pmat_KS 63 11.6 0.001 0.001 31.371 31.371 precalc_derivatives 1 8.0 1.779 1.797 27.688 27.688 mp_waitall_2 1022 16.5 23.046 23.118 23.046 23.118 dbt_tas_mm_2 91 16.5 0.001 0.001 
21.421 21.421 dbt_tas_reserve_blocks_index 1323 15.4 1.654 1.655 18.862 18.980 dbt_communicate_buffer 175 14.2 0.005 0.005 18.760 18.788 dbt_crop 372 13.7 14.243 14.303 18.488 18.570 dbm_reserve_blocks 1491 16.3 17.867 17.977 17.867 17.977 hfx_ri_pre_scf_Pmat 1 12.0 0.000 0.000 17.678 17.678 dbt_tas_mm_3T 77 17.1 0.000 0.001 16.373 16.665 hfx_ri_update_ks_Pmat_copy_2 63 11.6 0.000 0.000 16.484 16.485 build_3c_derivatives 3 9.0 2.434 2.543 15.474 15.482 dbt_reserve_blocks_index 889 14.5 0.607 0.614 15.269 15.297 hfx_ri_update_ks_Pmat_Px3C 63 11.6 0.000 0.000 15.208 15.208 dbt_reserve_blocks_index_array 859 13.5 0.008 0.008 14.989 15.008 dbt_tas_mm_3N 37 15.4 0.000 0.000 11.679 11.762 dbt_tas_copy 248 12.5 4.436 4.478 8.373 8.468 mp_sync 2901 12.8 7.291 7.504 7.291 7.504 dbt_tas_replicate 168 15.1 2.513 2.551 5.633 5.679 hfx_ri_pre_scf_Pmat_copy_2 9 13.0 2.002 2.020 5.403 5.421 hfx_ri_pre_scf_Pmat_int 1 13.0 0.000 0.000 5.236 5.236 hfx_ri_pre_scf_calc_tensors 1 14.0 0.003 0.003 4.497 4.500 dbt_tas_communicate_buffer 336 16.2 0.005 0.005 4.305 4.348 hfx_ri_pre_scf_Pmat_RIx3C 9 13.0 0.000 0.000 4.155 4.221 dbt_tas_reserve_blocks_templat 266 13.6 0.102 0.102 4.004 4.143 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=206.608, yerr=0.0 Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=75.31300000000002, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=48.842, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=23.046, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=21.569, yerr=0.0 PlotPoint: 
plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=19.971, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=17.867, yerr=0.0 Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.010 0.011 100.358 100.358 qs_energies 1 2.0 0.000 0.000 100.165 100.165 mp2_main 1 3.0 0.000 0.000 93.329 93.329 mp2_gpw_main 1 4.0 0.001 0.001 92.942 92.942 mp2_ri_gpw_compute_in 1 5.0 0.542 0.543 49.991 50.029 mp2_ri_gpw_compute_en 1 5.0 0.100 0.102 42.890 42.927 mp2_ri_gpw_compute_in_loop 1 6.0 0.013 0.014 41.854 41.891 mp2_ri_gpw_compute_en_RI_loop 1 6.0 12.762 12.783 40.273 40.275 dbcsr_multiply_generic 2666 8.0 0.157 0.158 21.662 21.837 ao_to_mo_and_store_B_mult_1 1328 7.0 0.014 0.015 20.740 20.914 mp2_ri_gpw_compute_en_expansio 1040 7.0 0.710 0.713 15.984 15.985 mp2_eri_3c_integrate_gpw 1328 7.0 0.018 0.018 15.644 15.813 local_gemm 1040 8.0 15.274 15.276 15.274 15.276 make_m2s 5332 9.0 0.054 0.055 12.353 12.426 make_images 5332 10.0 2.173 2.181 12.176 12.247 multiply_cannon 2666 9.0 0.396 0.399 8.650 8.903 hybrid_alltoall_any 6683 11.6 8.192 8.260 8.466 8.532 make_images_data 5332 11.0 0.066 0.067 8.371 8.442 fft_wrap_pw1pw2 26668 10.4 0.138 0.140 7.772 8.007 multiply_cannon_loop 2666 10.0 0.193 0.194 7.562 7.811 integrate_v_rspace 1338 8.0 1.045 1.052 7.719 7.760 get_2c_integrals 1 6.0 0.004 0.004 7.593 7.595 compute_2c_integrals 1 7.0 0.006 0.007 7.068 7.068 collocate_function 1328 8.0 4.771 4.856 6.795 6.916 compute_2c_integrals_loop_lm 1 8.0 0.015 0.024 6.888 6.900 mp2_eri_2c_integrate_gpw 1 9.0 1.943 1.947 6.873 6.876 scf_env_do_scf 1 3.0 
0.000 0.000 6.014 6.015 scf_env_do_scf_inner_loop 10 4.0 0.001 0.001 6.014 6.015 mp2_ri_gpw_compute_en_comm 221 7.0 1.012 1.014 5.554 5.555 grid_integrate_task_list 1338 9.0 5.409 5.443 5.409 5.443 ao_to_mo_and_store_B_E_Ex_1 1328 7.0 3.458 3.505 5.242 5.279 mp2_ri_gpw_compute_en_ener 1040 7.0 4.817 4.838 4.817 4.838 fft_wrap_pw1pw2_20 10647 11.4 0.021 0.021 4.396 4.615 qs_scf_new_mos 10 5.0 0.000 0.000 4.358 4.360 multiply_cannon_multrec 2676 11.0 1.944 2.165 3.993 4.231 pw_gpu_r3dc1d_3d 13282 12.2 3.901 4.108 3.901 4.108 mp_sendrecv_dm3 442 8.0 3.536 3.541 3.536 3.541 eigensolver 11 5.8 0.001 0.001 3.048 3.050 potential_pw2rs 2666 10.0 0.099 0.099 2.736 2.747 pw_gpu_c1dr3d_3d 13280 12.7 2.697 2.721 2.697 2.721 fft_wrap_pw1pw2_10 15957 11.5 0.020 0.020 2.474 2.485 cp_fm_diag_elpa 11 6.8 0.000 0.000 2.408 2.408 cp_fm_diag_elpa_base 11 7.8 2.326 2.344 2.407 2.407 collocate_single_gaussian 1328 10.0 0.090 0.091 2.378 2.381 mp2_eri_2c_integrate_gpw_pot_l 1328 10.0 0.004 0.004 2.236 2.246 replicate_iaK_2intgroup 1 6.0 2.067 2.070 2.208 2.210 copy_dbcsr_to_fm 1351 8.0 0.033 0.034 2.203 2.204 fill_local_i_aL 884 7.5 2.162 2.168 2.162 2.168 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=100.358, yerr=0.0 Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=53.904, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.274, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.762, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=8.192, yerr=0.0 PlotPoint: 
plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=5.409, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_ener", label="mp2_ri_gpw_compute_en_ener", y=4.817, yerr=0.0 Running diag_cu144_broy.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.077 0.077 200.949 200.949 qs_energies 1 2.0 0.000 0.000 199.897 199.899 scf_env_do_scf 1 3.0 0.000 0.000 187.212 187.212 scf_env_do_scf_inner_loop 15 4.0 0.001 0.002 187.212 187.212 qs_ks_update_qs_env 15 5.0 0.000 0.000 103.013 103.026 rebuild_ks_matrix 15 6.0 0.000 0.000 102.810 102.822 qs_ks_build_kohn_sham_matrix 15 7.0 0.003 0.003 102.810 102.822 qs_vxc_create 15 8.0 0.101 0.104 60.031 60.034 qs_scf_new_mos 15 5.0 0.000 0.000 52.994 53.009 fft_wrap_pw1pw2 1086 10.0 0.027 0.027 52.482 52.666 calculate_dispersion_nonloc 15 9.0 10.895 11.031 51.426 51.426 eigensolver 15 6.0 0.002 0.002 43.389 43.491 sum_up_and_integrate 15 8.0 0.000 0.000 41.244 41.256 integrate_v_rspace 15 9.0 0.047 0.048 41.217 41.230 grid_integrate_task_list 15 10.0 34.115 34.126 34.115 34.126 qs_rho_update_rho_low 16 5.0 0.000 0.000 28.713 28.713 calculate_rho_elec 16 6.0 0.176 0.176 28.713 28.713 fft_wrap_pw1pw2_150 765 11.0 0.005 0.005 27.362 27.580 pw_gpu_c1dr3d_3d_ps 585 12.1 5.570 5.610 27.097 27.110 cp_fm_diag_elpa 15 7.0 0.000 0.000 26.461 26.467 cp_fm_diag_elpa_base 15 8.0 24.654 25.236 26.456 26.457 pw_gpu_r3dc1d_3d_ps 501 11.9 5.374 5.513 25.351 25.548 grid_collocate_task_list 16 7.0 17.379 17.381 17.379 17.381 cp_fm_cholesky_restore 45 7.0 15.020 15.757 15.020 15.757 fft_wrap_pw1pw2_200 197 11.3 0.001 0.001 13.390 13.417 
density_rs2pw 16 7.0 0.001 0.002 11.144 11.154 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 9.427 9.427 mp_alltoall_z22v 1086 14.0 9.275 9.343 9.275 9.343 vdW_energy 15 10.0 9.286 9.288 9.286 9.288 pw_gpu_ffc 585 13.1 8.704 8.720 8.704 8.720 xc_vxc_pw_create 15 9.0 0.182 0.185 8.504 8.510 pw_gpu_cff 501 12.9 8.272 8.305 8.272 8.305 build_core_hamiltonian_matrix 1 4.0 0.000 0.000 8.167 8.223 potential_pw2rs 15 10.0 0.007 0.007 7.056 7.056 pw_gpu_sf 585 13.1 6.854 6.864 6.854 6.864 copy_dbcsr_to_fm 16 5.9 0.001 0.001 6.660 6.664 pw_gpu_fg 501 12.9 6.413 6.426 6.413 6.426 x_to_yz 585 13.1 1.029 1.035 5.936 5.963 dbcsr_complete_redistribute 46 8.3 1.622 1.628 5.609 5.705 yz_to_x 501 12.9 0.869 0.873 5.236 5.340 fft_wrap_pw1pw2_10 62 10.5 0.000 0.000 5.245 5.247 xc_pw_derive 90 11.0 0.001 0.001 5.041 5.075 cp_fm_uplo_to_full 30 8.0 3.707 4.919 3.707 4.919 xc_rho_set_and_dset_create 15 10.0 0.132 0.135 4.847 4.896 build_core_ppnl 1 5.0 4.658 4.663 4.658 4.663 gspace_mixing 14 5.0 0.126 0.126 4.078 4.078 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=200.949, yerr=0.0 Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=98.88600000000001, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=34.115, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=24.654, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=17.379, yerr=0.0 PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=15.02, yerr=0.0 
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.895, yerr=0.0 Running bench_dftb.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/bench_dftb_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 1.966 1.994 157.357 157.357 qs_energies 1 2.0 0.000 0.000 155.287 155.288 ls_scf 1 3.0 0.000 0.000 148.344 148.347 ls_scf_main 1 4.0 0.000 0.001 137.137 137.137 density_matrix_trs4 5 5.0 0.004 0.004 109.508 109.545 dbcsr_multiply_generic 95 6.2 0.155 0.155 94.809 94.836 multiply_cannon 95 7.2 1.925 2.005 66.316 66.419 multiply_cannon_loop 95 8.2 0.167 0.167 55.595 55.768 multiply_cannon_multrec 190 9.2 42.543 42.816 47.611 47.862 ls_scf_dm_to_ks 5 5.0 0.000 0.000 25.818 25.856 make_m2s 190 7.2 0.015 0.015 23.975 24.040 make_images 190 8.2 5.325 5.543 23.454 23.515 matrix_ls_to_qs 5 6.0 0.000 0.000 16.859 16.865 dbcsr_complete_redistribute 11 7.5 10.119 10.138 14.473 14.496 matrix_decluster 5 7.0 0.000 0.000 13.201 13.220 arnoldi_extremal 6 6.2 0.000 0.000 11.242 11.242 arnoldi_normal_ev 6 7.2 0.005 0.005 11.242 11.242 build_subspace 12 8.2 0.031 0.031 11.028 11.029 qs_ks_update_qs_env 6 6.2 0.000 0.000 10.898 10.931 rebuild_ks_matrix 6 7.2 0.000 0.000 10.559 10.561 build_dftb_ks_matrix 6 8.2 0.001 0.001 10.559 10.561 build_dftb_coulomb 6 9.2 0.770 0.773 10.263 10.264 dbcsr_matrix_vector_mult 310 9.0 0.071 0.072 9.953 10.074 make_images_data 190 9.2 0.006 0.006 9.815 9.971 dbcsr_matrix_vector_mult_local 310 10.0 9.480 9.602 9.484 9.606 hybrid_alltoall_any 201 10.0 6.560 6.743 9.449 9.603 ls_scf_init_scf 1 4.0 0.000 0.000 9.517 9.518 tb_ewald_overlap 6 10.2 9.086 9.262 9.086 9.262 ls_scf_init_matrix_S 1 5.0 0.000 0.000 7.535 7.545 
calculate_norms 380 9.2 7.410 7.422 7.410 7.422 dbcsr_finalize 277 7.6 0.103 0.105 7.351 7.411 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 6.882 6.884 matrix_sqrt_Newton_Schulz 1 6.0 0.000 0.000 6.852 6.853 dbcsr_merge_all 247 8.6 1.347 1.376 6.748 6.790 build_qs_neighbor_lists 1 4.0 0.000 0.000 6.283 6.363 build_neighbor_lists_sab_tbe 1 5.0 6.093 6.173 6.093 6.173 setup_rec_index_2d 190 8.2 4.665 4.692 4.665 4.692 dbcsr_special_finalize 285 9.2 0.005 0.005 4.676 4.680 dbcsr_copy 443 8.0 0.956 0.958 4.625 4.637 dbcsr_sort_indices 643 10.1 4.379 4.384 4.379 4.384 dbcsr_add_d 130 6.0 0.001 0.001 4.166 4.198 dbcsr_add_anytype 130 7.0 1.799 1.801 4.165 4.198 dbcsr_dot 66 6.3 3.749 3.752 4.021 4.123 dbcsr_mm_accdrv_process 8119 10.0 0.464 0.473 3.991 4.010 dbcsr_data_new 3509 9.3 3.945 3.983 3.945 3.983 dbcsr_copy_into_existing 5 8.0 3.658 3.671 3.658 3.671 mp_waitall_1 2666 10.6 3.230 3.631 3.230 3.631 dbcsr_mm_accdrv_process_sort 8119 11.0 3.527 3.537 3.527 3.537 tree_to_linear_d 11 10.5 3.499 3.503 3.499 3.503 dbcsr_mm_multrec_init 95 8.2 0.000 0.000 3.150 3.194 dbcsr_mm_csr_init 95 9.2 0.005 0.005 3.150 3.193 dbcsr_mm_sched_init 95 10.2 0.000 0.000 3.122 3.166 dbcsr_mm_accdrv_init 95 11.2 0.361 0.446 3.122 3.166 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=157.357, yerr=0.0 Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=78.719, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=42.543, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=10.119, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", 
label="dbcsr_matrix_vector_mult_local", y=9.48, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="tb_ewald_overlap", label="tb_ewald_overlap", y=9.086, yerr=0.0 PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=7.41, yerr=0.0 Running dbcsr.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/dbcsr_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.004 0.004 47.765 47.765 lib_test 1 2.0 0.000 0.000 47.758 47.760 dbcsr_run_tests 3 3.0 0.000 0.000 47.758 47.759 test_multiplies_multiproc 3 4.0 0.001 0.001 37.029 37.126 dbcsr_multiply_generic 9 5.0 0.002 0.002 28.588 28.592 multiply_cannon 9 6.0 0.101 0.180 18.683 19.199 multiply_cannon_loop 9 7.0 0.003 0.003 17.307 17.699 multiply_cannon_multrec 18 8.0 9.181 9.625 16.015 16.368 dbcsr_make_random_matrix 9 4.0 7.395 7.465 10.593 10.689 dbcsr_finalize 27 5.7 0.001 0.001 7.295 7.351 dbcsr_merge_all 18 6.5 3.616 3.640 7.185 7.237 dbcsr_mm_accdrv_process 8199 9.0 1.129 1.174 6.635 6.732 dbcsr_redistribute 9 5.0 3.476 3.527 5.864 5.881 make_m2s 18 6.0 0.001 0.001 5.054 5.054 make_images 18 7.0 0.352 0.364 5.020 5.021 dbcsr_mm_accdrv_process_sort 8199 10.0 4.561 4.580 4.561 4.580 make_images_data 18 8.0 0.001 0.001 2.985 2.996 hybrid_alltoall_any 18 9.0 2.456 2.468 2.935 2.944 mp_alltoall_d11v 27 6.0 2.098 2.120 2.098 2.120 tree_to_linear_d 9 7.0 1.841 1.856 1.841 1.856 dbcsr_data_copy_aa2 18 7.5 1.595 1.606 1.595 1.606 dbcsr_data_release 507 7.7 1.371 1.372 1.371 1.372 dbcsr_data_new 354 7.4 0.951 1.075 0.951 1.075 dbcsr_checksum 6 5.0 1.028 1.033 1.044 1.044 mp_sum_l 61 4.9 0.510 0.995 0.510 0.995 dbcsr_multiply_generic_mpsum_f 9 6.0 0.000 0.000 0.510 0.994 jit_kernel_multiply 5 10.0 0.946 0.978 0.946 0.978 
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=47.765, yerr=0.0
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=19.536, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.181, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.395, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.561, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.616, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.476, yerr=0.0
Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.042 0.043 201.530 201.530 qs_mol_dyn_low 1 2.0 0.004 0.004 200.044 200.077 qs_forces 6 3.8 0.001 0.001 127.193 127.194 qs_energies 6 4.8 0.000 0.000 120.069 120.070 scf_env_do_scf 6 5.8 0.000 0.000 113.609 113.610 scf_env_do_scf_inner_loop 113 6.2 0.005 0.007 106.644 106.644 velocity_verlet 5 3.0 0.003 0.003 95.370 95.417 rebuild_ks_matrix 119 8.1 0.001 0.001 88.091 88.091 qs_ks_build_kohn_sham_matrix 119 9.1 0.019 0.020 88.091 88.091 qs_ks_update_qs_env 119 7.3 0.001 0.001 83.110 83.110 fft_wrap_pw1pw2 2059 12.4 0.043 0.045 69.408 69.410 fft_wrap_pw1pw2_150 1321 13.9 0.009 0.009 66.612 66.623 qs_vxc_create 119 10.1 0.004 0.004 55.636 55.636 xc_vxc_pw_create 119 11.1 1.490 1.494 55.632 55.633 xc_pw_derive 714 13.1 0.009 0.010 38.999 39.038 qmmm_el_coupling 6 3.8 0.000 0.000 38.495 38.504 qmmm_elec_with_gaussian 6 4.8 0.021 0.021 38.489 38.498 pw_gpu_c1dr3d_3d_ps 1095 14.8 10.604 10.627 37.490 37.534 qmmm_elec_with_gaussian_low 6 5.8 0.000 0.000 36.860 36.900 qmmm_elec_gaussian_low_G 6 6.8 31.952 31.965 31.952 31.965 pw_gpu_r3dc1d_3d_ps 964 14.0 9.508 9.529 31.864 31.904 qmmm_forces 6 3.8 0.001 0.001 31.557 31.557 qmmm_forces_with_gaussian 6 4.8 0.022 0.022 30.666 31.192 qmmm_force_with_gaussian_low 6 5.8 0.000 0.000 29.360 29.887 xc_rho_set_and_dset_create 119 12.1 2.375 2.384 27.719 27.746 xc_pw_divergence 119 12.1 0.005 0.006 26.029 26.053 qmmm_forces_gaussian_low_G 6 6.8 24.436 24.973 24.436 24.973 qs_rho_update_rho_low 119 7.3 0.001 0.001 23.222 23.438 calculate_rho_elec 119 8.3 1.058 1.059 23.221 23.438 mp_alltoall_z22v 2059 16.4 17.310 17.340 17.310 17.340 density_rs2pw 119 9.3 0.008 0.008 17.023 17.237 sum_up_and_integrate 119 
10.1 0.002 0.002 16.184 16.193 integrate_v_rspace 119 11.1 0.021 0.022 16.004 16.012 x_to_yz 1095 15.8 2.383 2.385 11.771 11.789 dbcsr_multiply_generic 2598 12.3 0.097 0.098 10.812 10.981 potential_pw2rs 119 12.1 0.034 0.034 10.283 10.283 yz_to_x 964 15.0 1.824 1.826 9.746 9.758 multiply_cannon 2598 13.3 0.217 0.217 9.210 9.484 qs_ks_ddapc 119 10.1 0.002 0.002 9.150 9.172 multiply_cannon_loop 2598 14.3 0.252 0.256 8.729 8.998 pw_gpu_sf 1095 15.8 8.718 8.726 8.718 8.726 pw_gpu_fg 964 15.0 7.648 7.691 7.648 7.691 init_scf_loop 6 6.8 0.000 0.000 6.963 6.963 qs_scf_new_mos 113 7.2 0.001 0.001 6.901 6.901 qs_scf_loop_do_ot 113 8.2 0.001 0.001 6.901 6.901 ot_scf_mini 113 9.2 0.002 0.002 6.619 6.619 multiply_cannon_multrec 5196 15.3 3.121 3.134 6.412 6.433 pw_gpu_ffc 1095 15.8 6.379 6.411 6.379 6.411 grid_integrate_task_list 119 12.1 5.699 5.709 5.699 5.709 grid_collocate_task_list 119 9.3 5.115 5.117 5.115 5.117 xc_functional_eval 238 13.1 0.003 0.003 5.081 5.091 qs_ks_update_qs_env_forces 6 4.8 0.000 0.000 5.011 5.011 qmmm_elec_gaussian_low_R 6 6.8 0.000 0.000 4.908 4.934 qmmm_forces_gaussian_low_R 6 6.8 0.000 0.000 4.924 4.934 qmmm_elec_with_gaussian_LG 6 7.8 4.908 4.934 4.908 4.934 qmmm_forces_with_gaussian_LG 6 7.8 4.924 4.934 4.924 4.934 pw_gpu_cff 964 15.0 4.897 4.904 4.897 4.904 pw_poisson_solve 125 9.9 0.003 0.003 4.794 4.800 ot_mini 113 10.2 0.001 0.001 4.622 4.622 init_scf_run 6 5.8 0.000 0.000 4.338 4.338 scf_env_initial_rho_setup 6 6.8 0.000 0.000 4.337 4.337 pw_derive 1089 13.4 4.166 4.178 4.166 4.178 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=201.53, yerr=0.0 Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=107.72, yerr=0.0 PlotPoint: 
plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=31.952, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=24.436, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=17.31, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.604, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.508, yerr=0.0
Summary: Performance test took 39 minutes.
Status: OK
---> Removed intermediate container dcfa85c2ede3
---> 6aa0e72907f4
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
---> Running in 331150abe782
---> Removed intermediate container 331150abe782
---> 35c8e56728ac
Step 46/46 : ENTRYPOINT []
---> Running in e856830abb50
---> Removed intermediate container e856830abb50
---> e8844d927714
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built e8844d927714
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.
#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done
EndDate: 2026-07-02 08:04:54+00:00