StartDate: 2026-03-29 07:12:44+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: b4df9bbab3e4d463e7ca6a63cfed687d94442942
CommitTime: 2026-03-28 09:49:06 +0100
CommitAuthor: Growl
CommitSubject: DBCSR 2.9.0 -> 2.9.1 (toolchain)
#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=b4df9bbab3e4d463e7ca6a63cfed687d94442942 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.
Sending build context to Docker daemon 412.9MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
02cb0e091e33: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
6cb9b761b877: Waiting
2cd52cbb1ebe: Waiting
6e8af4fd0a07: Waiting
15a17189b2df: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Verifying Checksum
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 1ff2ec46e723
Step 8/46 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> bf4865207130
Step 9/46 : RUN mkdir scripts
 ---> Using cache
 ---> 95733bd3ea48
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./tools/toolchain/scripts/generate_cmake_options.sh ./scripts/
 ---> Using cache
 ---> 436ecf42e4e6
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> e086cdcf92a6
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run --list-cmake-options=no
 ---> Using cache
 ---> 2e4e5326a0e2
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> 292ef86ef5e2
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 35a6c0774e4a
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 17d2cb9b6367
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> a726f6399dec
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 08b3176f5c4b
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> 84df80588d0d
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 0985b6504af4
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 18d84d5810f4
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> 5df7f3a2c257
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 0436339144a0
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> c2362a23cc3b
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> ecf9d2da1e12
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 9659704d384d
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> d5a9626f3ec0
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> 40afe42497f4
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> f080cd58109e
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> ab491af73149
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> 30cf49c0f563
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> 61bd6094723e
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Running in 644158c86667
==================== Installing DBCSR ====================
wget --quiet https://www.cp2k.org/static/downloads/dbcsr-2.9.1.tar.gz -O dbcsr-2.9.1.tar.gz
dbcsr-2.9.1.tar.gz: OK
Checksum of dbcsr-2.9.1.tar.gz Ok
Installing from scratch into /opt/cp2k-toolchain/install/dbcsr-2.9.1
Step DBCSR took 136.00 seconds.
 ---> Removed intermediate container 644158c86667
 ---> 5f0a639a41da
Step 33/46 : WORKDIR /opt/cp2k
 ---> Running in 432aa824ca5a
 ---> Removed intermediate container 432aa824ca5a
 ---> e8aece56d642
Step 34/46 : COPY ./src ./src
 ---> 858b714bd73a
Step 35/46 : COPY ./data ./data
 ---> 1dc71c6d43d4
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
 ---> f4ecf49f7857
Step 37/46 : COPY ./cmake ./cmake
 ---> b699cc5ac7b5
Step 38/46 : COPY ./CMakeLists.txt .
 ---> b1ac1b95ae72
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> 61f5774cbd6b
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in 05a3ecac4592
==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so (found version "4.1")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so (found version "4.1")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpifort.so (found version "4.1")
-- Found MPI: TRUE (found version "4.1") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "4.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
--   Found openblas, version 0.3.32
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.32/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
--   Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
--   Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
--   Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
--   Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
--   Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-----------------------------------------------------------
- CUDA -
-----------------------------------------------------------
-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found
------------------------------------------------------------
- OPENMP -
------------------------------------------------------------
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX
------------------------------------------------------------
- DBCSR -
------------------------------------------------------------
-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for module 'libxsmmf'
--   Found libxsmmf, version 1.17.0
-- Checking for module 'libxsmmext'
--   Found libxsmmext, version 1.17.0
------------------------------------------------------------
- Other dependencies -
------------------------------------------------------------
-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran
-- Found MPI: TRUE (found version "4.1") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Checking for module 'libint2'
--   Package 'libint2', required by 'virtual:world', not found
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "2.1.1") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'libvdwxc>=0.3.0'
-- Looking for vdwxc_init_mpi
-- Looking for vdwxc_init_mpi - not found
-- Found LibVDWXC: /opt/cp2k-toolchain/install/libvdwxc-0.4.0/lib/libvdwxc.a (Required is at least version "0.3.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------
  - BLAS
    - vendor: OpenBLAS
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.32/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
  - LAPACK
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.32/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
  - MPI
    - include directories: /opt/cp2k-toolchain/install/mpich-4.3.2/include
    - libraries: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so
    - MPI_F08: ON
  - ScaLAPACK
    - vendor: auto
    - include directories:
    - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
  - Hardware Acceleration:
    - CUDA:
      - GPU architecture number: 70
      - GPU profiling enabled:
      - GPU accelerated modules
        - ELPA module: ON
        - GRID module: ON
        - DBM module: ON
        - PW module: ON
  - LibXC
    - version: 7.0.0
    - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
    - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
  - HDF5
    - version: 2.1.1
    - include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include
    - libraries: hdf5-shared
  - FFTW3
    - include directories: /opt/cp2k-toolchain/install/fftw-3.3.10/include
    - libraries: /opt/cp2k-toolchain/install/fftw-3.3.10/lib/libfftw3.a
  - LIBXSMM
    - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
    - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so
  - SpLA
    - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
    - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
    - SpLA GEMM offloading
  - SIRIUS
    - include directories:
    - libraries:
  - COSMA
    - include directories: /opt/cp2k-toolchain/install/COSMA-2.8.2/include
    - libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack>
  - Libint2
    - include directories: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
    - libraries: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/lib/libint2.a
  - ELPA
    - include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001
    - libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
--------------------------------------------------------------------
-                                                                  -
-         List of dependencies not included in this build          -
-                                                                  -
--------------------------------------------------------------------
  - DFTD4
  - DeePMD
  - PEXSI
  - ACE (libpace)
  - TBLITE
  - Spglib
  - LibSMEAGOL
  - MiMiC
  - openPMD
  - DLA-Future
  - PLUMED
  - Libvori
  - LibTorch
  - TREXIO
  - GreenX
After building and installing CP2K the regtests can be run with the following command:
/opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp
-- Configuring done (12.5s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container 05a3ecac4592
 ---> a6076dea6e40
Step 41/46 : COPY ./benchmarks ./benchmarks
 ---> 7d2ec3f72ab8
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
 ---> 687f9d78e276
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> 050dd1b4f202
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in ff148b3848a1
============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5
========== Checking Benchmark Inputs =========
Found 82 input files and 0 errors.
========== Running Performance Test ==========
Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"
Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.028    0.029  102.893  102.893
qs_mol_dyn_low                       1  2.0    0.004    0.004  102.455  102.459
qs_forces                           11  3.9    0.002    0.002  102.405  102.405
qs_energies                         11  4.9    0.001    0.001   90.977   90.981
scf_env_do_scf                      11  5.9    0.001    0.001   70.061   70.061
velocity_verlet                     10  3.0    0.002    0.002   65.358   65.375
scf_env_do_scf_inner_loop          108  6.5    0.006    0.009   59.721   59.721
rebuild_ks_matrix                  119  8.3    0.001    0.001   26.279   26.280
qs_ks_build_kohn_sham_matrix       119  9.3    0.019    0.020   26.278   26.279
dbcsr_multiply_generic            2286 12.5    0.148    0.149   25.113   25.128
qs_ks_update_qs_env                119  7.6    0.001    0.001   24.104   24.105
qs_scf_new_mos                     108  7.5    0.001    0.001   20.414   20.435
qs_scf_loop_do_ot                  108  8.5    0.001    0.001   20.413   20.435
qs_rho_update_rho_low              119  7.7    0.001    0.001   19.920   19.946
calculate_rho_elec                 119  8.7    0.894    0.905   19.919   19.945
ot_scf_mini                        108  9.5    0.003    0.003   18.477   18.477
fft_wrap_pw1pw2                   1201 11.6    0.024    0.024   15.543   15.580
sum_up_and_integrate               119 10.3    0.002    0.003   13.788   13.832
integrate_v_rspace                 119 11.3    0.357    0.359   13.696   13.740
fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   13.328   13.366
multiply_cannon                   2286 13.5    0.336    0.343   12.733   12.755
multiply_cannon_loop              2286 14.5    0.262    0.265   11.638   11.675
ot_mini                            108 10.5    0.001    0.001   10.772   10.772
make_m2s                          4572 13.5    0.044    0.044   10.738   10.746
make_images                       4572 14.5    1.211    1.215   10.559   10.569
init_scf_run                        11  5.9    0.000    0.000   10.552   10.552
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.551   10.551
init_scf_loop                       11  6.9    0.000    0.000   10.262   10.263
density_rs2pw                      119  9.7    0.008    0.008   10.056   10.137
grid_collocate_task_list           119  9.7    8.941    8.984    8.941    8.984
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.325    8.325
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.972    8.135
pw_gpu_r3dc1d_3d_ps                606 13.1    2.280    2.292    7.953    7.954
pw_gpu_c1dr3d_3d_ps                595 14.2    2.218    2.238    7.559    7.598
wfi_extrapolate                     11  7.9    0.001    0.001    7.520    7.520
grid_integrate_task_list           119 12.3    7.384    7.430    7.384    7.430
prepare_preconditioner              11  7.9    0.000    0.000    7.055    7.058
make_preconditioner                 11  8.9    0.000    0.000    7.055    7.058
multiply_cannon_multrec           4572 15.5    2.177    2.235    6.476    6.543
qs_ot_get_derivative               108 11.5    0.002    0.002    6.477    6.477
hybrid_alltoall_any               4725 16.4    4.847    4.857    6.227    6.238
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.897    6.136
make_images_data                  4572 15.5    0.054    0.054    6.091    6.094
potential_pw2rs                    119 12.3    0.036    0.037    5.955    5.956
parallel_gemm_fm_cosma              81  9.0    5.551    5.551    5.551    5.551
ot_diis_step                       108 11.5    0.006    0.006    4.270    4.270
build_core_ppl_forces               11  5.9    4.092    4.230    4.092    4.230
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.957    4.013
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.990    4.000
dbcsr_mm_accdrv_process           9594 16.2    0.862    0.950    3.910    3.912
apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.703    3.704
apply_single                       119 13.6    0.001    0.001    3.703    3.704
dbcsr_complete_redistribute        329 12.2    1.436    1.463    3.331    3.580
calculate_dm_sparse                119  9.5    0.001    0.001    3.328    3.351
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.283    3.283
qs_ot_get_p                        119 10.4    0.001    0.001    3.227    3.232
qs_create_task_list                 11  7.9    0.000    0.000    3.153    3.217
generate_qs_task_list               11  8.9    1.189    1.201    3.153    3.217
multiply_cannon_sync_h2d          4572 15.5    3.116    3.173    3.116    3.173
mp_alltoall_z22v                  1201 15.6    3.102    3.143    3.102    3.143
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.882    2.883
mp_waitall_1                     64495 16.9    2.742    2.759    2.742    2.759
pw_poisson_solve                   119 10.3    0.003    0.003    2.671    2.672
copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.605    2.610
calculate_first_density_matrix       1  7.0    0.000    0.000    2.576    2.576
transfer_rs2pw                     487 10.6    0.008    0.008    2.397    2.517
jit_kernel_multiply                 11 15.7    2.428    2.514    2.428    2.514
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.367    2.368
pw_gpu_fg                          606 14.1    2.219    2.223    2.219    2.223
qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    2.159    2.160
build_core_ppl                      11  7.9    2.083    2.125    2.083    2.125
transfer_rs2pw_140                 130 11.5    1.509    1.519    1.993    2.117
qs_ot_p2m_diag                      50 11.0    0.089    0.091    2.095    2.097
dbcsr_special_finalize            6858 15.5    0.042    0.042    2.069    2.077
cp_fm_cholesky_invert               11 10.9    2.070    2.070    2.070    2.070
transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.053    2.059
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=102.893, yerr=0.0
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=72.078, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.941, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.384, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.551, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.847, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.092, yerr=0.0
Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.027    0.028   99.115   99.115
qs_mol_dyn_low                       1  2.0    0.005    0.005   98.674   98.676
qs_forces                           11  3.9    0.002    0.002   98.627   98.627
qs_energies                         11  4.9    0.001    0.001   86.948   86.948
scf_env_do_scf                      11  5.9    0.001    0.001   65.630   65.630
velocity_verlet                     10  3.0    0.001    0.002   64.601   64.617
scf_env_do_scf_inner_loop           96  6.5    0.005    0.008   54.921   54.921
rebuild_ks_matrix                  107  8.3    0.001    0.001   25.673   25.674
qs_ks_build_kohn_sham_matrix       107  9.3    0.018    0.018   25.673   25.673
qs_ks_update_qs_env                107  7.6    0.001    0.001   23.171   23.173
dbcsr_multiply_generic            1966 12.4    0.129    0.130   23.049   23.149
qs_rho_update_rho_low              107  7.7    0.001    0.001   18.354   18.365
calculate_rho_elec                 107  8.7    0.803    0.807   18.353   18.364
qs_scf_new_mos                      96  7.5    0.001    0.001   18.311   18.311
qs_scf_loop_do_ot                   96  8.5    0.001    0.001   18.310   18.310
ot_scf_mini                         96  9.5    0.003    0.003   16.600   16.601
sum_up_and_integrate               107 10.3    0.002    0.002   14.286   14.376
fft_wrap_pw1pw2                   1081 11.6    0.022    0.022   14.256   14.302
integrate_v_rspace                 107 11.3    0.322    0.324   14.202   14.292
fft_wrap_pw1pw2_140                439 12.2    0.003    0.003   12.276   12.336
multiply_cannon                   1966 13.4    0.292    0.292   11.592   11.688
init_scf_loop                       11  6.9    0.000    0.000   10.629   10.629
init_scf_run                        11  5.9    0.000    0.000   10.572   10.572
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.571   10.571
multiply_cannon_loop              1966 14.4    0.227    0.228   10.558   10.560
make_m2s                          3932 13.4    0.038    0.039    9.952   10.065
make_images                       3932 14.4    1.117    1.166    9.794    9.905
ot_mini                             96 10.5    0.001    0.001    9.709    9.709
density_rs2pw                      107  9.7    0.007    0.007    9.274    9.434
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.663    8.663
grid_integrate_task_list           107 12.3    8.455    8.549    8.455    8.549
grid_collocate_task_list           107  9.7    8.249    8.381    8.249    8.381
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    8.024    8.176
wfi_extrapolate                     11  7.9    0.001    0.001    7.634    7.634
pw_gpu_r3dc1d_3d_ps                546 13.1    2.106    2.181    7.369    7.378
prepare_preconditioner              11  7.9    0.000    0.000    7.245    7.254
make_preconditioner                 11  8.9    0.000    0.000    7.245    7.254
pw_gpu_c1dr3d_3d_ps                535 14.2    2.000    2.021    6.860    6.897
make_full_inverse_cholesky          11  9.9    0.000    0.000    6.108    6.351
multiply_cannon_multrec           3932 15.4    1.901    1.921    5.919    5.943
hybrid_alltoall_any               4079 16.3    4.461    4.559    5.845    5.860
qs_ot_get_derivative                96 11.5    0.001    0.002    5.778    5.779
make_images_data                  3932 15.4    0.047    0.047    5.673    5.680
parallel_gemm_fm_cosma              81  9.0    5.679    5.680    5.679    5.680
potential_pw2rs                    107 12.3    0.033    0.034    5.424    5.425
qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.211    4.396
build_core_ppl_forces               11  5.9    4.091    4.224    4.091    4.224
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    4.000    4.064
ot_diis_step                        96 11.5    0.005    0.005    3.908    3.908
dbcsr_mm_accdrv_process           8450 16.1    0.797    0.879    3.670    3.702
dbcsr_complete_redistribute        317 12.2    1.414    1.421    3.429    3.671
apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.512    3.515
apply_single                       107 13.6    0.001    0.001    3.512    3.514
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.497    3.498
qs_create_task_list                 11  7.9    0.000    0.000    3.347    3.450
generate_qs_task_list               11  8.9    1.475    1.482    3.347    3.450
calculate_dm_sparse                107  9.5    0.001    0.001    3.045    3.048
mp_alltoall_z22v                  1081 15.6    2.864    2.985    2.864    2.985
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.834    2.835
multiply_cannon_sync_h2d          3932 15.4    2.797    2.835    2.797    2.835
qs_ot_get_p                        107 10.4    0.001    0.001    2.785    2.786
copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.762    2.780
mp_waitall_1                     55487 16.8    2.581    2.751    2.581    2.751
transfer_rs2pw                     439 10.6    0.007    0.008    2.273    2.503
calculate_first_density_matrix       1  7.0    0.000    0.000    2.493    2.493
jit_kernel_multiply                 11 15.7    2.320    2.430    2.320    2.430
pw_poisson_solve                   107 10.3    0.003    0.003    2.410    2.411
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.319    2.320
transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.217    2.235
build_core_ppl                      11  7.9    2.113    2.167    2.113    2.167
transfer_rs2pw_140                 118 11.5    1.366    1.384    1.905    2.142
cp_fm_cholesky_invert               11 10.9    2.091    2.091    2.091    2.091
pw_gpu_fg                          546 14.1    2.061    2.067    2.061    2.067
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=99.115, yerr=0.0
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=68.17999999999999, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.455, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.249, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.679, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.461, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.091, yerr=0.0
Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
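The "rest" PlotPoint appears to be the total CP2K runtime minus the five largest SELF TIME (AVERAGE) entries of the timing table. For H2O-64_nonortho that is 99.115 - (8.455 + 8.249 + 5.679 + 4.461 + 4.091) = 68.180, which the log prints as 68.17999999999999, the kind of trailing digits a plain binary floating-point subtraction produces; the same relation gives 72.078 for H2O-64 and 51.253 for GW_PBE_4benzene. This is inferred from the numbers in this log, not read from test_performance.sh or plot_performance.py. The Python sketch below reproduces the split under that assumption; the helper `split_timings` is hypothetical.

```python
def split_timings(total, self_times, n=5):
    """Split a total runtime into the n largest self times plus a "rest" bucket.

    Assumption: this mirrors how the "rest" PlotPoint appears to be derived in
    this log; it is not taken from plot_performance.py.

    total:      TOTAL TIME (AVERAGE) of the root CP2K entry, in seconds
    self_times: dict mapping routine name to SELF TIME (AVERAGE), in seconds
    """
    # Rank routines by self time, largest first, and keep the top n.
    top = sorted(self_times.items(), key=lambda kv: kv[1], reverse=True)[:n]
    # Everything not attributed to those n routines is lumped into "rest".
    rest = total - sum(t for _, t in top)
    return rest, top


# SELF TIME (AVERAGE) values copied from the H2O-64_nonortho table; the last
# two rows fall outside the top 5 and therefore end up inside "rest".
nonortho_self = {
    "grid_integrate_task_list": 8.455,
    "grid_collocate_task_list": 8.249,
    "parallel_gemm_fm_cosma": 5.679,
    "hybrid_alltoall_any": 4.461,
    "build_core_ppl_forces": 4.091,
    "mp_alltoall_z22v": 2.864,
    "multiply_cannon_sync_h2d": 2.797,
}

if __name__ == "__main__":
    rest, top = split_timings(99.115, nonortho_self)
    print(f'name="rest", y={rest:.3f}')
    for name, t in top:
        print(f'name="{name}", y={t}')
```

Ranking by self time rather than total time keeps nested routines from being counted twice, since a parent's total time already includes its children.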
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.019    0.020  172.720  172.722
qs_energies                          1  2.0    0.000    0.000  172.407  172.411
mp2_main                             1  3.0    0.000    0.000  165.577  165.580
mp2_gpw_main                         1  4.0    0.000    0.000  163.742  163.746
rpa_ri_compute_en                    1  5.0    0.000    0.000  154.569  154.572
rpa_num_int                          1  6.0    0.001    0.001  154.560  154.563
parallel_gemm_fm_cosma             105  8.4   70.857   70.860   70.857   70.860
compute_mat_P_omega                  1  7.0    0.001    0.002   70.068   70.071
compute_mat_P_omega_contract        10  8.0    5.477    5.532   69.334   69.342
dbt_total                         2336  9.6    0.021    0.021   69.226   69.227
dbt_contract                       787 11.0    0.049    0.050   46.689   46.691
compute_W_cubic_GW                  10  7.0    0.004    0.004   46.029   46.033
dbt_tas_total                     1149 12.2    0.140    0.141   36.374   36.374
dbt_tas_multiply                   807 12.1    0.003    0.003   35.672   35.672
dbt_tas_dbm                        807 14.1    0.006    0.006   28.107   28.107
dbm_multiply                       807 16.1   26.825   27.162   26.825   27.162
compute_mat_P_omega_calc_M_occ     250  9.0    5.517    5.577   24.784   24.784
rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   23.968   23.968
contract_P_omega_with_mat_L         10  8.0    0.000    0.000   23.632   23.633
dbt_copy                          1107 10.7    0.072    0.072   22.708   23.023
dbt_tas_mm_1N                      524 15.1    0.003    0.003   18.180   18.563
compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   15.524   15.525
dbt_reshape                        594 11.8    6.755    6.994   14.752   14.867
compute_QP_energies                  1  7.0    0.000    0.000   12.015   12.015
compute_self_energy_cubic_gw         1  8.0    0.123    0.124   12.015   12.015
dbt_tas_reserve_blocks_index      3266 14.3    0.679    0.689   10.535   10.567
dbm_reserve_blocks                3634 15.3   10.199   10.219   10.199   10.219
dbt_crop                          1042 12.0    6.831    6.986    9.156    9.354
mp2_ri_gpw_compute_in                1  5.0    0.001    0.001    9.162    9.162
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    8.992    8.992
dbt_reserve_blocks_index          2347 13.0    0.334    0.337    8.822    8.973
dbt_reserve_blocks_index_array    2289 12.1    0.011    0.011    8.610    8.741
dbt_tas_mm_2                       251 15.0    0.003    0.003    7.685    7.685
scf_env_do_scf                       1  3.0    0.000    0.000    6.286    6.286
scf_env_do_scf_inner_loop           17  4.0    0.001    0.001    6.286    6.286
mp_waitall_2                      2656 15.9    5.907    5.949    5.907    5.949
contract_cubic_gw                   21  9.0    0.000    0.000    5.532    5.532
dbt_communicate_buffer             594 12.8    0.011    0.011    5.401    5.445
dbcsr_multiply_generic              30  8.1    0.003    0.003    5.188    5.213
multiply_cannon                     30  9.1    0.010    0.013    4.994    5.017
compute_mat_P_omega_copy_M_occ     250  9.0    0.002    0.002    4.996    5.010
multiply_cannon_loop                30 10.1    0.004    0.004    4.937    4.963
compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    4.945    4.945
get_2c_integrals                     1  6.0    0.000    0.000    4.712    4.712
dbt_tas_copy                       511 11.5    2.540    2.604    4.385    4.562
multiply_cannon_multrec             60 11.1    0.172    0.189    4.335    4.378
dbcsr_mm_accdrv_process            328 12.3    1.709    3.377    3.997    4.026
jit_kernel_multiply                 18 11.8    2.281    3.920    2.281    3.920
qs_scf_new_mos                      17  5.0    0.000    0.000    3.477    3.522
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=172.72, yerr=0.0
Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=51.253, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=70.857, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=26.825, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.199, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.831, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.755, yerr=0.0
Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.022    0.023  192.971  192.971
 qs_forces                            1  2.0    0.000    0.000  192.511  192.511
 rebuild_ks_matrix                    7  6.6    0.000    0.000  188.037  188.038
 qs_ks_build_kohn_sham_matrix         7  7.6    0.002    0.002  188.037  188.038
 hfx_ks_matrix                        7  8.6    0.000    0.000  184.276  184.284
 dbt_total                          849 11.0    0.009    0.009  137.047  137.048
 hfx_ri_update_ks                     7  9.6    0.000    0.000  105.366  105.366
 hfx_ri_update_ks_Pmat                7 10.6   21.853   22.035  105.360  105.361
 qs_energies                          1  3.0    0.000    0.000  100.841  100.841
 scf_env_do_scf                       1  4.0    0.000    0.000   98.661   98.661
 qs_ks_update_qs_env                  8  6.0    0.000    0.000   96.416   96.416
 qs_ks_update_qs_env_forces           1  3.0    0.000    0.000   91.628   91.628
 dbt_contract                       207 12.4    0.048    0.048   80.960   80.960
 hfx_ri_update_forces                 1  7.0    1.066    1.070   78.908   78.916
 dbt_tas_total                      369 13.4    0.074    0.074   67.299   67.299
 dbt_tas_multiply                   216 13.5    0.001    0.001   64.598   64.598
 scf_env_do_scf_inner_loop            6  5.0    0.000    0.001   53.129   53.129
 dbt_copy                           423 11.8    0.045    0.046   51.786   52.165
 dbt_tas_dbm                        216 15.5    0.002    0.002   51.600   51.600
 dbm_multiply                       216 17.5   48.695   49.076   48.695   49.076
 hfx_ri_forces_Pmat_3c                1  8.0    3.616    3.625   47.350   47.367
 init_scf_loop                        2  5.0    0.000    0.000   45.530   45.530
 dbt_reshape                        175 13.2   17.794   17.926   38.894   39.195
 hfx_ri_update_ks_Pmat_KS            63 11.6    0.001    0.001   30.426   30.426
 precalc_derivatives                  1  8.0    1.819    1.845   25.719   25.719
 dbt_tas_mm_2                        91 16.5    0.001    0.001   21.197   21.197
 mp_waitall_2                      1022 16.5   18.471   18.476   18.471   18.476
 dbt_tas_reserve_blocks_index      1323 15.4    1.665    1.673   17.885   17.899
 dbm_reserve_blocks                1491 16.3   16.925   16.931   16.925   16.931
 dbt_tas_mm_3T                       77 17.1    0.001    0.001   16.532   16.551
 dbt_crop                           372 13.7   12.558   12.601   16.303   16.361
 hfx_ri_pre_scf_Pmat                  1 12.0    0.000    0.000   16.072   16.072
 dbt_communicate_buffer             175 14.2    0.004    0.004   15.332   15.333
 dbt_reserve_blocks_index           889 14.5    0.622    0.629   14.554   14.562
 hfx_ri_update_ks_Pmat_Px3C          63 11.6    0.000    0.000   14.377   14.377
 dbt_reserve_blocks_index_array     859 13.5    0.007    0.007   14.273   14.273
 hfx_ri_update_ks_Pmat_copy_2        63 11.6    0.000    0.000   14.064   14.064
 build_3c_derivatives                 3  9.0    2.196    2.202   13.585   13.587
 dbt_tas_mm_3N                       37 15.4    0.000    0.000   11.348   11.414
 dbt_tas_copy                       248 12.5    4.130    4.141    7.799    7.822
 mp_sync                           2901 12.8    5.905    6.864    5.905    6.864
 hfx_ri_pre_scf_Pmat_int              1 13.0    0.000    0.000    5.245    5.245
 dbt_tas_replicate                  168 15.1    2.198    2.212    4.556    4.584
 hfx_ri_pre_scf_calc_tensors          1 14.0    0.003    0.003    4.525    4.525
 hfx_ri_pre_scf_Pmat_copy_2           9 13.0    1.687    1.705    4.286    4.305
 hfx_ri_pre_scf_Pmat_RIx3C            9 13.0    0.000    0.000    3.932    3.973
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=192.971, yerr=0.0
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=69.233, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=48.695, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=21.853, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=18.471, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=17.794, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=16.925, yerr=0.0
Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.010    0.011  105.667  105.669
 qs_energies                          1  2.0    0.000    0.000  105.481  105.483
 mp2_main                             1  3.0    0.000    0.000   97.632   97.633
 mp2_gpw_main                         1  4.0    0.001    0.001   97.249   97.250
 mp2_ri_gpw_compute_in                1  5.0    0.562    0.564   54.560   54.603
 mp2_ri_gpw_compute_in_loop           1  6.0    0.013    0.014   46.328   46.373
 mp2_ri_gpw_compute_en                1  5.0    0.091    0.092   42.624   42.666
 mp2_ri_gpw_compute_en_RI_loop        1  6.0   12.819   12.821   39.968   39.968
 dbcsr_multiply_generic            2666  8.0    0.158    0.158   23.221   23.840
 ao_to_mo_and_store_B_mult_1       1328  7.0    0.014    0.014   21.823   22.442
 mp2_eri_3c_integrate_gpw          1328  7.0    0.018    0.018   18.743   19.355
 mp2_ri_gpw_compute_en_expansio    1040  7.0    0.727    0.733   16.622   16.674
 local_gemm                        1040  8.0   15.895   15.941   15.895   15.941
 make_m2s                          5332  9.0    0.052    0.052   12.605   12.716
 make_images                       5332 10.0    2.312    2.352   12.423   12.534
 integrate_v_rspace                1338  8.0    1.051    1.066   10.534   10.818
 multiply_cannon                   2666  9.0    0.391    0.410    9.949   10.677
 multiply_cannon_loop              2666 10.0    0.193    0.198    8.777    9.404
 hybrid_alltoall_any               6683 11.6    8.293    8.371    8.548    8.624
 make_images_data                  5332 11.0    0.062    0.062    8.455    8.532
 grid_integrate_task_list          1338  9.0    8.185    8.468    8.185    8.468
 fft_wrap_pw1pw2                  26668 10.4    0.149    0.155    7.711    8.021
 get_2c_integrals                     1  6.0    0.004    0.004    7.662    7.669
 collocate_function                1328  8.0    5.012    5.026    7.096    7.402
 compute_2c_integrals                 1  7.0    0.007    0.008    7.094    7.094
 compute_2c_integrals_loop_lm         1  8.0    0.014    0.022    6.960    6.999
 mp2_eri_2c_integrate_gpw             1  9.0    2.105    2.131    6.946    6.977
 scf_env_do_scf                       1  3.0    0.000    0.000    6.949    6.950
 scf_env_do_scf_inner_loop           10  4.0    0.001    0.001    6.948    6.950
 ao_to_mo_and_store_B_E_Ex_1       1328  7.0    3.585    3.594    5.482    5.513
 qs_scf_new_mos                      10  5.0    0.000    0.000    5.383    5.387
 multiply_cannon_multrec           2676 11.0    2.026    2.193    4.803    4.989
 mp2_ri_gpw_compute_en_ener        1040  7.0    4.933    4.940    4.933    4.940
 fft_wrap_pw1pw2_20               10647 11.4    0.021    0.021    4.477    4.798
 mp2_ri_gpw_compute_en_comm         221  7.0    1.012    1.012    4.452    4.495
 pw_gpu_r3dc1d_3d                 13282 12.2    3.850    4.154    3.850    4.154
 eigensolver                         11  5.8    0.002    0.002    3.039    3.040
 potential_pw2rs                   2666 10.0    0.098    0.099    2.655    2.658
 pw_gpu_c1dr3d_3d                 13280 12.7    2.640    2.652    2.640    2.652
 dbcsr_mm_accdrv_process           5392 12.0    0.845    1.456    2.536    2.546
 mp_sendrecv_dm3                    442  8.0    2.432    2.474    2.432    2.474
 cp_fm_diag_elpa                     11  6.8    0.000    0.000    2.424    2.425
 cp_fm_diag_elpa_base                11  7.8    2.340    2.356    2.422    2.423
 copy_dbcsr_to_fm                  1351  8.0    0.031    0.033    2.306    2.331
 collocate_single_gaussian         1328 10.0    0.097    0.098    2.298    2.316
 fft_wrap_pw1pw2_10               15957 11.5    0.020    0.020    2.305    2.310
 replicate_iaK_2intgroup              1  6.0    2.102    2.103    2.242    2.243
 jit_kernel_multiply                  8 13.0    1.578    2.210    1.578    2.210
 multiply_cannon_sync_h2d          2676 11.0    1.749    2.188    1.749    2.188
 mp2_eri_2c_integrate_gpw_pot_l    1328 10.0    0.004    0.004    2.163    2.178
 fill_local_i_aL                    884  7.5    2.150    2.151    2.150    2.151
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=105.667, yerr=0.0
Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=55.463, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.895, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.819, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=8.293, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.185, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="collocate_function", label="collocate_function", y=5.012, yerr=0.0
Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.076    0.077  203.557  203.558
 qs_energies                          1  2.0    0.000    0.000  202.434  202.436
 scf_env_do_scf                       1  3.0    0.000    0.000  188.608  188.610
 scf_env_do_scf_inner_loop           15  4.0    0.001    0.002  188.608  188.609
 qs_ks_update_qs_env                 15  5.0    0.000    0.000   94.491   94.526
 rebuild_ks_matrix                   15  6.0    0.000    0.000   94.292   94.327
 qs_ks_build_kohn_sham_matrix        15  7.0    0.003    0.003   94.292   94.327
 qs_vxc_create                       15  8.0    0.000    0.000   58.024   58.043
 qs_scf_new_mos                      15  5.0    0.000    0.001   52.207   52.274
 calculate_dispersion_nonloc         15  9.0   10.997   11.025   50.091   50.116
 fft_wrap_pw1pw2                   1086 10.0    0.029    0.030   50.015   50.020
 eigensolver                         15  6.0    0.002    0.002   42.838   42.873
 qs_rho_update_rho_low               16  5.0    0.000    0.000   40.158   40.159
 calculate_rho_elec                  16  6.0    0.179    0.179   40.158   40.159
 sum_up_and_integrate                15  8.0    0.000    0.000   34.805   34.865
 integrate_v_rspace                  15  9.0    0.047    0.047   34.780   34.841
 grid_collocate_task_list            16  7.0   28.810   28.829   28.810   28.829
 grid_integrate_task_list            15 10.0   27.825   27.860   27.825   27.860
 pw_gpu_c1dr3d_3d_ps                585 12.1    5.579    5.650   26.221   26.242
 cp_fm_diag_elpa                     15  7.0    0.000    0.000   25.963   25.968
 cp_fm_diag_elpa_base                15  8.0   24.204   24.767   25.957   25.957
 fft_wrap_pw1pw2_150                765 11.0    0.004    0.005   25.664   25.666
 pw_gpu_r3dc1d_3d_ps                501 11.9    4.674    4.875   23.759   23.772
 cp_fm_cholesky_restore              45  7.0   14.993   15.706   14.993   15.706
 fft_wrap_pw1pw2_200                197 11.3    0.001    0.001   12.101   12.145
 density_rs2pw                       16  7.0    0.001    0.001   11.152   11.180
 qs_energies_init_hamiltonians        1  3.0    0.000    0.000    9.942    9.942
 vdW_energy                          15 10.0    9.484    9.526    9.484    9.526
 pw_gpu_ffc                         585 13.1    9.000    9.052    9.000    9.052
 build_core_hamiltonian_matrix        1  4.0    0.000    0.000    8.583    8.609
 pw_gpu_cff                         501 12.9    8.493    8.505    8.493    8.505
 xc_vxc_pw_create                    15  9.0    0.179    0.180    7.932    7.938
 mp_alltoall_z22v                  1086 14.0    6.599    7.027    6.599    7.027
 pw_gpu_sf                          585 13.1    7.017    7.021    7.017    7.021
 potential_pw2rs                     15 10.0    0.007    0.007    6.909    6.934
 pw_gpu_fg                          501 12.9    6.684    6.762    6.684    6.762
 copy_dbcsr_to_fm                    16  5.9    0.001    0.001    6.539    6.628
 dbcsr_complete_redistribute         46  8.3    1.774    1.781    5.622    5.707
 fft_wrap_pw1pw2_10                  62 10.5    0.000    0.000    5.529    5.533
 cp_fm_uplo_to_full                  30  8.0    3.633    4.868    3.633    4.868
 build_core_ppnl                      1  5.0    4.805    4.818    4.805    4.818
 x_to_yz                            585 13.1    0.995    1.007    4.591    4.696
 xc_rho_set_and_dset_create          15 10.0    0.130    0.130    4.676    4.679
 xc_pw_derive                        90 11.0    0.001    0.001    4.566    4.571
 yz_to_x                            501 12.9    0.850    0.857    3.853    4.156
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=203.557, yerr=0.0
Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=96.728, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=28.81, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.825, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=24.204, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=14.993, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.997, yerr=0.0
Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.045    0.046  269.946  269.947
 qs_energies                          1  2.0    0.000    0.000  269.826  269.828
 ls_scf                               1  3.0    0.000    0.000  268.980  268.981
 ls_scf_main                          1  4.0    0.001    0.002  259.195  259.196
 density_matrix_trs4                 11  5.0    0.008    0.008  215.263  215.342
 dbcsr_multiply_generic             185  6.1    0.323    0.327  175.976  176.065
 multiply_cannon                    185  7.1    2.027    2.046  122.746  123.656
 multiply_cannon_loop               185  8.1    0.335    0.336  108.268  108.806
 multiply_cannon_multrec            370  9.1   82.343   82.851   91.711   92.241
 make_m2s                           370  7.1    0.030    0.030   45.000   45.120
 make_images                        370  8.1   11.332   11.651   43.941   44.059
 ls_scf_dm_to_ks                     11  5.0    0.000    0.000   39.446   39.494
 matrix_ls_to_qs                     11  6.0    0.000    0.000   36.550   36.812
 dbcsr_complete_redistribute         23  7.5   22.585   22.682   31.016   31.205
 matrix_decluster                    11  7.0    0.000    0.000   28.127   28.287
 arnoldi_extremal                    12  6.1    0.000    0.000   23.668   23.673
 arnoldi_normal_ev                   12  7.1    0.010    0.010   23.668   23.673
 build_subspace                      23  8.1    0.063    0.063   23.185   23.186
 dbcsr_matrix_vector_mult           652  9.0    0.152    0.154   21.662   21.713
 dbcsr_matrix_vector_mult_local     652 10.0   20.654   20.704   20.661   20.710
 make_images_data                   370  9.1    0.011    0.012   16.519   16.765
 hybrid_alltoall_any                393  9.9   11.692   11.944   16.017   16.262
 calculate_norms                    740  9.1   15.691   15.750   15.691   15.750
 dbcsr_finalize                     559  7.6    0.218    0.226   14.417   14.511
 dbcsr_merge_all                    510  8.6    2.716    2.734   13.195   13.284
 dbcsr_copy                         761  7.5    1.692    1.724   10.134   10.268
 dbcsr_special_finalize             555  9.1    0.010    0.010    9.439    9.452
 setup_rec_index_2d                 370  8.1    9.373    9.407    9.373    9.407
 dbcsr_sort_indices                1283 10.0    8.873    8.931    8.873    8.931
 dbcsr_add_d                        280  6.0    0.001    0.001    8.682    8.714
 dbcsr_add_anytype                  280  7.0    3.796    3.801    8.681    8.712
 dbcsr_copy_into_existing            11  8.0    8.421    8.524    8.421    8.524
 dbcsr_dot                          144  6.3    7.622    7.627    8.093    8.322
 ls_scf_init_scf                      1  4.0    0.000    0.000    8.296    8.297
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000    7.863    7.864
 dbcsr_mm_accdrv_process          14501 10.0    0.687    0.767    7.347    7.357
 matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.000    7.087    7.090
 tree_to_linear_d                    23 10.5    6.987    7.008    6.987    7.008
 dbcsr_mm_accdrv_process_sort     14501 11.0    6.582    6.590    6.582    6.590
 dbcsr_merge_single_wm              370 10.1    0.555    0.583    6.088    6.091
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=269.946, yerr=0.0
Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=116.98100000000002, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=82.343, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=22.585, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=20.654, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=15.691, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=11.692, yerr=0.0
Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.004    0.005   48.448   48.449
 lib_test                             1  2.0    0.000    0.000   48.439   48.442
 dbcsr_run_tests                      3  3.0    0.000    0.000   48.438   48.441
 test_multiplies_multiproc            3  4.0    0.001    0.001   37.361   37.434
 dbcsr_multiply_generic               9  5.0    0.002    0.002   28.988   28.995
 multiply_cannon                      9  6.0    0.279    0.541   19.154   19.566
 multiply_cannon_loop                 9  7.0    0.002    0.003   17.741   18.046
 multiply_cannon_multrec             18  8.0    9.365    9.653   16.542   16.851
 dbcsr_make_random_matrix             9  4.0    7.522    7.593   10.925   10.998
 dbcsr_finalize                      27  5.7    0.001    0.001    7.538    7.677
 dbcsr_merge_all                     18  6.5    3.650    3.660    7.420    7.557
 dbcsr_mm_accdrv_process           8199  9.0    1.346    1.471    6.946    6.958
 dbcsr_redistribute                   9  5.0    3.547    3.555    5.732    5.747
 make_m2s                            18  6.0    0.001    0.001    5.016    5.038
 make_images                         18  7.0    0.369    0.374    4.982    5.005
 dbcsr_mm_accdrv_process_sort      8199 10.0    4.724    4.725    4.724    4.725
 make_images_data                    18  8.0    0.001    0.001    2.869    2.889
 hybrid_alltoall_any                 18  9.0    2.472    2.486    2.837    2.857
 dbcsr_data_copy_aa2                 18  7.5    1.780    1.930    1.780    1.930
 mp_alltoall_d11v                    27  6.0    1.915    1.923    1.915    1.923
 tree_to_linear_d                     9  7.0    1.850    1.851    1.850    1.851
 dbcsr_data_release                 507  7.7    1.390    1.391    1.390    1.391
 dbcsr_checksum                       6  5.0    1.061    1.065    1.075    1.075
 jit_kernel_multiply                  6 10.0    0.876    0.988    0.876    0.988
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=48.448, yerr=0.0
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=19.64, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.365, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.522, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.724, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.65, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.547, yerr=0.0
Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.047    0.049  200.007  200.007
 qs_mol_dyn_low                       1  2.0    0.004    0.004  198.489  198.524
 qs_forces                            6  3.8    0.001    0.001  122.242  122.242
 qs_energies                          6  4.8    0.000    0.000  115.400  115.401
 scf_env_do_scf                       6  5.8    0.000    0.000  107.817  107.817
 scf_env_do_scf_inner_loop          113  6.2    0.005    0.007   99.904   99.904
 velocity_verlet                      5  3.0    0.003    0.003   96.139   96.189
 rebuild_ks_matrix                  119  8.1    0.000    0.000   81.747   81.748
 qs_ks_build_kohn_sham_matrix       119  9.1    0.019    0.019   81.746   81.748
 qs_ks_update_qs_env                119  7.3    0.001    0.001   77.120   77.121
 fft_wrap_pw1pw2                   2059 12.4    0.045    0.046   63.997   64.018
 fft_wrap_pw1pw2_150               1321 13.9    0.009    0.009   61.286   61.326
 qs_vxc_create                      119 10.1    0.002    0.002   52.271   52.272
 xc_vxc_pw_create                   119 11.1    1.524    1.533   52.269   52.270
 qmmm_el_coupling                     6  3.8    0.000    0.000   40.463   40.465
 qmmm_elec_with_gaussian              6  4.8    0.019    0.019   40.457   40.459
 qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   38.862   38.984
 xc_pw_derive                       714 13.1    0.010    0.010   36.039   36.057
 pw_gpu_c1dr3d_3d_ps               1095 14.8   10.316   10.342   34.669   34.696
 qmmm_elec_gaussian_low_G             6  6.8   34.143   34.275   34.143   34.275
 qmmm_forces                          6  3.8    0.001    0.001   33.202   33.202
 qmmm_forces_with_gaussian            6  4.8    0.022    0.022   32.501   32.776
 qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   31.206   31.483
 pw_gpu_r3dc1d_3d_ps                964 14.0    9.076    9.134   29.271   29.278
 xc_rho_set_and_dset_create         119 12.1    2.400    2.409   26.453   26.494
 qmmm_forces_gaussian_low_G           6  6.8   26.125   26.362   26.125   26.362
 xc_pw_divergence                   119 12.1    0.006    0.006   23.900   23.924
 qs_rho_update_rho_low              119  7.3    0.001    0.001   21.748   21.817
 calculate_rho_elec                 119  8.3    1.093    1.096   21.747   21.817
 density_rs2pw                      119  9.3    0.008    0.008   15.654   15.842
 sum_up_and_integrate               119 10.1    0.002    0.002   13.632   13.706
 dbcsr_multiply_generic            2598 12.3    0.095    0.096   13.658   13.667
 integrate_v_rspace                 119 11.1    0.022    0.022   13.455   13.528
 mp_alltoall_z22v                  2059 16.4   13.026   13.354   13.026   13.354
 multiply_cannon                   2598 13.3    0.213    0.215   12.062   12.083
 multiply_cannon_loop              2598 14.3    0.243    0.247   11.593   11.618
 multiply_cannon_multrec           5196 15.3    4.019    4.097    9.503    9.602
 potential_pw2rs                    119 12.1    0.033    0.033    9.474    9.475
 x_to_yz                           1095 15.8    2.227    2.273    9.344    9.465
 qs_ks_ddapc                        119 10.1    0.002    0.002    8.751    8.754
 pw_gpu_sf                         1095 15.8    8.608    8.641    8.608    8.641
 init_scf_loop                        6  6.8    0.000    0.000    7.910    7.910
 yz_to_x                            964 15.0    1.733    1.736    7.642    7.806
 pw_gpu_fg                          964 15.0    7.598    7.731    7.598    7.731
 qs_scf_new_mos                     113  7.2    0.001    0.001    7.183    7.185
 qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.182    7.184
 ot_scf_mini                        113  9.2    0.002    0.002    6.899    6.902
 pw_gpu_ffc                        1095 15.8    6.383    6.419    6.383    6.419
 dbcsr_mm_accdrv_process          13992 16.0    0.898    1.268    5.417    5.438
 init_scf_run                         6  5.8    0.000    0.000    5.410    5.410
 scf_env_initial_rho_setup            6  6.8    0.000    0.000    5.410    5.410
 xc_functional_eval                 238 13.1    0.003    0.003    5.177    5.196
 qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    5.081    5.121
 qmmm_forces_with_gaussian_LG         6  7.8    5.081    5.121    5.081    5.121
 grid_collocate_task_list           119  9.3    4.973    5.080    4.973    5.080
 ot_mini                            113 10.2    0.001    0.001    4.926    4.928
 pw_gpu_cff                         964 15.0    4.889    4.908    4.889    4.908
 jit_kernel_multiply                 24 14.7    4.472    4.862    4.472    4.862
 qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    4.719    4.728
 qmmm_elec_with_gaussian_LG           6  7.8    4.719    4.728    4.719    4.728
 pw_poisson_solve                   125  9.9    0.003    0.003    4.684    4.702
 qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    4.656    4.656
 pw_derive                         1089 13.4    4.015    4.082    4.015    4.082
 qs_ot_get_derivative               113 11.2    0.001    0.001    4.036    4.036
 grid_integrate_task_list           119 12.1    3.958    4.033    3.958    4.033
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=200.007, yerr=0.0
Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=107.321, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=34.143, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=26.125, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=13.026, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.316, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.076, yerr=0.0
Summary: Performance test took 23 minutes.
Status: OK
 ---> Removed intermediate container ff148b3848a1
 ---> cf84ac873ca3
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in 4e6cf6ad6727
 ---> Removed intermediate container 4e6cf6ad6727
 ---> df6bd623a33f
Step 46/46 : ENTRYPOINT []
 ---> Running in 3a0768478c0b
 ---> Removed intermediate container 3a0768478c0b
 ---> ba9cb9678508
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built ba9cb9678508
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.
#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done
EndDate: 2026-03-29 07:57:42+00:00