StartDate: 2026-04-03 06:42:25+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 3db43b466e07ace2e9becec7e852368f2b1589c6
CommitTime: 2026-04-02 13:46:51 +0200
CommitAuthor: Frederick Stein
CommitSubject: Some improvements to the active space code (#5039)

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=3db43b466e07ace2e9becec7e852368f2b1589c6 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.

Sending build context to Docker daemon 413MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
2cd52cbb1ebe: Waiting
6e8af4fd0a07: Waiting
02cb0e091e33: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
15a17189b2df: Waiting
6cb9b761b877: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
7f7602a82106: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 1ff2ec46e723
Step 8/46 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> bf4865207130
Step 9/46 : RUN mkdir scripts
 ---> Using cache
 ---> 95733bd3ea48
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./tools/toolchain/scripts/generate_cmake_options.sh ./scripts/
 ---> Using cache
 ---> 436ecf42e4e6
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> e086cdcf92a6
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run --list-cmake-options=no
 ---> Using cache
 ---> 2e4e5326a0e2
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> 292ef86ef5e2
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 35a6c0774e4a
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 17d2cb9b6367
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> a726f6399dec
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 08b3176f5c4b
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> 84df80588d0d
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 0985b6504af4
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 18d84d5810f4
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> 92a7ee20695a
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 4eea4f45c46c
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> 91e667907bdc
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> 5e3045527394
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 0af9d2e9be60
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> d1c746d88f44
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> c1c41ca33047
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> ec39789c586d
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> bff0608e0a58
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> 92adcd501b84
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 95776880c549
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> 92a8fe130694
Step 33/46 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> 72e300efeb1a
Step 34/46 : COPY ./src ./src
 ---> a90d78ebc369
Step 35/46 : COPY ./data ./data
 ---> a12a9cd08006
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
 ---> 9d6c3b5a7cd6
Step 37/46 : COPY ./cmake ./cmake
 ---> 7977e77b1485
Step 38/46 : COPY ./CMakeLists.txt .
 ---> c9c4faeab345
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> 0e3402cabccd
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in 22801d413b63

==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so (found version "4.1")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so (found version "4.1")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpifort.so (found version "4.1")
-- Found MPI: TRUE (found version "4.1") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "4.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
--   Found openblas, version 0.3.32
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.32/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
--   Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
--   Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
--   Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
--   Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
--   Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")

-----------------------------------------------------------
-                          CUDA                           -
-----------------------------------------------------------

-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found

------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------

-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX

------------------------------------------------------------
-                          DBCSR                           -
------------------------------------------------------------

-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for module 'libxsmmf'
--   Found libxsmmf, version 1.17.0
-- Checking for module 'libxsmmext'
--   Found libxsmmext, version 1.17.0

------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------

-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran
-- Found MPI: TRUE (found version "4.1") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Checking for module 'libint2'
--   Package 'libint2', required by 'virtual:world', not found
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "2.1.1") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'libvdwxc>=0.3.0'
-- Looking for vdwxc_init_mpi
-- Looking for vdwxc_init_mpi - not found
-- Found LibVDWXC: /opt/cp2k-toolchain/install/libvdwxc-0.4.0/lib/libvdwxc.a (Required is at least version "0.3.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.

--------------------------------------------------------------------
-                                                                  -
-               Summary of enabled dependencies                    -
-                                                                  -
--------------------------------------------------------------------

  - BLAS
    - vendor: OpenBLAS
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.32/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
  - LAPACK
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.32/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
  - MPI
    - include directories: /opt/cp2k-toolchain/install/mpich-4.3.2/include
    - libraries: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so
    - MPI_F08: ON
  - ScaLAPACK
    - vendor: auto
    - include directories:
    - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
  - Hardware Acceleration:
    - CUDA:
      - GPU architecture number: 70
      - GPU profiling enabled:
      - GPU accelerated modules
        - ELPA module: ON
        - GRID module: ON
        - DBM module: ON
        - PW module: ON
  - LibXC
    - version: 7.0.0
    - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
    - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
  - HDF5
    - version: 2.1.1
    - include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include
    - libraries: hdf5-shared
  - FFTW3
    - include directories: /opt/cp2k-toolchain/install/fftw-3.3.10/include
    - libraries: /opt/cp2k-toolchain/install/fftw-3.3.10/lib/libfftw3.a
  - LIBXSMM
    - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
    - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so
  - SpLA
    - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
    - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
    - SpLA GEMM offloading
  - SIRIUS
    - include directories:
    - libraries:
  - COSMA
    - include directories: /opt/cp2k-toolchain/install/COSMA-2.8.2/include
    - libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack>
  - Libint2
    - include directories: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
    - libraries: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/lib/libint2.a
  - ELPA
    - include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001
    - libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a

--------------------------------------------------------------------
-                                                                  -
-         List of dependencies not included in this build          -
-                                                                  -
--------------------------------------------------------------------

  - DFTD4
  - DeePMD
  - PEXSI
  - ACE (libpace)
  - TBLITE
  - Spglib
  - LibSMEAGOL
  - MiMiC
  - openPMD
  - DLA-Future
  - PLUMED
  - Libvori
  - LibTorch
  - TREXIO
  - GreenX

After building and installing CP2K the regtests can be run with the following command:
/opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp

-- Configuring done (15.9s)
-- Generating done (0.6s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container 22801d413b63
 ---> ae84fc0a20bb
Step 41/46 : COPY ./benchmarks ./benchmarks
 ---> d9a740e21499
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
 ---> 58a9beb50da8
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> 7454205c4486
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in 129d1de3c990

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 82 input files and 0 errors.

========== Running Performance Test ==========

Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"

Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:

-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.029    0.030  106.126  106.126
qs_mol_dyn_low                       1  2.0    0.005    0.005  105.669  105.672
qs_forces                           11  3.9    0.002    0.002  105.616  105.616
qs_energies                         11  4.9    0.001    0.001   93.642   93.643
scf_env_do_scf                      11  5.9    0.001    0.001   71.585   71.585
velocity_verlet                     10  3.0    0.001    0.002   67.965   67.984
scf_env_do_scf_inner_loop          108  6.5    0.006    0.009   60.807   60.807
rebuild_ks_matrix                  119  8.3    0.001    0.001   26.752   26.754
qs_ks_build_kohn_sham_matrix       119  9.3    0.021    0.021   26.752   26.753
dbcsr_multiply_generic            2286 12.5    0.152    0.153   25.614   25.663
qs_ks_update_qs_env                119  7.6    0.001    0.001   24.559   24.560
qs_scf_new_mos                     108  7.5    0.001    0.001   20.812   20.819
qs_scf_loop_do_ot                  108  8.5    0.001    0.001   20.811   20.818
qs_rho_update_rho_low              119  7.7    0.001    0.001   20.270   20.279
calculate_rho_elec                 119  8.7    0.930    0.934   20.269   20.278
ot_scf_mini                        108  9.5    0.003    0.003   18.832   18.834
fft_wrap_pw1pw2                   1201 11.6    0.025    0.026   15.840   15.880
sum_up_and_integrate               119 10.3    0.003    0.003   13.908   13.964
integrate_v_rspace                 119 11.3    0.369    0.373   13.809   13.865
fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   13.587   13.658
multiply_cannon                   2286 13.5    0.361    0.367   12.921   12.927
multiply_cannon_loop              2286 14.5    0.268    0.269   11.793   11.806
make_m2s                          4572 13.5    0.046    0.046   11.017   11.020
ot_mini                            108 10.5    0.001    0.001   10.999   11.000
init_scf_run                        11  5.9    0.000    0.000   10.970   10.970
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.970   10.970
make_images                       4572 14.5    1.250    1.253   10.832   10.835
init_scf_loop                       11  6.9    0.000    0.000   10.697   10.698
density_rs2pw                      119  9.7    0.008    0.008   10.341   10.414
grid_collocate_task_list           119  9.7    8.962    9.009    8.962    9.009
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.832    8.832
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    8.455    8.610
pw_gpu_r3dc1d_3d_ps                606 13.1    2.363    2.384    8.091    8.104
wfi_extrapolate                     11  7.9    0.002    0.002    7.847    7.847
pw_gpu_c1dr3d_3d_ps                595 14.2    2.307    2.322    7.718    7.746
prepare_preconditioner              11  7.9    0.000    0.000    7.442    7.445
make_preconditioner                 11  8.9    0.000    0.000    7.442    7.445
grid_integrate_task_list           119 12.3    7.368    7.427    7.368    7.427
multiply_cannon_multrec           4572 15.5    2.208    2.216    6.612    6.630
qs_ot_get_derivative               108 11.5    0.002    0.002    6.616    6.618
make_full_inverse_cholesky          11  9.9    0.000    0.000    6.209    6.466
hybrid_alltoall_any               4725 16.4    4.923    4.930    6.322    6.325
make_images_data                  4572 15.5    0.058    0.059    6.178    6.183
potential_pw2rs                    119 12.3    0.038    0.039    6.071    6.072
parallel_gemm_fm_cosma              81  9.0    6.000    6.001    6.000    6.001
build_core_ppl_forces               11  5.9    4.336    4.452    4.336    4.452
ot_diis_step                       108 11.5    0.006    0.006    4.358    4.358
qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.256    4.257
build_core_hamiltonian_matrix       11  6.9    0.001    0.002    4.179    4.239
dbcsr_mm_accdrv_process           9594 16.2    0.970    0.972    3.999    4.001
apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.764    3.764
apply_single                       119 13.6    0.001    0.001    3.764    3.764
dbcsr_complete_redistribute        329 12.2    1.517    1.525    3.479    3.745
qs_create_task_list                 11  7.9    0.000    0.000    3.390    3.451
generate_qs_task_list               11  8.9    1.253    1.259    3.389    3.451
calculate_dm_sparse                119  9.5    0.001    0.001    3.410    3.416
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.359    3.359
qs_ot_get_p                        119 10.4    0.001    0.001    3.267    3.270
mp_alltoall_z22v                  1201 15.6    3.152    3.229    3.152    3.229
multiply_cannon_sync_h2d          4572 15.5    3.082    3.106    3.082    3.106
cp_dbcsr_sm_fm_multiply             37  9.5    0.002    0.002    2.940    2.941
mp_waitall_1                     64495 16.9    2.784    2.808    2.784    2.808
pw_poisson_solve                   119 10.3    0.003    0.003    2.751    2.751
copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.699    2.710
transfer_rs2pw                     487 10.6    0.009    0.009    2.533    2.650
calculate_first_density_matrix       1  7.0    0.000    0.000    2.635    2.635
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.409    2.409
jit_kernel_multiply                 10 15.6    2.381    2.382    2.381    2.382
transfer_rs2pw_140                 130 11.5    1.625    1.631    2.121    2.244
pw_gpu_fg                          606 14.1    2.215    2.242    2.215    2.242
build_core_ppl                      11  7.9    2.172    2.211    2.172    2.211
qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    2.208    2.208
cp_fm_cholesky_invert               11 10.9    2.182    2.182    2.182    2.182
dbcsr_special_finalize            6858 15.5    0.045    0.045    2.167    2.172
transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.127    2.137
compute_matrix_w                    11  5.9    0.000    0.000    2.123    2.124
calculate_w_matrix_ot               11  6.9    0.004    0.004    2.123    2.124
-------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=106.126, yerr=0.0

Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=74.537, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.962, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.368, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=6.0, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.923, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.336, yerr=0.0

Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:

-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.029    0.029  102.483  102.483
qs_mol_dyn_low                       1  2.0    0.005    0.005  102.022  102.025
qs_forces                           11  3.9    0.002    0.002  101.970  101.970
qs_energies                         11  4.9    0.001    0.001   89.649   89.649
scf_env_do_scf                      11  5.9    0.001    0.001   67.042   67.042
velocity_verlet                     10  3.0    0.002    0.002   66.767   66.786
scf_env_do_scf_inner_loop           96  6.5    0.006    0.008   55.810   55.810
rebuild_ks_matrix                  107  8.3    0.001    0.001   26.083   26.083
qs_ks_build_kohn_sham_matrix       107  9.3    0.019    0.020   26.082   26.083
qs_ks_update_qs_env                107  7.6    0.001    0.001   23.600   23.600
dbcsr_multiply_generic            1966 12.4    0.139    0.140   23.406   23.489
qs_rho_update_rho_low              107  7.7    0.001    0.001   18.675   18.690
calculate_rho_elec                 107  8.7    0.843    0.850   18.674   18.689
qs_scf_new_mos                      96  7.5    0.001    0.001   18.573   18.577
qs_scf_loop_do_ot                   96  8.5    0.001    0.001   18.572   18.576
ot_scf_mini                         96  9.5    0.003    0.003   16.829   16.831
sum_up_and_integrate               107 10.3    0.003    0.003   14.451   14.539
fft_wrap_pw1pw2                   1081 11.6    0.023    0.024   14.493   14.536
integrate_v_rspace                 107 11.3    0.337    0.338   14.356   14.444
fft_wrap_pw1pw2_140                439 12.2    0.003    0.003   12.424   12.499
multiply_cannon                   1966 13.4    0.315    0.316   11.870   11.879
init_scf_loop                       11  6.9    0.000    0.000   11.150   11.150
init_scf_run                        11  5.9    0.000    0.000   11.126   11.126
scf_env_initial_rho_setup           11  6.9    0.000    0.001   11.125   11.125
multiply_cannon_loop              1966 14.4    0.237    0.238   10.898   10.919
make_m2s                          3932 13.4    0.041    0.041    9.981    9.997
ot_mini                             96 10.5    0.001    0.001    9.852    9.853
make_images                       3932 14.4    1.109    1.121    9.815    9.832
density_rs2pw                      107  9.7    0.008    0.008    9.523    9.675
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    9.132    9.133
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    8.621    8.767
grid_integrate_task_list           107 12.3    8.472    8.558    8.472    8.558
grid_collocate_task_list           107  9.7    8.277    8.395    8.277    8.395
wfi_extrapolate                     11  7.9    0.002    0.002    7.948    7.948
prepare_preconditioner              11  7.9    0.000    0.000    7.710    7.714
make_preconditioner                 11  8.9    0.000    0.000    7.710    7.714
pw_gpu_r3dc1d_3d_ps                546 13.1    2.179    2.205    7.406    7.424
pw_gpu_c1dr3d_3d_ps                535 14.2    2.097    2.114    7.058    7.083
make_full_inverse_cholesky          11  9.9    0.000    0.000    6.423    6.681
multiply_cannon_multrec           3932 15.4    1.899    1.906    6.205    6.235
parallel_gemm_fm_cosma              81  9.0    6.170    6.170    6.170    6.170
qs_ot_get_derivative                96 11.5    0.002    0.002    5.983    5.983
hybrid_alltoall_any               4079 16.3    4.480    4.495    5.726    5.743
make_images_data                  3932 15.4    0.052    0.052    5.604    5.620
potential_pw2rs                    107 12.3    0.035    0.035    5.546    5.547
qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.462    4.573
build_core_ppl_forces               11  5.9    4.435    4.528    4.435    4.528
build_core_hamiltonian_matrix       11  6.9    0.001    0.002    4.224    4.271
dbcsr_mm_accdrv_process           8450 16.1    1.081    1.404    3.954    3.991
dbcsr_complete_redistribute        317 12.2    1.533    1.549    3.709    3.976
ot_diis_step                        96 11.5    0.005    0.005    3.848    3.848
qs_create_task_list                 11  7.9    0.000    0.000    3.584    3.646
generate_qs_task_list               11  8.9    1.586    1.623    3.584    3.646
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.547    3.548
apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.400    3.401
apply_single                       107 13.6    0.001    0.001    3.400    3.401
calculate_dm_sparse                107  9.5    0.001    0.001    3.209    3.209
cp_dbcsr_sm_fm_multiply             37  9.5    0.002    0.002    2.950    2.952
mp_alltoall_z22v                  1081 15.6    2.862    2.951    2.862    2.951
qs_ot_get_p                        107 10.4    0.001    0.001    2.893    2.895
copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.846    2.866
multiply_cannon_sync_h2d          3932 15.4    2.775    2.785    2.775    2.785
calculate_first_density_matrix       1  7.0    0.000    0.000    2.702    2.702
transfer_rs2pw                     439 10.6    0.009    0.009    2.414    2.595
jit_kernel_multiply                 11 15.4    2.284    2.569    2.284    2.569
mp_waitall_1                     55487 16.8    2.522    2.542    2.522    2.542
pw_poisson_solve                   107 10.3    0.003    0.003    2.511    2.513
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.408    2.409
transfer_dbcsr_to_fm                11 10.9    0.002    0.002    2.269    2.287
build_core_ppl                      11  7.9    2.188    2.224    2.188    2.224
transfer_rs2pw_140                 118 11.5    1.511    1.526    2.035    2.223
cp_fm_cholesky_invert               11 10.9    2.219    2.219    2.219    2.219
compute_matrix_w                    11  5.9    0.000    0.000    2.218    2.218
calculate_w_matrix_ot               11  6.9    0.004    0.004    2.218    2.218
copy_fm_to_dbcsr                   170 11.1    0.002    0.002    1.879    2.138
build_kinetic_matrix_low            22  6.9    1.932    1.948    2.043    2.059
-------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=102.483, yerr=0.0

Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=70.649, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.472, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.277, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=6.17, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.48, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.435, yerr=0.0

Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:

-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.020    0.022  182.663  182.664
qs_energies                          1  2.0    0.000    0.000  182.300  182.301
mp2_main                             1  3.0    0.000    0.000  174.883  174.884
mp2_gpw_main                         1  4.0    0.000    0.000  172.837  172.838
rpa_ri_compute_en                    1  5.0    0.000    0.000  162.769  162.770
rpa_num_int                          1  6.0    0.001    0.001  162.760  162.761
parallel_gemm_fm_cosma             105  8.4   79.133   79.228   79.133   79.228
compute_mat_P_omega                  1  7.0    0.002    0.002   69.974   69.980
compute_mat_P_omega_contract        10  8.0    5.684    5.753   69.206   69.217
dbt_total                         2336  9.6    0.022    0.023   68.869   68.870
compute_W_cubic_GW                  10  7.0    0.004    0.004   51.349   51.354
dbt_contract                       787 11.0    0.051    0.052   45.721   45.722
dbt_tas_total                     1149 12.2    0.146    0.148   35.126   35.126
dbt_tas_multiply                   807 12.1    0.003    0.003   34.400   34.401
dbt_tas_dbm                        807 14.1    0.006    0.007   26.696   26.697
rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   26.446   26.446
dbm_multiply                       807 16.1   24.970   26.346   24.970   26.346
contract_P_omega_with_mat_L         10  8.0    0.000    0.000   26.096   26.097
compute_mat_P_omega_calc_M_occ     250  9.0    5.646    5.741   24.485   24.485
dbt_copy                          1107 10.7    0.074    0.074   23.481   23.655
dbt_tas_mm_1N                      524 15.1    0.003    0.003   16.300   17.652
dbt_reshape                        594 11.8    6.909    7.063   15.082   15.177
compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   15.010   15.010
compute_QP_energies                  1  7.0    0.000    0.000   12.479   12.479
compute_self_energy_cubic_gw         1  8.0    0.132    0.134   12.478   12.478
dbt_tas_reserve_blocks_index      3266 14.3    0.691    0.694   10.990   11.247
dbm_reserve_blocks                3634 15.3   10.653   10.907   10.653   10.907
mp2_ri_gpw_compute_in                1  5.0    0.001    0.001   10.057   10.057
dbt_crop                          1042 12.0    6.999    7.053    9.383    9.450
dbt_reserve_blocks_index          2347 13.0    0.334    0.335    9.121    9.347
dbt_reserve_blocks_index_array    2289 12.1    0.011    0.012    8.894    9.151
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    9.046    9.046
dbt_tas_mm_2                       251 15.0    0.003    0.003    7.715    7.716
scf_env_do_scf                       1  3.0    0.000    0.000    6.789    6.789
scf_env_do_scf_inner_loop           17  4.0    0.001    0.001    6.789    6.789
mp_waitall_2                      2656 15.9    5.984    5.998    5.984    5.998
dbcsr_multiply_generic              30  8.1    0.003    0.003    5.692    5.733
contract_cubic_gw                   21  9.0    0.000    0.000    5.638    5.638
multiply_cannon                     30  9.1    0.007    0.009    5.497    5.537
dbt_communicate_buffer             594 12.8    0.013    0.014    5.473    5.487
multiply_cannon_loop                30 10.1    0.005    0.005    5.441    5.482
get_2c_integrals                     1  6.0    0.000    0.000    5.475    5.475
compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    5.095    5.110
compute_mat_P_omega_copy_M_occ     250  9.0    0.002    0.002    4.990    4.996
multiply_cannon_multrec             60 11.1    0.167    0.178    4.757    4.809
dbt_tas_copy                       511 11.5    2.643    2.659    4.637    4.682
dbcsr_mm_accdrv_process            328 12.3    0.045    0.045    4.405    4.465
jit_kernel_multiply                 18 11.7    4.353    4.413    4.353    4.413
mp_sync                           8688 11.6    3.078    4.063    3.078    4.063
qs_scf_new_mos                      17  5.0    0.000    0.000    3.784    3.808
-------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=182.663, yerr=0.0

Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=53.999000000000024, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=79.133, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=24.97, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.653, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.999, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.909, yerr=0.0

Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:

-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.023    0.023  200.200  200.200
qs_forces                            1  2.0    0.000    0.000  199.692  199.692
rebuild_ks_matrix                    7  6.6    0.000    0.000  194.855  194.855
qs_ks_build_kohn_sham_matrix         7  7.6    0.002    0.002  194.855  194.855
hfx_ks_matrix                        7  8.6    0.000    0.000  190.944  190.945
dbt_total                          849 11.0    0.009    0.009  141.513  141.513
hfx_ri_update_ks                     7  9.6    0.000    0.000  110.040  110.041
hfx_ri_update_ks_Pmat                7 10.6   23.060   23.068  110.035  110.035
qs_energies                          1  3.0    0.000    0.000  105.434  105.435
scf_env_do_scf                       1  4.0    0.000    0.000  102.996  102.996
qs_ks_update_qs_env                  8  6.0    0.000    0.000  100.652  100.653
qs_ks_update_qs_env_forces           1  3.0    0.000    0.000   94.210   94.211
dbt_contract                       207 12.4    0.049    0.049   82.565   82.565
hfx_ri_update_forces                 1  7.0    1.137    1.149   80.902   80.902
dbt_tas_total                      369 13.4    0.077    0.077   68.166   68.166
dbt_tas_multiply                   216 13.5    0.001    0.001   65.337   65.337
scf_env_do_scf_inner_loop            6  5.0    0.000    0.001   55.063   55.063
dbt_copy                           423 11.8    0.046    0.047   54.226   54.974
dbt_tas_dbm                        216 15.5    0.002    0.002   51.842   51.842
dbm_multiply                       216 17.5   48.952   48.995   48.952   48.995
init_scf_loop                        2  5.0    0.000    0.000   47.931   47.932
hfx_ri_forces_Pmat_3c                1  8.0    3.465    3.511   47.393   47.399
dbt_reshape                        175 13.2   19.028   19.081   41.086   41.346
hfx_ri_update_ks_Pmat_KS            63 11.6    0.001    0.001   31.601   31.601
precalc_derivatives                  1  8.0    1.998    2.026   27.495   27.495
dbt_tas_mm_2                        91 16.5    0.001    0.001   21.734   21.734
mp_waitall_2                      1022 16.5   19.401   19.504   19.401   19.504
dbt_tas_reserve_blocks_index      1323 15.4    1.744    1.755   18.363   18.671
dbm_reserve_blocks                1491 16.3   17.359   17.667   17.359   17.667
dbt_crop                           372 13.7   13.253   13.317   17.186   17.311
hfx_ri_pre_scf_Pmat                  1 12.0    0.000    0.000   17.031   17.031
dbt_tas_mm_3T                       77 17.1    0.001    0.001
16.603 16.734 dbt_communicate_buffer 175 14.2 0.004 0.004 16.106 16.214 dbt_reserve_blocks_index 889 14.5 0.629 0.639 15.199 15.206 dbt_reserve_blocks_index_array 859 13.5 0.007 0.007 14.914 14.917 hfx_ri_update_ks_Pmat_Px3C 63 11.6 0.000 0.000 14.865 14.865 hfx_ri_update_ks_Pmat_copy_2 63 11.6 0.000 0.000 14.782 14.782 build_3c_derivatives 3 9.0 2.330 2.343 14.401 14.404 dbt_tas_mm_3N 37 15.4 0.000 0.000 11.228 11.395 dbt_tas_copy 248 12.5 4.121 4.265 7.618 8.037 mp_sync 2901 12.8 6.334 7.246 6.334 7.246 hfx_ri_pre_scf_Pmat_int 1 13.0 0.000 0.000 5.581 5.581 dbt_tas_replicate 168 15.1 2.286 2.287 4.802 4.852 hfx_ri_pre_scf_calc_tensors 1 14.0 0.004 0.004 4.789 4.796 hfx_ri_pre_scf_Pmat_copy_2 9 13.0 1.827 1.851 4.635 4.659 dbcsr_multiply_generic 155 10.8 0.008 0.008 4.133 4.141 hfx_ri_pre_scf_Pmat_RIx3C 9 13.0 0.000 0.000 4.059 4.110 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=200.2, yerr=0.0 Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=72.39999999999999, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=48.952, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=23.06, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=19.401, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=19.028, yerr=0.0 PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=17.359, yerr=0.0 Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done. 
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.011    0.012  110.272  110.272
 qs_energies                          1  2.0    0.000    0.000  110.070  110.070
 mp2_main                             1  3.0    0.000    0.000  101.927  101.927
 mp2_gpw_main                         1  4.0    0.001    0.001  101.530  101.530
 mp2_ri_gpw_compute_in                1  5.0    0.594    0.597   56.952   56.958
 mp2_ri_gpw_compute_in_loop           1  6.0    0.015    0.016   48.410   48.415
 mp2_ri_gpw_compute_en                1  5.0    0.102    0.107   44.512   44.518
 mp2_ri_gpw_compute_en_RI_loop        1  6.0   13.001   13.070   41.716   41.718
 dbcsr_multiply_generic            2666  8.0    0.175    0.175   24.729   24.834
 ao_to_mo_and_store_B_mult_1       1328  7.0    0.015    0.015   23.268   23.373
 mp2_eri_3c_integrate_gpw          1328  7.0    0.018    0.019   19.102   19.206
 mp2_ri_gpw_compute_en_expansio    1040  7.0    0.800    0.803   17.097   17.103
 local_gemm                        1040  8.0   16.297   16.306   16.297   16.306
 make_m2s                          5332  9.0    0.056    0.056   13.468   13.499
 make_images                       5332 10.0    2.503    2.512   13.273   13.305
 integrate_v_rspace                1338  8.0    1.096    1.114   10.683   10.748
 multiply_cannon                   2666  9.0    0.425    0.430   10.552   10.628
 multiply_cannon_loop              2666 10.0    0.205    0.209    9.363    9.441
 hybrid_alltoall_any               6683 11.6    8.811    8.849    9.074    9.111
 make_images_data                  5332 11.0    0.073    0.073    8.999    9.037
 grid_integrate_task_list          1338  9.0    8.251    8.294    8.251    8.294
 get_2c_integrals                     1  6.0    0.005    0.005    7.947    7.947
 fft_wrap_pw1pw2                  26668 10.4    0.150    0.156    7.596    7.653
 compute_2c_integrals                 1  7.0    0.008    0.008    7.333    7.334
 collocate_function                1328  8.0    5.344    5.353    7.261    7.311
 compute_2c_integrals_loop_lm         1  8.0    0.023    0.023    7.197    7.228
 mp2_eri_2c_integrate_gpw             1  9.0    2.241    2.261    7.174    7.205
 scf_env_do_scf                       1  3.0    0.000    0.000    7.171    7.172
 scf_env_do_scf_inner_loop           10  4.0    0.001    0.001    7.171    7.172
 ao_to_mo_and_store_B_E_Ex_1       1328  7.0    3.757    3.759    5.719    5.721
 mp2_ri_gpw_compute_en_ener        1040  7.0    5.612    5.622    5.612    5.622
 qs_scf_new_mos                      10  5.0    0.000    0.000    5.590    5.593
 multiply_cannon_multrec           2676 11.0    2.354    2.370    5.294    5.300
 mp2_ri_gpw_compute_en_comm         221  7.0    1.091    1.092    4.819    4.875
 fft_wrap_pw1pw2_20               10647 11.4    0.023    0.023    4.338    4.376
 pw_gpu_r3dc1d_3d                 13282 12.2    3.673    3.728    3.673    3.728
 eigensolver                         11  5.8    0.002    0.002    3.140    3.142
 potential_pw2rs                   2666 10.0    0.104    0.104    2.704    2.721
 mp_sendrecv_dm3                    442  8.0    2.642    2.704    2.642    2.704
 dbcsr_mm_accdrv_process           5392 12.0    0.245    0.249    2.670    2.692
 pw_gpu_c1dr3d_3d                 13280 12.7    2.654    2.664    2.654    2.664
 cp_fm_diag_elpa                     11  6.8    0.000    0.000    2.514    2.514
 cp_fm_diag_elpa_base                11  7.8    2.426    2.443    2.513    2.513
 copy_dbcsr_to_fm                  1351  8.0    0.035    0.035    2.397    2.405
 replicate_iaK_2intgroup              1  6.0    2.242    2.244    2.385    2.388
 collocate_single_gaussian         1328 10.0    0.102    0.103    2.314    2.332
 fft_wrap_pw1pw2_10               15957 11.5    0.021    0.021    2.300    2.328
 jit_kernel_multiply                  8 13.0    2.305    2.327    2.305    2.327
 fill_local_i_aL                    884  7.5    2.273    2.280    2.273    2.280
 mp2_eri_2c_integrate_gpw_pot_l    1328 10.0    0.005    0.005    2.218    2.245
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=110.272, yerr=0.0

Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=58.300000000000004, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=16.297, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=13.001, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=8.811, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.251, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_ener", label="mp2_ri_gpw_compute_en_ener", y=5.612, yerr=0.0

Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.085    0.085  212.924  212.931
 qs_energies                          1  2.0    0.000    0.000  211.726  211.734
 scf_env_do_scf                       1  3.0    0.000    0.000  196.777  196.784
 scf_env_do_scf_inner_loop           15  4.0    0.001    0.002  196.777  196.784
 qs_ks_update_qs_env                 15  5.0    0.000    0.000   97.858   97.879
 rebuild_ks_matrix                   15  6.0    0.000    0.000   97.633   97.655
 qs_ks_build_kohn_sham_matrix        15  7.0    0.003    0.003   97.633   97.654
 qs_vxc_create                       15  8.0    0.034    0.067   60.786   60.801
 qs_scf_new_mos                      15  5.0    0.000    0.000   55.964   56.019
 fft_wrap_pw1pw2                   1086 10.0    0.030    0.031   52.946   52.989
 calculate_dispersion_nonloc         15  9.0   11.418   11.434   52.395   52.446
 eigensolver                         15  6.0    0.002    0.002   45.721   45.911
 qs_rho_update_rho_low               16  5.0    0.000    0.000   40.889   40.891
 calculate_rho_elec                  16  6.0    0.193    0.193   40.888   40.891
 sum_up_and_integrate                15  8.0    0.000    0.000   35.353   35.392
 integrate_v_rspace                  15  9.0    0.049    0.049   35.328   35.366
 grid_collocate_task_list            16  7.0   28.518   28.550   28.518   28.550
 grid_integrate_task_list            15 10.0   27.728   27.752   27.728   27.752
 pw_gpu_c1dr3d_3d_ps                585 12.1    5.837    5.915   27.641   27.712
 cp_fm_diag_elpa                     15  7.0    0.000    0.000   27.693   27.697
 cp_fm_diag_elpa_base                15  8.0   25.826   26.416   27.687   27.688
 fft_wrap_pw1pw2_150                765 11.0    0.005    0.005   26.761   26.783
 pw_gpu_r3dc1d_3d_ps                501 11.9    5.001    5.250   25.268   25.295
 cp_fm_cholesky_restore              45  7.0   16.005   16.748   16.005   16.748
 fft_wrap_pw1pw2_200                197 11.3    0.001    0.001   12.702   12.732
 density_rs2pw                       16  7.0    0.002    0.002   12.161   12.202
 qs_energies_init_hamiltonians        1  3.0    0.000    0.000   10.940   10.940
 vdW_energy                          15 10.0   10.016   10.053   10.016   10.053
 pw_gpu_ffc                         585 13.1    9.685    9.722    9.685    9.722
 build_core_hamiltonian_matrix        1  4.0    0.000    0.000    9.359    9.523
 pw_gpu_cff                         501 12.9    9.171    9.182    9.171    9.182
 xc_vxc_pw_create                    15  9.0    0.195    0.198    8.357    8.359
 potential_pw2rs                     15 10.0    0.007    0.007    7.550    7.564
 pw_gpu_sf                          585 13.1    7.321    7.344    7.321    7.344
 mp_alltoall_z22v                  1086 14.0    6.878    7.250    6.878    7.250
 copy_dbcsr_to_fm                    16  5.9    0.001    0.001    7.101    7.174
 pw_gpu_fg                          501 12.9    6.962    6.974    6.962    6.974
 dbcsr_complete_redistribute         46  8.3    2.018    2.141    6.168    6.328
 fft_wrap_pw1pw2_10                  62 10.5    0.000    0.000    6.123    6.126
 build_core_ppnl                      1  5.0    5.208    5.281    5.208    5.281
 cp_fm_uplo_to_full                  30  8.0    3.881    5.018    3.881    5.018
 xc_rho_set_and_dset_create          15 10.0    0.136    0.140    4.936    4.963
 x_to_yz                            585 13.1    1.071    1.079    4.762    4.830
 xc_pw_derive                        90 11.0    0.001    0.001    4.747    4.784
 yz_to_x                            501 12.9    0.886    0.887    4.074    4.371
 gspace_mixing                       14  5.0    0.146    0.147    4.367    4.367
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=212.924, yerr=0.0

Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=103.429, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=28.518, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.728, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=25.826, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=16.005, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=11.418, yerr=0.0

Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.048    0.048  296.923  296.924
 qs_energies                          1  2.0    0.000    0.000  296.782  296.784
 ls_scf                               1  3.0    0.000    0.000  295.842  295.845
 ls_scf_main                          1  4.0    0.001    0.002  285.118  285.121
 density_matrix_trs4                 11  5.0    0.009    0.009  237.426  237.447
 dbcsr_multiply_generic             185  6.1    0.376    0.377  195.087  195.168
 multiply_cannon                    185  7.1    2.090    2.340  137.110  137.478
 multiply_cannon_loop               185  8.1    0.371    0.373  121.450  121.526
 multiply_cannon_multrec            370  9.1   93.605   93.851  104.172  104.340
 make_m2s                           370  7.1    0.032    0.033   48.731   48.862
 make_images                        370  8.1   12.243   12.706   47.583   47.721
 ls_scf_dm_to_ks                     11  5.0    0.000    0.000   42.711   42.726
 matrix_ls_to_qs                     11  6.0    0.000    0.000   39.431   39.616
 dbcsr_complete_redistribute         23  7.5   24.670   24.785   33.658   33.748
 matrix_decluster                    11  7.0    0.000    0.000   30.452   30.540
 arnoldi_extremal                    12  6.1    0.000    0.000   25.327   25.331
 arnoldi_normal_ev                   12  7.1    0.011    0.011   25.326   25.330
 build_subspace                      23  8.1    0.067    0.068   24.810   24.811
 dbcsr_matrix_vector_mult           652  9.0    0.166    0.167   23.344   23.434
 dbcsr_matrix_vector_mult_local     652 10.0   22.268   22.358   22.276   22.366
 make_images_data                   370  9.1    0.014    0.015   17.669   17.961
 hybrid_alltoall_any                393  9.9   12.070   12.392   17.116   17.415
 calculate_norms                    740  9.1   16.341   16.440   16.341   16.440
 dbcsr_finalize                     559  7.6    0.239    0.243   15.343   15.463
 dbcsr_merge_all                    510  8.6    2.769    2.897   13.992   14.114
 dbcsr_copy                         761  7.5    1.748    1.795   10.750   10.895
 setup_rec_index_2d                 370  8.1   10.300   10.333   10.300   10.333
 dbcsr_special_finalize             555  9.1    0.011    0.011   10.271   10.286
 dbcsr_sort_indices                1283 10.0    9.823    9.832    9.823    9.832
 dbcsr_add_d                        280  6.0    0.002    0.002    9.165    9.299
 dbcsr_add_anytype                  280  7.0    4.013    4.018    9.164    9.297
 dbcsr_dot                          144  6.3    8.333    8.371    8.990    9.147
 ls_scf_init_scf                      1  4.0    0.000    0.000    9.081    9.082
 dbcsr_copy_into_existing            11  8.0    8.977    9.075    8.977    9.075
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000    8.599    8.607
 dbcsr_mm_accdrv_process          14501 10.0    0.808    0.909    8.477    8.511
 matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.001    7.727    7.727
 dbcsr_mm_accdrv_process_sort     14501 11.0    7.574    7.601    7.574    7.601
 tree_to_linear_d                    23 10.5    7.431    7.481    7.431    7.481
 dbcsr_merge_single_wm              370 10.1    0.597    0.598    6.539    6.564
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=296.923, yerr=0.0

Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=127.79599999999999, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=93.605, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=24.67, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=22.268, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=16.341, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="make_images", label="make_images", y=12.243, yerr=0.0

Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.004    0.005   51.453   51.453
 lib_test                             1  2.0    0.000    0.000   51.439   51.447
 dbcsr_run_tests                      3  3.0    0.000    0.000   51.438   51.446
 test_multiplies_multiproc            3  4.0    0.001    0.001   39.537   39.634
 dbcsr_multiply_generic               9  5.0    0.002    0.002   30.675   30.681
 multiply_cannon                      9  6.0    0.296    0.397   20.535   20.853
 multiply_cannon_loop                 9  7.0    0.003    0.003   18.900   18.953
 multiply_cannon_multrec             18  8.0   10.127   10.178   17.798   17.835
 dbcsr_make_random_matrix             9  4.0    8.122    8.181   11.754   11.851
 dbcsr_finalize                      27  5.7    0.001    0.001    7.946    8.110
 dbcsr_merge_all                     18  6.5    3.823    3.828    7.821    7.980
 dbcsr_mm_accdrv_process           8199  9.0    1.273    1.323    7.445    7.469
 dbcsr_redistribute                   9  5.0    3.775    3.785    6.070    6.078
 make_m2s                            18  6.0    0.001    0.001    5.216    5.219
 make_images                         18  7.0    0.391    0.394    5.181    5.184
 dbcsr_mm_accdrv_process_sort      8199 10.0    5.027    5.030    5.027    5.030
 make_images_data                    18  8.0    0.001    0.001    2.963    2.969
 hybrid_alltoall_any                 18  9.0    2.558    2.558    2.922    2.928
 dbcsr_data_copy_aa2                 18  7.5    1.869    2.032    1.869    2.032
 mp_alltoall_d11v                    27  6.0    2.007    2.008    2.007    2.008
 tree_to_linear_d                     9  7.0    1.981    1.989    1.981    1.989
 dbcsr_data_release                 507  7.7    1.486    1.498    1.486    1.498
 dbcsr_data_new                     354  7.4    1.068    1.206    1.068    1.206
 jit_kernel_multiply                  6 10.0    1.144    1.173    1.144    1.173
 dbcsr_checksum                       6  5.0    1.117    1.132    1.135    1.135
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=51.453, yerr=0.0

Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=20.579, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=10.127, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=8.122, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=5.027, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.823, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.775, yerr=0.0

Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.050    0.052  213.707  213.709
 qs_mol_dyn_low                       1  2.0    0.004    0.005  212.009  212.049
 qs_forces                            6  3.8    0.001    0.001  128.358  128.359
 qs_energies                          6  4.8    0.001    0.001  121.114  121.115
 scf_env_do_scf                       6  5.8    0.000    0.001  112.786  112.787
 scf_env_do_scf_inner_loop          113  6.2    0.006    0.008  104.408  104.409
 velocity_verlet                      5  3.0    0.003    0.004  103.684  103.740
 rebuild_ks_matrix                  119  8.1    0.000    0.000   85.526   85.528
 qs_ks_build_kohn_sham_matrix       119  9.1    0.021    0.021   85.526   85.527
 qs_ks_update_qs_env                119  7.3    0.002    0.002   80.725   80.725
 fft_wrap_pw1pw2                   2059 12.4    0.047    0.049   66.783   66.786
 fft_wrap_pw1pw2_150               1321 13.9    0.010    0.010   63.914   63.972
 qs_vxc_create                      119 10.1    0.002    0.002   54.805   54.806
 xc_vxc_pw_create                   119 11.1    1.637    1.641   54.803   54.804
 qmmm_el_coupling                     6  3.8    0.000    0.000   44.817   44.818
 qmmm_elec_with_gaussian              6  4.8    0.022    0.022   44.811   44.812
 qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   42.932   43.369
 qmmm_elec_gaussian_low_G             6  6.8   37.649   38.140   37.649   38.140
 xc_pw_derive                       714 13.1    0.011    0.012   37.533   37.564
 pw_gpu_c1dr3d_3d_ps               1095 14.8   10.819   10.843   35.843   35.877
 qmmm_forces                          6  3.8    0.002    0.002   35.748   35.749
 qmmm_forces_with_gaussian            6  4.8    0.024    0.024   35.091   35.393
 qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   33.654   33.967
 pw_gpu_r3dc1d_3d_ps                964 14.0    9.627    9.679   30.880   30.914
 qmmm_forces_gaussian_low_G           6  6.8   28.015   28.384   28.015   28.384
 xc_rho_set_and_dset_create         119 12.1    2.585    2.586   27.676   27.701
 xc_pw_divergence                   119 12.1    0.006    0.007   25.074   25.101
 qs_rho_update_rho_low              119  7.3    0.001    0.001   22.711   22.791
 calculate_rho_elec                 119  8.3    1.197    1.197   22.710   22.790
 density_rs2pw                      119  9.3    0.009    0.009   16.466   16.620
 dbcsr_multiply_generic            2598 12.3    0.105    0.105   14.545   14.602
 sum_up_and_integrate               119 10.1    0.003    0.003   14.095   14.136
 integrate_v_rspace                 119 11.1    0.021    0.022   13.901   13.942
 mp_alltoall_z22v                  2059 16.4   13.162   13.355   13.162   13.355
 multiply_cannon                   2598 13.3    0.239    0.240   12.820   12.843
 multiply_cannon_loop              2598 14.3    0.267    0.268   12.312   12.331
 multiply_cannon_multrec           5196 15.3    4.092    4.174   10.119   10.180
 potential_pw2rs                    119 12.1    0.036    0.036    9.954    9.955
 x_to_yz                           1095 15.8    2.336    2.337    9.565    9.661
 qs_ks_ddapc                        119 10.1    0.003    0.003    9.184    9.197
 pw_gpu_sf                         1095 15.8    8.661    8.681    8.661    8.681
 pw_gpu_fg                          964 15.0    8.307    8.414    8.307    8.414
 init_scf_loop                        6  6.8    0.000    0.000    8.374    8.374
 yz_to_x                            964 15.0    1.847    1.872    7.781    7.851
 qs_scf_new_mos                     113  7.2    0.001    0.001    7.454    7.456
 qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.453    7.455
 ot_scf_mini                        113  9.2    0.002    0.002    7.158    7.159
 pw_gpu_ffc                        1095 15.8    6.778    6.797    6.778    6.797
 dbcsr_mm_accdrv_process          13992 16.0    0.970    1.382    5.957    5.979
 init_scf_run                         6  5.8    0.000    0.000    5.972    5.972
 scf_env_initial_rho_setup            6  6.8    0.000    0.000    5.972    5.972
 qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    5.639    5.695
 qmmm_forces_with_gaussian_LG         6  7.8    5.639    5.695    5.639    5.695
 xc_functional_eval                 238 13.1    0.003    0.003    5.660    5.671
 qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    5.283    5.337
 qmmm_elec_with_gaussian_LG           6  7.8    5.283    5.337    5.283    5.337
 jit_kernel_multiply                 24 14.7    4.936    5.326    4.936    5.326
 pw_gpu_cff                         964 15.0    5.092    5.147    5.092    5.147
 ot_mini                            113 10.2    0.001    0.001    5.120    5.122
 grid_collocate_task_list           119  9.3    5.007    5.076    5.007    5.076
 pw_poisson_solve                   125  9.9    0.004    0.004    4.891    4.897
 qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    4.834    4.835
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=213.707, yerr=0.0

Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=114.43499999999999, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=37.649, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=28.015, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=13.162, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.819, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.627, yerr=0.0

Summary: Performance test took 24 minutes.
Status: OK

 ---> Removed intermediate container 129d1de3c990
 ---> 81dfc5a6c786
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in 0ea7648e7220
 ---> Removed intermediate container 0ea7648e7220
 ---> 7341d97bdff2
Step 46/46 : ENTRYPOINT []
 ---> Running in 587ba07958f2
 ---> Removed intermediate container 587ba07958f2
 ---> e5c48ef28195
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built e5c48ef28195
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.

#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done
EndDate: 2026-04-03 07:27:45+00:00