StartDate: 2026-01-08 06:06:29+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 1f1a7a2a435155a403097f4c56526f55ad971118
CommitTime: 2026-01-07 16:23:01 +0100
CommitAuthor: Juerg Hutter
CommitSubject: Response force error calculation from external sampling (#4672)

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=1f1a7a2a435155a403097f4c56526f55ad971118 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.
Sending build context to Docker daemon 408.9MB
Step 1/47 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
02cb0e091e33: Waiting
9c3d619183d2: Waiting
2cd52cbb1ebe: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
6e8af4fd0a07: Waiting
6cb9b761b877: Waiting
15a17189b2df: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
2cd52cbb1ebe: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Verifying Checksum
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/47 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/47 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/47 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/47 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/47 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/47 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 852ff7058318
Step 8/47 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> 3cc2e0ec6ea3
Step 9/47 : RUN mkdir scripts
 ---> Using cache
 ---> 9264fff48632
Step 10/47 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> 94eaf24213f0
Step 11/47 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> 7e5ef29eeea0
Step 12/47 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run
 ---> Using cache
 ---> 4940ae3b8d72
Step 13/47 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> a858e4ab62d2
Step 14/47 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 5c91d3ddd6af
Step 15/47 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 32c866fb1eff
Step 16/47 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> af4360843d07
Step 17/47 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 607f13b74bd4
Step 18/47 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> bcfa76127bf3
Step 19/47 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> a6fd19eb59ef
Step 20/47 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 6c7fb00375da
Step 21/47 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> 512fe0ec1dbe
Step 22/47 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> d6f0752ae4f0
Step 23/47 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> d99b15c81b72
Step 24/47 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> c1aef704603d
Step 25/47 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 1268b070e7e6
Step 26/47 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> 9abe2366d295
Step 27/47 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> 38c11788c2bb
Step 28/47 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> 76cb6c895640
Step 29/47 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> 55b2b9ffe6ea
Step 30/47 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> c07e96d2703d
Step 31/47 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 5cbdea565e52
Step 32/47 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> f2fefd69812a
Step 33/47 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> 2c1c209ad735
Step 34/47 : COPY ./src ./src
 ---> 2933849a3b1d
Step 35/47 : COPY ./data ./data
 ---> 8d3a719d3522
Step 36/47 : COPY ./tests ./tests
 ---> e2aa2f5ab279
Step 37/47 : COPY ./tools/build_utils ./tools/build_utils
 ---> 7e4b9aad1113
Step 38/47 : COPY ./cmake ./cmake
 ---> ce5e8b1b6ea5
Step 39/47 : COPY ./CMakeLists.txt .
 ---> ade6bb7f4317
Step 40/47 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> 68c41094668c
Step 41/47 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in e6aeb93f5872

==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so (found version "4.1")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so (found version "4.1")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpifort.so (found version "4.1")
-- Found MPI: TRUE (found version "4.1") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "4.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
--   Found openblas, version 0.3.30
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
--   Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
--   Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
--   Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
--   Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
--   Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")

-----------------------------------------------------------
-                          CUDA                           -
-----------------------------------------------------------

-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found

------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------

-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX

------------------------------------------------------------
-                          DBCSR                           -
------------------------------------------------------------

-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for module 'libxsmmf'
--   Found libxsmmf, version 1.17.0
-- Checking for module 'libxsmmext'
--   Found libxsmmext, version 1.17.0

------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------

-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "1.14.6") found components: C Fortran
-- Found MPI: TRUE (found version "4.1") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Checking for module 'libint2'
--   Found libint2, version 2.6.0
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include;/opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include/libint2
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.5.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "1.14.6") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'libvdwxc>=0.3.0'
-- Looking for vdwxc_init_mpi
-- Looking for vdwxc_init_mpi - not found
-- Found LibVDWXC: /opt/cp2k-toolchain/install/libvdwxc-0.4.0/lib/libvdwxc.a (Required is at least version "0.3.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.

--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------

- BLAS
  - vendor: OpenBLAS
  - include directories: /opt/cp2k-toolchain/install/openblas-0.3.30/include
  - libraries: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
- LAPACK
  - include directories: /opt/cp2k-toolchain/install/openblas-0.3.30/include
  - libraries: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
- MPI
  - include directories: /opt/cp2k-toolchain/install/mpich-4.3.2/include
  - libraries: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so
  - MPI_F08: ON
- ScaLAPACK
  - vendor: auto
  - include directories:
  - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
- Hardware Acceleration:
  - CUDA:
    - GPU architecture number: 70
    - GPU profiling enabled:
    - GPU accelerated modules
      - PW module: ON
      - GRID module: ON
      - DBM module: ON
- LibXC
  - version: 7.0.0
  - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
  - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
- HDF5
  - version: 1.14.6
  - include directories: /opt/cp2k-toolchain/install/hdf5-1.14.6/include
  - libraries: hdf5-shared
- FFTW3
  - include directories: /opt/cp2k-toolchain/install/fftw-3.3.10/include
  - libraries: /opt/cp2k-toolchain/install/fftw-3.3.10/lib/libfftw3.a
- LIBXSMM
  - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
  - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so
- SpLA
  - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
  - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
  - SpLA GEMM offloading
- SIRIUS
  - include directories:
  - libraries:
- COSMA
  - include directories: /opt/cp2k-toolchain/install/COSMA-2.7.0/include
  - libraries: MPI::MPI_CXX;costa::costa;$;$;cosma::BLAS::blas;cosma::scalapack::scalapack
- Libint2
  - include directories: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include;/opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include/libint2
  - libraries: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/lib/libint2.a
- ELPA
  - include directories: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/include/elpa_openmp-2024.05.001
  - libraries: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a

--------------------------------------------------------------------
-                                                                  -
-        List of dependencies not included in this build           -
-                                                                  -
--------------------------------------------------------------------

- DFTD4
- DeePMD
- PEXSI
- ACE (libpace)
- TBLITE
- Spglib
- LibSMEAGOL
- MiMiC
- openPMD
- DLA-Future
- PLUMED
- Libvori
- LibTorch
- TREXIO
- GreenX

After building CP2K the regtests can be run with the following command:
./tests/do_regtest.py /opt/cp2k/build/bin psmp

-- Configuring done (12.8s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container e6aeb93f5872
 ---> 3a571fb269ef
Step 42/47 : COPY ./benchmarks ./benchmarks
 ---> 6974140576a0
Step 43/47 : COPY ./tools/regtesting ./tools/regtesting
 ---> 20bd22e40b44
Step 44/47 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> eec9c285011c
Step 45/47 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in 51a0a3ebb3bf

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 77 input files and 0 errors.

========== Running Performance Test ==========

Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"

Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.028    0.028  106.467  106.468
 qs_mol_dyn_low                       1  2.0    0.004    0.004  106.022  106.024
 qs_forces                           11  3.9    0.002    0.002  105.972  105.972
 qs_energies                         11  4.9    0.001    0.001   94.444   94.444
 scf_env_do_scf                      11  5.9    0.001    0.001   73.261   73.261
 velocity_verlet                     10  3.0    0.002    0.002   67.993   68.011
 scf_env_do_scf_inner_loop          108  6.5    0.006    0.009   62.418   62.418
 rebuild_ks_matrix                  119  8.3    0.001    0.001   27.472   27.473
 qs_ks_build_kohn_sham_matrix       119  9.3    0.022    0.022   27.471   27.472
 dbcsr_multiply_generic            2286 12.5    0.154    0.156   26.566   26.598
 qs_ks_update_qs_env                119  7.6    0.001    0.001   25.195   25.196
 qs_scf_new_mos                     108  7.5    0.001    0.001   21.515   21.517
 qs_scf_loop_do_ot                  108  8.5    0.001    0.001   21.514   21.517
 qs_rho_update_rho_low              119  7.7    0.001    0.001   20.567   20.577
 calculate_rho_elec                 119  8.7    0.893    0.897   20.566   20.576
 ot_scf_mini                        108  9.5    0.003    0.003   19.431   19.433
 fft_wrap_pw1pw2                   1201 11.6    0.025    0.026   16.399   16.448
 sum_up_and_integrate               119 10.3    0.003    0.003   14.188   14.224
 fft_wrap_pw1pw2_140                487 12.2    0.004    0.004   14.055   14.137
 integrate_v_rspace                 119 11.3    0.370    0.371   14.074   14.110
 multiply_cannon                   2286 13.5    0.349    0.356   13.366   13.418
 multiply_cannon_loop              2286 14.5    0.271    0.273   12.226   12.234
 make_m2s                          4572 13.5    0.046    0.046   11.453   11.495
 ot_mini                            108 10.5    0.001    0.001   11.357   11.358
 make_images                       4572 14.5    1.257    1.286   11.270   11.311
 init_scf_loop                       11  6.9    0.000    0.000   10.762   10.762
 density_rs2pw                      119  9.7    0.009    0.009   10.647   10.749
 init_scf_run                        11  5.9    0.000    0.000   10.639   10.640
 scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.639   10.639
 grid_collocate_task_list           119  9.7    8.987    9.051    8.987    9.051
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.437    8.438
 pw_gpu_r3dc1d_3d_ps                606 13.1    2.462    2.479    8.395    8.412
 build_core_hamiltonian_matrix_      11  4.9    0.002    0.002    7.961    8.113
 pw_gpu_c1dr3d_3d_ps                595 14.2    2.376    2.391    7.971    8.004
 wfi_extrapolate                     11  7.9    0.002    0.002    7.677    7.677
 grid_integrate_task_list           119 12.3    7.410    7.446    7.410    7.446
 prepare_preconditioner              11  7.9    0.000    0.000    7.346    7.359
 make_preconditioner                 11  8.9    0.000    0.000    7.346    7.359
 qs_ot_get_derivative               108 11.5    0.002    0.002    6.789    6.792
 hybrid_alltoall_any               4725 16.4    5.040    5.109    6.521    6.567
 multiply_cannon_multrec           4572 15.5    2.219    2.224    6.477    6.477
 make_images_data                  4572 15.5    0.056    0.056    6.432    6.472
 make_full_inverse_cholesky          11  9.9    0.000    0.000    6.120    6.372
 potential_pw2rs                    119 12.3    0.039    0.040    6.293    6.293
 parallel_gemm_fm_cosma              81  9.0    5.618    5.618    5.618    5.618
 ot_diis_step                       108 11.5    0.006    0.006    4.537    4.537
 build_core_ppl_forces               11  5.9    4.062    4.170    4.062    4.170
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.071    4.119
 build_core_hamiltonian_matrix       11  6.9    0.002    0.002    3.963    4.014
 apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.956    3.960
 apply_single                       119 13.6    0.001    0.001    3.956    3.960
 dbcsr_mm_accdrv_process           9594 16.2    0.733    0.947    3.792    3.806
 dbcsr_complete_redistribute        329 12.2    1.426    1.440    3.397    3.666
 multiply_cannon_sync_h2d          4572 15.5    3.593    3.640    3.593    3.640
 calculate_dm_sparse                119  9.5    0.001    0.001    3.468    3.474
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.410    3.411
 qs_create_task_list                 11  7.9    0.000    0.000    3.192    3.289
 generate_qs_task_list               11  8.9    1.159    1.176    3.192    3.289
 mp_alltoall_z22v                  1201 15.6    3.191    3.287    3.191    3.287
 qs_ot_get_p                        119 10.4    0.001    0.001    3.190    3.194
 mp_waitall_1                     64495 16.9    2.970    3.130    2.970    3.130
 pw_poisson_solve                   119 10.3    0.003    0.004    2.992    2.995
 cp_dbcsr_sm_fm_multiply             37  9.5    0.002    0.002    2.897    2.898
 transfer_rs2pw                     487 10.6    0.009    0.010    2.552    2.678
 jit_kernel_multiply                 12 15.7    2.456    2.657    2.456    2.657
 copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.607    2.613
 calculate_first_density_matrix       1  7.0    0.000    0.000    2.506    2.506
 cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.381    2.383
 pw_gpu_fg                          606 14.1    2.333    2.363    2.333    2.363
 dbcsr_special_finalize            6858 15.5    0.046    0.048    2.269    2.277
 qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    2.260    2.263
 transfer_rs2pw_140                 130 11.5    1.637    1.647    2.131    2.260
 cp_fm_cholesky_invert               11 10.9    2.227    2.227    2.227    2.227
 yz_to_x                            606 14.1    0.529    0.530    2.144    2.190
 x_to_yz                            595 15.2    0.559    0.566    2.137    2.177
 dbcsr_merge_single_wm             4572 16.5    0.160    0.165    2.137    2.147
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=106.467, yerr=0.0

Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=75.35, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.987, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.41, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.618, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=5.04, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.062, yerr=0.0

Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.028    0.028  101.269  101.271
 qs_mol_dyn_low                       1  2.0    0.004    0.004  100.807  100.809
 qs_forces                           11  3.9    0.002    0.002  100.759  100.761
 qs_energies                         11  4.9    0.001    0.001   89.056   89.057
 scf_env_do_scf                      11  5.9    0.001    0.001   67.740   67.741
 velocity_verlet                     10  3.0    0.002    0.002   65.618   65.636
 scf_env_do_scf_inner_loop           96  6.5    0.005    0.008   56.783   56.784
 rebuild_ks_matrix                  107  8.3    0.001    0.001   26.646   26.663
 qs_ks_build_kohn_sham_matrix       107  9.3    0.020    0.020   26.645   26.663
 qs_ks_update_qs_env                107  7.6    0.001    0.001   24.053   24.068
 dbcsr_multiply_generic            1966 12.4    0.134    0.135   23.784   23.894
 qs_rho_update_rho_low              107  7.7    0.001    0.001   18.947   18.952
 calculate_rho_elec                 107  8.7    0.794    0.800   18.946   18.952
 qs_scf_new_mos                      96  7.5    0.001    0.001   18.884   18.893
 qs_scf_loop_do_ot                   96  8.5    0.001    0.001   18.883   18.892
 ot_scf_mini                         96  9.5    0.003    0.003   17.044   17.056
 fft_wrap_pw1pw2                   1081 11.6    0.023    0.023   14.947   14.996
 sum_up_and_integrate               107 10.3    0.003    0.003   14.688   14.763
 integrate_v_rspace                 107 11.3    0.337    0.338   14.588   14.663
 fft_wrap_pw1pw2_140                439 12.2    0.003    0.003   12.775   12.860
 multiply_cannon                   1966 13.4    0.300    0.301   12.154   12.230
 multiply_cannon_loop              1966 14.4    0.233    0.237   11.219   11.283
 init_scf_loop                       11  6.9    0.000    0.000   10.876   10.876
 init_scf_run                        11  5.9    0.000    0.000   10.618   10.618
 scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.618   10.618
 make_m2s                          3932 13.4    0.040    0.041   10.031   10.034
 ot_mini                             96 10.5    0.001    0.001   10.017   10.031
 make_images                       3932 14.4    1.085    1.100    9.871    9.873
 density_rs2pw                      107  9.7    0.008    0.008    9.736    9.869
 grid_integrate_task_list           107 12.3    8.538    8.616    8.538    8.616
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.590    8.590
 grid_collocate_task_list           107  9.7    8.385    8.491    8.385    8.491
 build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.942    8.081
 wfi_extrapolate                     11  7.9    0.002    0.002    7.701    7.701
 pw_gpu_r3dc1d_3d_ps                546 13.1    2.216    2.245    7.661    7.684
 prepare_preconditioner              11  7.9    0.000    0.000    7.391    7.392
 make_preconditioner                 11  8.9    0.000    0.000    7.391    7.392
 pw_gpu_c1dr3d_3d_ps                535 14.2    2.157    2.172    7.257    7.282
 make_full_inverse_cholesky          11  9.9    0.000    0.000    6.201    6.454
 qs_ot_get_derivative                96 11.5    0.002    0.002    5.982    5.995
 multiply_cannon_multrec           3932 15.4    1.904    1.965    5.954    5.992
 hybrid_alltoall_any               4079 16.3    4.471    4.480    5.708    5.724
 potential_pw2rs                    107 12.3    0.035    0.036    5.712    5.713
 parallel_gemm_fm_cosma              81  9.0    5.623    5.624    5.623    5.624
 make_images_data                  3932 15.4    0.048    0.049    5.603    5.619
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.199    4.320
 build_core_ppl_forces               11  5.9    4.029    4.138    4.029    4.138
 ot_diis_step                        96 11.5    0.005    0.005    4.008    4.008
 build_core_hamiltonian_matrix       11  6.9    0.002    0.002    3.944    3.983
 dbcsr_complete_redistribute        317 12.2    1.419    1.437    3.495    3.755
 dbcsr_mm_accdrv_process           8450 16.1    0.902    0.911    3.645    3.650
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.616    3.617
 apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.574    3.594
 apply_single                       107 13.6    0.001    0.001    3.574    3.594
 multiply_cannon_sync_h2d          3932 15.4    3.384    3.544    3.384    3.544
 qs_create_task_list                 11  7.9    0.000    0.000    3.344    3.424
 generate_qs_task_list               11  8.9    1.439    1.451    3.344    3.423
 calculate_dm_sparse                107  9.5    0.001    0.001    3.189    3.192
 mp_alltoall_z22v                  1081 15.6    2.901    3.004    2.901    3.004
 cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.002    2.865    2.866
 qs_ot_get_p                        107 10.4    0.001    0.001    2.801    2.815
 copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.741    2.754
 pw_poisson_solve                   107 10.3    0.003    0.003    2.670    2.670
 mp_waitall_1                     55487 16.8    2.521    2.609    2.521    2.609
 transfer_rs2pw                     439 10.6    0.008    0.008    2.350    2.502
 calculate_first_density_matrix       1  7.0    0.000    0.000    2.457    2.458
 cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.347    2.350
 jit_kernel_multiply                 10 15.6    2.204    2.215    2.204    2.215
 cp_fm_cholesky_invert               11 10.9    2.195    2.195    2.195    2.195
 transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.182    2.194
 pw_gpu_fg                          546 14.1    2.152    2.189    2.152    2.189
 transfer_rs2pw_140                 118 11.5    1.483    1.498    1.968    2.127
 build_core_ppl                      11  7.9    2.022    2.058    2.022    2.058
 dbcsr_special_finalize            5898 15.4    0.039    0.039    2.039    2.044
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=101.269, yerr=0.0

Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=70.22300000000001, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.538, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.385, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.623, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.471, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.029, yerr=0.0

Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.020    0.022  179.550  179.550
 qs_energies                          1  2.0    0.000    0.000  179.228  179.229
 mp2_main                             1  3.0    0.000    0.000  172.223  172.223
 mp2_gpw_main                         1  4.0    0.000    0.000  170.368  170.369
 rpa_ri_compute_en                    1  5.0    0.000    0.000  160.282  160.282
 rpa_num_int                          1  6.0    0.001    0.001  160.273  160.274
 parallel_gemm_fm_cosma             105  8.4   73.562   73.568   73.562   73.568
 dbt_total                         2336  9.6    0.023    0.023   72.457   72.458
 compute_mat_P_omega                  1  7.0    0.002    0.002   71.972   71.974
 compute_mat_P_omega_contract        10  8.0    5.353    5.397   71.278   71.293
 dbt_contract                       787 11.0    0.053    0.053   49.714   49.714
 compute_W_cubic_GW                  10  7.0    0.004    0.004   47.836   47.839
 dbt_tas_total                     1149 12.2    0.146    0.147   38.757   38.757
 dbt_tas_multiply                   807 12.1    0.003    0.003   38.056   38.057
 dbt_tas_dbm                        807 14.1    0.007    0.007   30.369   30.370
 dbm_multiply                       807 16.1   29.040   29.301   29.040   29.301
 compute_mat_P_omega_calc_M_occ     250  9.0    5.390    5.439   25.299   25.299
 rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   24.861   24.861
 contract_P_omega_with_mat_L         10  8.0    0.000    0.000   24.529   24.531
 dbt_copy                          1107 10.7    0.076    0.076   23.655   24.001
 dbt_tas_mm_1N                      524 15.1    0.003    0.003   19.668   19.982
 compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.002   16.267   16.267
 dbt_reshape                        594 11.8    6.753    6.987   15.427   15.540
 compute_QP_energies                  1  7.0    0.000    0.000   13.210   13.210
 compute_self_energy_cubic_gw         1  8.0    0.123    0.128   13.210   13.210
 dbt_tas_reserve_blocks_index      3266 14.3    0.648    0.651   10.990   11.338
 dbm_reserve_blocks                3634 15.3   10.682   11.036   10.682   11.036
 mp2_ri_gpw_compute_in                1  5.0    0.001    0.001   10.076   10.076
 compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    9.696    9.696
 dbt_reserve_blocks_index          2347 13.0    0.321    0.323    9.269    9.502
 dbt_reserve_blocks_index_array    2289 12.1    0.013    0.014    9.055    9.312
 dbt_crop                          1042 12.0    6.750    6.891    9.019    9.212
 dbt_tas_mm_2                       251 15.0    0.003    0.003    8.330    8.330
 contract_cubic_gw                   21  9.0    0.000    0.000    6.491    6.491
 scf_env_do_scf                       1  3.0    0.000    0.000    6.463    6.463
 scf_env_do_scf_inner_loop           17  4.0    0.001    0.001    6.463    6.463
 mp_waitall_2                      2656 15.9    6.239    6.296    6.239    6.296
 dbt_communicate_buffer             594 12.8    0.013    0.014    5.728    5.782
 get_2c_integrals                     1  6.0    0.000    0.000    5.727    5.727
 dbcsr_multiply_generic              30  8.1    0.003    0.003    5.214    5.253
 multiply_cannon                     30  9.1    0.013    0.014    5.019    5.056
 compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    5.001    5.001
 multiply_cannon_loop                30 10.1    0.005    0.005    4.961    4.999
 compute_mat_P_omega_copy_M_occ     250  9.0    0.002    0.002    4.889    4.899
 dbt_tas_copy                       511 11.5    2.628    2.704    4.469    4.655
 multiply_cannon_multrec             60 11.1    0.141    0.144    4.319    4.366
 dbcsr_mm_accdrv_process            328 12.3    0.479    0.914    4.014    4.063
 jit_kernel_multiply                 18 11.5    3.529    3.916    3.529    3.916
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=179.55, yerr=0.0

Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=52.76300000000002, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=73.562, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=29.04, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.682, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.753, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.75, yerr=0.0

Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.022    0.023  195.449  195.451
qs_forces                            1  2.0    0.000    0.000  194.979  194.980
rebuild_ks_matrix                    7  6.6    0.000    0.000  190.500  190.501
qs_ks_build_kohn_sham_matrix         7  7.6    0.002    0.002  190.500  190.501
hfx_ks_matrix                        7  8.6    0.000    0.000  186.599  186.603
dbt_total                          849 11.0    0.010    0.010  139.703  139.704
hfx_ri_update_ks                     7  9.6    0.000    0.000  108.543  108.543
hfx_ri_update_ks_Pmat                7 10.6   22.026   22.135  108.538  108.539
qs_energies                          1  3.0    0.000    0.000  103.555  103.556
scf_env_do_scf                       1  4.0    0.000    0.000  101.402  101.403
qs_ks_update_qs_env                  8  6.0    0.000    0.000   99.123   99.123
qs_ks_update_qs_env_forces           1  3.0    0.000    0.000   91.384   91.385
dbt_contract                       207 12.4    0.050    0.050   82.465   82.466
hfx_ri_update_forces                 1  7.0    1.105    1.124   78.054   78.058
dbt_tas_total                      369 13.4    0.079    0.080   68.669   68.670
dbt_tas_multiply                   216 13.5    0.001    0.001   65.977   65.978
scf_env_do_scf_inner_loop            6  5.0    0.000    0.001   55.092   55.092
dbt_copy                           423 11.8    0.048    0.048   52.928   53.301
dbt_tas_dbm                        216 15.5    0.002    0.002   52.932   52.933
dbm_multiply                       216 17.5   49.986   50.056   49.986   50.056
init_scf_loop                        2  5.0    0.000    0.000   46.309   46.309
hfx_ri_forces_Pmat_3c                1  8.0    3.101    3.145   44.914   44.938
dbt_reshape                        175 13.2   18.306   18.502   39.628   39.936
hfx_ri_update_ks_Pmat_KS            63 11.6    0.001    0.001   31.819   31.819
precalc_derivatives                  1  8.0    1.899    1.904   27.131   27.131
dbt_tas_mm_2                        91 16.5    0.001    0.001   22.415   22.415
mp_waitall_2                      1022 16.5   18.614   18.673   18.614   18.673
dbt_tas_reserve_blocks_index      1323 15.4    1.597    1.603   17.837   18.137
dbm_reserve_blocks                1491 16.3   16.949   17.243   16.949   17.243
dbt_tas_mm_3T                       77 17.1    0.001    0.001   17.044   17.139
dbt_crop                           372 13.7   12.634   12.684   16.329   16.333
hfx_ri_pre_scf_Pmat                  1 12.0    0.000    0.000   15.946   15.946
dbt_communicate_buffer             175 14.2    0.005    0.005   15.535   15.636
hfx_ri_update_ks_Pmat_Px3C          63 11.6    0.000    0.000   15.543   15.543
dbt_reserve_blocks_index           889 14.5    0.594    0.595   14.461   14.761
dbt_reserve_blocks_index_array     859 13.5    0.008    0.008   14.185   14.483
hfx_ri_update_ks_Pmat_copy_2        63 11.6    0.000    0.000   14.371   14.371
build_3c_derivatives                 3  9.0    2.371    2.386   14.218   14.218
dbt_tas_mm_3N                       37 15.4    0.000    0.000   11.172   11.319
dbt_tas_copy                       248 12.5    4.324    4.337    8.018    8.026
mp_sync                           2901 12.8    6.084    6.346    6.084    6.346
hfx_ri_pre_scf_Pmat_int              1 13.0    0.000    0.000    5.169    5.169
dbt_tas_replicate                  168 15.1    2.231    2.233    4.616    4.638
hfx_ri_pre_scf_calc_tensors          1 14.0    0.004    0.004    4.444    4.445
hfx_ri_pre_scf_Pmat_copy_2           9 13.0    1.712    1.717    4.309    4.314
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=195.449, yerr=0.0
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=69.56800000000001, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=49.986, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=22.026, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=18.614, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=18.306, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=16.949, yerr=0.0

Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.011    0.012  109.623  109.623
qs_energies                          1  2.0    0.000    0.000  109.439  109.439
mp2_main                             1  3.0    0.000    0.001  101.837  101.837
mp2_gpw_main                         1  4.0    0.001    0.002  101.443  101.444
mp2_ri_gpw_compute_in                1  5.0    0.561    0.565   56.542   56.547
mp2_ri_gpw_compute_in_loop           1  6.0    0.014    0.014   48.040   48.040
mp2_ri_gpw_compute_en                1  5.0    0.102    0.103   44.834   44.837
mp2_ri_gpw_compute_en_RI_loop        1  6.0   13.069   13.071   42.170   42.171
dbcsr_multiply_generic            2666  8.0    0.174    0.175   23.845   23.973
ao_to_mo_and_store_B_mult_1       1328  7.0    0.015    0.016   22.525   22.655
mp2_eri_3c_integrate_gpw          1328  7.0    0.020    0.020   19.054   19.249
mp2_ri_gpw_compute_en_expansio    1040  7.0    0.791    0.794   17.674   17.715
local_gemm                        1040  8.0   16.883   16.921   16.883   16.921
make_m2s                          5332  9.0    0.056    0.056   13.362   13.421
make_images                       5332 10.0    2.364    2.369   13.173   13.230
integrate_v_rspace                1338  8.0    1.113    1.116   10.636   10.742
multiply_cannon                   2666  9.0    0.416    0.419    9.775    9.956
hybrid_alltoall_any               6683 11.6    8.859    8.918    9.124    9.184
make_images_data                  5332 11.0    0.065    0.066    9.038    9.095
multiply_cannon_loop              2666 10.0    0.199    0.200    8.614    8.789
grid_integrate_task_list          1338  9.0    8.177    8.295    8.177    8.295
fft_wrap_pw1pw2                  26668 10.4    0.141    0.141    8.080    8.234
get_2c_integrals                     1  6.0    0.004    0.005    7.938    7.941
compute_2c_integrals                 1  7.0    0.007    0.008    7.328    7.329
compute_2c_integrals_loop_lm         1  8.0    0.014    0.023    7.220    7.232
mp2_eri_2c_integrate_gpw             1  9.0    1.999    2.003    7.206    7.210
collocate_function                1328  8.0    4.993    5.062    7.077    7.163
scf_env_do_scf                       1  3.0    0.000    0.000    6.712    6.713
scf_env_do_scf_inner_loop           10  4.0    0.001    0.001    6.712    6.713
ao_to_mo_and_store_B_E_Ex_1       1328  7.0    4.225    4.269    6.173    6.240
mp2_ri_gpw_compute_en_ener        1040  7.0    5.350    5.354    5.350    5.354
qs_scf_new_mos                      10  5.0    0.000    0.000    5.164    5.172
mp2_ri_gpw_compute_en_comm         221  7.0    1.084    1.086    4.841    4.882
fft_wrap_pw1pw2_20               10647 11.4    0.022    0.022    4.648    4.810
multiply_cannon_multrec           2676 11.0    1.937    1.953    4.647    4.683
pw_gpu_r3dc1d_3d                 13282 12.2    4.031    4.153    4.031    4.153
eigensolver                         11  5.8    0.002    0.002    2.928    2.929
potential_pw2rs                   2666 10.0    0.101    0.102    2.856    2.922
pw_gpu_c1dr3d_3d                 13280 12.7    2.846    2.877    2.846    2.877
mp_sendrecv_dm3                    442  8.0    2.629    2.667    2.629    2.667
collocate_single_gaussian         1328 10.0    0.093    0.093    2.452    2.520
fft_wrap_pw1pw2_10               15957 11.5    0.019    0.019    2.505    2.516
dbcsr_mm_accdrv_process           5392 12.0    0.254    0.257    2.452    2.459
mp2_eri_2c_integrate_gpw_pot_l    1328 10.0    0.004    0.004    2.360    2.436
copy_dbcsr_to_fm                  1351  8.0    0.035    0.036    2.359    2.374
fill_local_i_aL                    884  7.5    2.364    2.373    2.364    2.373
cp_fm_diag_elpa                     11  6.8    0.000    0.000    2.310    2.311
cp_fm_diag_elpa_base                11  7.8    2.222    2.240    2.309    2.309
replicate_iaK_2intgroup              1  6.0    2.128    2.138    2.269    2.279
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=109.623, yerr=0.0
Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=57.285000000000004, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=16.883, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=13.069, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=8.859, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.177, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_ener", label="mp2_ri_gpw_compute_en_ener", y=5.35, yerr=0.0

Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.090    0.095  213.588  213.590
qs_energies                          1  2.0    0.000    0.000  212.434  212.436
scf_env_do_scf                       1  3.0    0.000    0.000  198.397  198.399
scf_env_do_scf_inner_loop           15  4.0    0.001    0.002  198.397  198.399
qs_ks_update_qs_env                 15  5.0    0.000    0.000   97.856   97.883
rebuild_ks_matrix                   15  6.0    0.000    0.000   97.649   97.676
qs_ks_build_kohn_sham_matrix        15  7.0    0.003    0.003   97.649   97.676
qs_vxc_create                       15  8.0    0.060    0.119   60.437   60.490
qs_scf_new_mos                      15  5.0    0.001    0.001   57.538   57.623
fft_wrap_pw1pw2                   1086 10.0    0.031    0.032   53.162   53.252
calculate_dispersion_nonloc         15  9.0   11.103   11.106   51.954   51.954
eigensolver                         15  6.0    0.002    0.002   48.173   48.175
qs_rho_update_rho_low               16  5.0    0.000    0.000   40.986   40.986
calculate_rho_elec                  16  6.0    0.182    0.183   40.986   40.986
sum_up_and_integrate                15  8.0    0.000    0.000   35.642   35.730
integrate_v_rspace                  15  9.0    0.050    0.050   35.617   35.705
cp_fm_diag_elpa                     15  7.0    0.000    0.000   30.340   30.344
cp_fm_diag_elpa_base                15  8.0   28.473   29.050   30.334   30.335
grid_collocate_task_list            16  7.0   29.069   29.103   29.069   29.103
grid_integrate_task_list            15 10.0   28.107   28.135   28.107   28.135
pw_gpu_c1dr3d_3d_ps                585 12.1    6.013    6.035   27.534   27.548
fft_wrap_pw1pw2_150                765 11.0    0.006    0.006   27.314   27.334
pw_gpu_r3dc1d_3d_ps                501 11.9    5.412    5.895   25.589   25.665
cp_fm_cholesky_restore              45  7.0   15.830   16.573   15.830   16.573
fft_wrap_pw1pw2_200                197 11.3    0.001    0.001   13.291   13.302
density_rs2pw                       16  7.0    0.002    0.002   11.727   11.758
qs_energies_init_hamiltonians        1  3.0    0.000    0.000   10.060   10.060
vdW_energy                          15 10.0    9.525    9.537    9.525    9.537
pw_gpu_ffc                         585 13.1    9.239    9.278    9.239    9.278
pw_gpu_cff                         501 12.9    8.785    8.803    8.785    8.803
build_core_hamiltonian_matrix        1  4.0    0.000    0.000    8.669    8.719
xc_vxc_pw_create                    15  9.0    0.188    0.189    8.424    8.431
mp_alltoall_z22v                  1086 14.0    7.214    7.752    7.214    7.752
potential_pw2rs                     15 10.0    0.007    0.007    7.460    7.521
pw_gpu_sf                          585 13.1    7.299    7.300    7.299    7.300
pw_gpu_fg                          501 12.9    6.875    6.892    6.875    6.892
copy_dbcsr_to_fm                    16  5.9    0.001    0.001    6.227    6.347
dbcsr_complete_redistribute         46  8.3    1.863    1.928    5.903    5.978
fft_wrap_pw1pw2_10                  62 10.5    0.000    0.000    5.662    5.664
cp_fm_uplo_to_full                  30  8.0    3.861    5.175    3.861    5.175
yz_to_x                            501 12.9    0.992    1.000    4.459    4.983
x_to_yz                            585 13.1    1.200    1.209    4.947    4.949
xc_pw_derive                        90 11.0    0.001    0.002    4.920    4.938
build_core_ppnl                      1  5.0    4.926    4.927    4.926    4.927
xc_rho_set_and_dset_create          15 10.0    0.140    0.143    4.903    4.917
gspace_mixing                       14  5.0    0.126    0.126    4.297    4.298
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=213.588, yerr=0.0
Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=101.006, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=29.069, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=28.473, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=28.107, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=15.83, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=11.103, yerr=0.0

Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.049    0.050  273.206  273.207
qs_energies                          1  2.0    0.000    0.000  273.074  273.075
ls_scf                               1  3.0    0.000    0.000  272.167  272.169
ls_scf_main                          1  4.0    0.001    0.002  262.121  262.121
density_matrix_trs4                 11  5.0    0.009    0.009  218.443  218.443
dbcsr_multiply_generic             185  6.1    0.324    0.326  177.562  177.647
multiply_cannon                    185  7.1    2.372    2.504  123.652  123.665
multiply_cannon_loop               185  8.1    0.355    0.361  108.459  108.803
multiply_cannon_multrec            370  9.1   81.945   82.277   91.431   91.727
make_m2s                           370  7.1    0.030    0.030   45.622   45.632
make_images                        370  8.1   11.257   11.427   44.507   44.521
ls_scf_dm_to_ks                     11  5.0    0.000    0.000   39.234   39.243
matrix_ls_to_qs                     11  6.0    0.000    0.000   36.344   36.457
dbcsr_complete_redistribute         23  7.5   22.257   22.270   31.053   31.069
matrix_decluster                    11  7.0    0.000    0.000   28.165   28.181
arnoldi_extremal                    12  6.1    0.000    0.000   24.564   24.566
arnoldi_normal_ev                   12  7.1    0.010    0.010   24.564   24.565
build_subspace                      23  8.1    0.066    0.066   24.073   24.074
dbcsr_matrix_vector_mult           652  9.0    0.159    0.162   22.356   22.985
dbcsr_matrix_vector_mult_local     652 10.0   21.324   21.949   21.333   21.958
make_images_data                   370  9.1    0.014    0.014   16.786   16.794
hybrid_alltoall_any                393  9.9   11.879   12.065   16.221   16.226
calculate_norms                    740  9.1   16.154   16.195   16.154   16.195
dbcsr_finalize                     559  7.6    0.177    0.180   14.764   14.950
dbcsr_merge_all                    510  8.6    2.635    2.817   13.600   13.784
dbcsr_copy                         761  7.5    1.872    1.934   10.074   10.232
setup_rec_index_2d                 370  8.1    9.787    9.816    9.787    9.816
dbcsr_special_finalize             555  9.1    0.012    0.012    9.721    9.724
dbcsr_sort_indices                1283 10.0    8.941    8.945    8.941    8.945
dbcsr_add_d                        280  6.0    0.001    0.001    8.719    8.925
dbcsr_add_anytype                  280  7.0    3.842    3.864    8.718    8.924
dbcsr_dot                          144  6.3    7.928    7.948    8.435    8.622
ls_scf_init_scf                      1  4.0    0.000    0.000    8.495    8.497
dbcsr_copy_into_existing            11  8.0    8.177    8.274    8.178    8.275
ls_scf_init_matrix_S                 1  5.0    0.000    0.000    8.047    8.049
dbcsr_mm_accdrv_process          14501 10.0    0.836    0.836    7.361    7.383
tree_to_linear_d                    23 10.5    7.364    7.377    7.364    7.377
matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.000    7.262    7.263
dbcsr_mm_accdrv_process_sort     14501 11.0    6.525    6.547    6.525    6.547
dbcsr_merge_single_wm              370 10.1    0.592    0.609    6.343    6.351
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=273.206, yerr=0.0
Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=119.64700000000002, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=81.945, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=22.257, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=21.324, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=16.154, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=11.879, yerr=0.0

Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.004    0.005   49.434   49.434
lib_test                             1  2.0    0.000    0.000   49.422   49.429
dbcsr_run_tests                      3  3.0    0.000    0.001   49.421   49.428
test_multiplies_multiproc            3  4.0    0.001    0.001   38.296   38.423
dbcsr_multiply_generic               9  5.0    0.002    0.002   29.827   29.833
multiply_cannon                      9  6.0    0.020    0.021   19.857   20.347
multiply_cannon_loop                 9  7.0    0.003    0.003   18.375   18.730
multiply_cannon_multrec             18  8.0    9.828   10.145   17.055   17.386
dbcsr_make_random_matrix             9  4.0    7.486    7.521   10.976   11.103
dbcsr_finalize                      27  5.7    0.001    0.001    7.642    7.827
dbcsr_merge_all                     18  6.5    3.657    3.672    7.524    7.704
dbcsr_mm_accdrv_process           8199  9.0    1.452    1.562    6.939    6.952
dbcsr_redistribute                   9  5.0    3.566    3.574    5.781    5.781
make_m2s                            18  6.0    0.001    0.001    5.092    5.098
make_images                         18  7.0    0.380    0.380    5.054    5.061
dbcsr_mm_accdrv_process_sort      8199 10.0    4.700    4.701    4.700    4.701
make_images_data                    18  8.0    0.001    0.001    2.866    2.870
hybrid_alltoall_any                 18  9.0    2.473    2.480    2.833    2.836
dbcsr_data_copy_aa2                 18  7.5    1.803    1.962    1.803    1.962
tree_to_linear_d                     9  7.0    1.926    1.932    1.926    1.932
mp_alltoall_d11v                    27  6.0    1.929    1.932    1.929    1.932
dbcsr_data_release                 507  7.7    1.475    1.486    1.475    1.486
dbcsr_data_new                     354  7.4    1.121    1.336    1.121    1.336
dbcsr_checksum                       6  5.0    1.011    1.019    1.019    1.020
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=49.434, yerr=0.0
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=20.197, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.828, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.486, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.7, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.657, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.566, yerr=0.0

Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.055    0.056  208.904  208.905
qs_mol_dyn_low                       1  2.0    0.004    0.004  207.271  207.306
qs_forces                            6  3.8    0.001    0.001  130.674  130.675
qs_energies                          6  4.8    0.001    0.001  123.417  123.417
scf_env_do_scf                       6  5.8    0.000    0.000  115.486  115.487
scf_env_do_scf_inner_loop          113  6.2    0.006    0.009  107.052  107.052
velocity_verlet                      5  3.0    0.003    0.003   98.750   98.800
rebuild_ks_matrix                  119  8.1    0.000    0.000   87.742   87.745
qs_ks_build_kohn_sham_matrix       119  9.1    0.021    0.022   87.742   87.745
qs_ks_update_qs_env                119  7.3    0.001    0.001   82.804   82.807
fft_wrap_pw1pw2                   2059 12.4    0.048    0.050   68.476   68.533
fft_wrap_pw1pw2_150               1321 13.9    0.010    0.010   65.542   65.652
qs_vxc_create                      119 10.1    0.002    0.002   56.290   56.291
xc_vxc_pw_create                   119 11.1    1.605    1.609   56.288   56.289
qmmm_el_coupling                     6  3.8    0.000    0.000   39.967   39.973
qmmm_elec_with_gaussian              6  4.8    0.022    0.022   39.961   39.967
xc_pw_derive                       714 13.1    0.013    0.013   38.988   39.025
qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   38.694   38.767
pw_gpu_c1dr3d_3d_ps               1095 14.8   11.090   11.100   36.961   36.978
qmmm_elec_gaussian_low_G             6  6.8   33.839   33.900   33.839   33.900
qmmm_forces                          6  3.8    0.002    0.002   33.739   33.739
qmmm_forces_with_gaussian            6  4.8    0.024    0.025   33.125   33.374
qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   31.748   31.997
pw_gpu_r3dc1d_3d_ps                964 14.0    9.836    9.888   31.454   31.496
xc_rho_set_and_dset_create         119 12.1    2.517    2.519   28.250   28.266
qmmm_forces_gaussian_low_G           6  6.8   26.581   26.813   26.581   26.813
xc_pw_divergence                   119 12.1    0.007    0.008   25.980   25.986
qs_rho_update_rho_low              119  7.3    0.001    0.001   22.956   23.030
calculate_rho_elec                 119  8.3    1.110    1.110   22.956   23.029
density_rs2pw                      119  9.3    0.009    0.010   16.785   16.944
sum_up_and_integrate               119 10.1    0.003    0.003   14.409   14.449
integrate_v_rspace                 119 11.1    0.023    0.024   14.187   14.228
dbcsr_multiply_generic            2598 12.3    0.102    0.104   14.057   14.194
mp_alltoall_z22v                  2059 16.4   13.202   13.342   13.202   13.342
multiply_cannon                   2598 13.3    0.225    0.225   12.415   12.424
multiply_cannon_loop              2598 14.3    0.259    0.262   11.924   11.935
potential_pw2rs                    119 12.1    0.037    0.037   10.193   10.194
x_to_yz                           1095 15.8    2.653    2.660    9.885    9.966
multiply_cannon_multrec           5196 15.3    4.070    4.140    9.770    9.852
qs_ks_ddapc                        119 10.1    0.003    0.003    9.250    9.285
pw_gpu_sf                         1095 15.8    9.153    9.175    9.153    9.175
init_scf_loop                        6  6.8    0.000    0.000    8.431    8.432
pw_gpu_fg                          964 15.0    8.211    8.290    8.211    8.290
yz_to_x                            964 15.0    2.047    2.051    8.018    8.065
qs_scf_new_mos                     113  7.2    0.001    0.001    7.831    7.836
qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.830    7.835
ot_scf_mini                        113  9.2    0.002    0.002    7.540    7.541
pw_gpu_ffc                        1095 15.8    6.812    6.879    6.812    6.879
init_scf_run                         6  5.8    0.000    0.000    5.685    5.685
scf_env_initial_rho_setup            6  6.8    0.000    0.000    5.684    5.684
dbcsr_mm_accdrv_process          13992 16.0    0.559    0.564    5.630    5.641
xc_functional_eval                 238 13.1    0.003    0.004    5.377    5.379
pw_gpu_cff                         964 15.0    5.318    5.358    5.318    5.358
pw_poisson_solve                   125  9.9    0.004    0.004    5.218    5.219
qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    5.167    5.185
qmmm_forces_with_gaussian_LG         6  7.8    5.167    5.185    5.167    5.185
grid_collocate_task_list           119  9.3    5.033    5.108    5.033    5.108
ot_mini                            113 10.2    0.001    0.001    5.044    5.047
jit_kernel_multiply                 24 14.7    5.030    5.036    5.030    5.036
qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    4.969    4.969
qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    4.855    4.866
qmmm_elec_with_gaussian_LG           6  7.8    4.855    4.866    4.855    4.866
pw_derive                         1089 13.4    4.735    4.751    4.735    4.751
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=208.904, yerr=0.0
Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=114.356, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=33.839, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=26.581, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=13.202, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=11.09, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.836, yerr=0.0

Summary: Performance test took 24 minutes.
Status: OK

 ---> Removed intermediate container 51a0a3ebb3bf
 ---> 256f22f5439e
Step 46/47 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in b6ebe0927a69
 ---> Removed intermediate container b6ebe0927a69
 ---> 2fc85da36411
Step 47/47 : ENTRYPOINT []
 ---> Running in cc2c8efa7ac7
 ---> Removed intermediate container cc2c8efa7ac7
 ---> b87e960d238c
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built b87e960d238c
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.

#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done
EndDate: 2026-01-08 06:50:33+00:00
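
Note on the PlotPoint entries: in every per-benchmark plot above, the `rest` value equals the benchmark's total time minus the five listed self-time entries (for GW_PBE_4benzene: 179.55 - 126.787 = 52.763). The following is a minimal Python sketch for checking this from the log text. It is not part of the CI tooling; the function names, the regular expression, and the `suffix` parameter are assumptions made for illustration, and the sample lines are copied from the GW_PBE_4benzene block of this log.

```python
import re

# Matches entries such as:
#   PlotPoint: plot="...", name="...", label="...", y=73.562, yerr=0.0
# The pattern does not anchor on line starts, so it also works on flattened logs.
PLOT_POINT = re.compile(
    r'PlotPoint: plot="(?P<plot>[^"]+)", name="(?P<name>[^"]+)", '
    r'label="[^"]*", y=(?P<y>[0-9.eE+-]+), yerr=(?P<yerr>[0-9.eE+-]+)'
)


def parse_plot_points(log_text):
    """Return {plot: {name: y}} for every PlotPoint entry found in log_text."""
    plots = {}
    for m in PLOT_POINT.finditer(log_text):
        plots.setdefault(m["plot"], {})[m["name"]] = float(m["y"])
    return plots


def rest_residual(plots, benchmark, suffix="6cpu_1gpu"):
    """Return total - sum(named entries) - rest; close to 0 if 'rest' is the remainder."""
    total = plots[f"total_timings_{suffix}"][benchmark]
    parts = plots[f"{benchmark}_timings_{suffix}"]
    named = sum(y for name, y in parts.items() if name != "rest")
    return total - named - parts["rest"]


# Sample copied from the GW_PBE_4benzene section of this log.
SAMPLE = '''
PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=179.55, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=52.76300000000002, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=73.562, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=29.04, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.682, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.753, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.75, yerr=0.0
'''

if __name__ == "__main__":
    plots = parse_plot_points(SAMPLE)
    # A residual near zero indicates 'rest' is the unattributed remainder.
    print(abs(rest_residual(plots, "GW_PBE_4benzene")) < 1e-9)
```

The same check can be run on the other benchmarks by passing the full log text to `parse_plot_points` and the benchmark name (for example `RI-HFX_H2O-32`) to `rest_residual`.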