StartDate: 2026-04-09 06:42:40+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 59e0b9b155a57cbb847f970b762508bf883f7291
CommitTime: 2026-04-08 11:54:50 +0200
CommitAuthor: BelizSertcan
CommitSubject: Bugfix: Reset state_spin and state_spin2 (#5047)
#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=59e0b9b155a57cbb847f970b762508bf883f7291 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.
Sending build context to Docker daemon 413MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
02cb0e091e33: Waiting
9c3d619183d2: Waiting
2cd52cbb1ebe: Waiting
6e8af4fd0a07: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
6cb9b761b877: Waiting
15a17189b2df: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
6e8af4fd0a07: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02cb0e091e33: Download complete
9c3d619183d2: Download complete
7f7602a82106: Verifying Checksum
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 1ff2ec46e723
Step 8/46 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> bf4865207130
Step 9/46 : RUN mkdir scripts
 ---> Using cache
 ---> 95733bd3ea48
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./tools/toolchain/scripts/generate_cmake_options.sh ./scripts/
 ---> Using cache
 ---> 436ecf42e4e6
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> e086cdcf92a6
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run --list-cmake-options=no
 ---> Using cache
 ---> 2e4e5326a0e2
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> 292ef86ef5e2
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 35a6c0774e4a
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 17d2cb9b6367
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> a726f6399dec
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 08b3176f5c4b
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> 84df80588d0d
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 0985b6504af4
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 18d84d5810f4
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> 92a7ee20695a
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 4eea4f45c46c
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> 91e667907bdc
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> 5e3045527394
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 0af9d2e9be60
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> d1c746d88f44
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> c1c41ca33047
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> ec39789c586d
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> bff0608e0a58
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> 92adcd501b84
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 95776880c549
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> 92a8fe130694
Step 33/46 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> 72e300efeb1a
Step 34/46 : COPY ./src ./src
 ---> 9f6186201d83
Step 35/46 : COPY ./data ./data
 ---> b11132d16b96
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
 ---> e49f8546c693
Step 37/46 : COPY ./cmake ./cmake
 ---> 2fdaafc855ff
Step 38/46 : COPY ./CMakeLists.txt .
 ---> d9afd18586f5
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> ea129c30f574
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in da4a6e7215d8
==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so (found version "4.1")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so (found version "4.1")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpifort.so (found version "4.1")
-- Found MPI: TRUE (found version "4.1") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "4.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
-- Found openblas, version 0.3.32
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.32/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
-- Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
-- Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
-- Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
-- Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
-- Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-----------------------------------------------------------
-                          CUDA                           -
-----------------------------------------------------------
-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found
------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX
------------------------------------------------------------
-                          DBCSR                           -
------------------------------------------------------------
-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for module 'libxsmmf'
-- Found libxsmmf, version 1.17.0
-- Checking for module 'libxsmmext'
-- Found libxsmmext, version 1.17.0
------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------
-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran
-- Found MPI: TRUE (found version "4.1") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Checking for module 'libint2'
-- Package 'libint2', required by 'virtual:world', not found
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "2.1.1") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'libvdwxc>=0.3.0'
-- Looking for vdwxc_init_mpi
-- Looking for vdwxc_init_mpi - not found
-- Found LibVDWXC: /opt/cp2k-toolchain/install/libvdwxc-0.4.0/lib/libvdwxc.a (Required is at least version "0.3.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------
- BLAS
  - vendor: OpenBLAS
  - include directories: /opt/cp2k-toolchain/install/openblas-0.3.32/include
  - libraries: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
- LAPACK
  - include directories: /opt/cp2k-toolchain/install/openblas-0.3.32/include
  - libraries: /opt/cp2k-toolchain/install/openblas-0.3.32/lib/libopenblas.so
- MPI
  - include directories: /opt/cp2k-toolchain/install/mpich-4.3.2/include
  - libraries: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so
  - MPI_F08: ON
- ScaLAPACK
  - vendor: auto
  - include directories:
  - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
- Hardware Acceleration:
  - CUDA:
    - GPU architecture number: 70
    - GPU profiling enabled:
    - GPU accelerated modules
      - ELPA module: ON
      - GRID module: ON
      - DBM module: ON
      - PW module: ON
- LibXC
  - version: 7.0.0
  - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
  - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
- HDF5
  - version: 2.1.1
  - include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include
  - libraries: hdf5-shared
- FFTW3
  - include directories: /opt/cp2k-toolchain/install/fftw-3.3.10/include
  - libraries: /opt/cp2k-toolchain/install/fftw-3.3.10/lib/libfftw3.a
- LIBXSMM
  - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
  - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so
- SpLA
  - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
  - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
  - SpLA GEMM offloading
- SIRIUS
  - include directories:
  - libraries:
- COSMA
  - include directories: /opt/cp2k-toolchain/install/COSMA-2.8.2/include
  - libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack>
- Libint2
  - include directories: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
  - libraries: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/lib/libint2.a
- ELPA
  - include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001
  - libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
--------------------------------------------------------------------
-                                                                  -
-         List of dependencies not included in this build          -
-                                                                  -
--------------------------------------------------------------------
- DFTD4
- DeePMD
- PEXSI
- ACE (libpace)
- TBLITE
- Spglib
- LibSMEAGOL
- MiMiC
- openPMD
- DLA-Future
- PLUMED
- Libvori
- LibTorch
- TREXIO
- GreenX
After building and installing CP2K the regtests can be run with the following command:
/opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp
-- Configuring done (15.6s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container da4a6e7215d8
 ---> f97a0988ef05
Step 41/46 : COPY ./benchmarks ./benchmarks
 ---> ebfd4531d91e
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
 ---> aabebd4b886a
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> d4fb513a7490
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in 1d20be4b78cf
============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5
========== Checking Benchmark Inputs =========
Found 82 input files and 0 errors.
========== Running Performance Test ==========
Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"
Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.028    0.029  104.181  104.182
 qs_mol_dyn_low                       1  2.0    0.004    0.004  103.757  103.760
 qs_forces                           11  3.9    0.002    0.002  103.705  103.705
 qs_energies                         11  4.9    0.001    0.001   92.028   92.028
 scf_env_do_scf                      11  5.9    0.001    0.001   70.715   70.716
 velocity_verlet                     10  3.0    0.002    0.002   66.480   66.498
 scf_env_do_scf_inner_loop          108  6.5    0.006    0.009   60.236   60.236
 rebuild_ks_matrix                  119  8.3    0.001    0.001   26.533   26.534
 qs_ks_build_kohn_sham_matrix       119  9.3    0.020    0.021   26.533   26.533
 dbcsr_multiply_generic            2286 12.5    0.146    0.147   25.250   25.291
 qs_ks_update_qs_env                119  7.6    0.001    0.002   24.363   24.363
 qs_scf_new_mos                     108  7.5    0.001    0.001   20.501   20.510
 qs_scf_loop_do_ot                  108  8.5    0.001    0.001   20.500   20.509
 qs_rho_update_rho_low              119  7.7    0.001    0.001   20.150   20.161
 calculate_rho_elec                 119  8.7    0.902    0.907   20.149   20.160
 ot_scf_mini                        108  9.5    0.003    0.003   18.538   18.539
 fft_wrap_pw1pw2                   1201 11.6    0.024    0.025   15.786   15.838
 sum_up_and_integrate               119 10.3    0.003    0.003   13.885   13.935
 integrate_v_rspace                 119 11.3    0.365    0.367   13.789   13.839
 fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   13.559   13.633
 multiply_cannon                   2286 13.5    0.338    0.339   12.750   12.751
 multiply_cannon_loop              2286 14.5    0.265    0.266   11.651   11.653
 make_m2s                          4572 13.5    0.044    0.044   10.842   10.844
 ot_mini                            108 10.5    0.001    0.001   10.790   10.792
 init_scf_run                        11  5.9    0.000    0.000   10.695   10.696
 scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.695   10.695
 make_images                       4572 14.5    1.222    1.230   10.663   10.664
 init_scf_loop                       11  6.9    0.000    0.000   10.401   10.401
 density_rs2pw                      119  9.7    0.008    0.008   10.259   10.352
 grid_collocate_task_list           119  9.7    8.950    9.004    8.950    9.004
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.494    8.495
 build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    8.211    8.366
 pw_gpu_r3dc1d_3d_ps                606 13.1    2.357    2.384    8.091    8.110
 pw_gpu_c1dr3d_3d_ps                595 14.2    2.296    2.323    7.665    7.699
 wfi_extrapolate                     11  7.9    0.002    0.002    7.631    7.631
 grid_integrate_task_list           119 12.3    7.389    7.440    7.389    7.440
 prepare_preconditioner              11  7.9    0.000    0.000    7.126    7.131
 make_preconditioner                 11  8.9    0.000    0.000    7.126    7.131
 multiply_cannon_multrec           4572 15.5    2.203    2.207    6.529    6.554
 qs_ot_get_derivative               108 11.5    0.002    0.002    6.471    6.472
 hybrid_alltoall_any               4725 16.4    4.857    4.859    6.211    6.241
 make_full_inverse_cholesky          11  9.9    0.000    0.000    5.968    6.208
 make_images_data                  4572 15.5    0.056    0.056    6.111    6.118
 potential_pw2rs                    119 12.3    0.037    0.037    6.034    6.036
 parallel_gemm_fm_cosma              81  9.0    5.700    5.700    5.700    5.700
 build_core_ppl_forces               11  5.9    4.183    4.308    4.183    4.308
 ot_diis_step                       108 11.5    0.005    0.005    4.293    4.293
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.091    4.103
 build_core_hamiltonian_matrix       11  6.9    0.001    0.002    4.014    4.076
 dbcsr_mm_accdrv_process           9594 16.2    0.947    0.952    3.916    3.935
 apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.722    3.723
 apply_single                       119 13.6    0.001    0.001    3.722    3.723
 dbcsr_complete_redistribute        329 12.2    1.434    1.456    3.342    3.594
 calculate_dm_sparse                119  9.5    0.001    0.001    3.355    3.361
 qs_create_task_list                 11  7.9    0.000    0.000    3.238    3.317
 generate_qs_task_list               11  8.9    1.194    1.203    3.238    3.317
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.306    3.306
 qs_ot_get_p                        119 10.4    0.001    0.001    3.254    3.256
 mp_alltoall_z22v                  1201 15.6    3.119    3.206    3.119    3.206
 multiply_cannon_sync_h2d          4572 15.5    3.061    3.063    3.061    3.063
 cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.923    2.924
 mp_waitall_1                     64495 16.9    2.759    2.794    2.759    2.794
 pw_poisson_solve                   119 10.3    0.003    0.003    2.712    2.718
 transfer_rs2pw                     487 10.6    0.008    0.008    2.482    2.615
 calculate_first_density_matrix       1  7.0    0.000    0.000    2.606    2.606
 copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.565    2.573
 cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.404    2.404
 jit_kernel_multiply                 10 15.6    2.344    2.357    2.344    2.357
 pw_gpu_fg                          606 14.1    2.267    2.296    2.267    2.296
 transfer_rs2pw_140                 130 11.5    1.572    1.583    2.073    2.213
 qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    2.163    2.163
 dbcsr_special_finalize            6858 15.5    0.043    0.044    2.114    2.119
 qs_ot_p2m_diag                      50 11.0    0.089    0.092    2.113    2.116
 build_core_ppl                      11  7.9    2.063    2.114    2.063    2.114
 cp_fm_cholesky_invert               11 10.9    2.110    2.110    2.110    2.110
 yz_to_x                            606 14.1    0.466    0.468    2.045    2.086
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=104.181, yerr=0.0
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=73.102, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.95, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.389, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.7, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.857, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.183, yerr=0.0
Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.026    0.026   99.267   99.267
 qs_mol_dyn_low                       1  2.0    0.005    0.005   98.816   98.819
 qs_forces                           11  3.9    0.002    0.002   98.767   98.767
 qs_energies                         11  4.9    0.001    0.001   87.032   87.032
 scf_env_do_scf                      11  5.9    0.001    0.001   65.723   65.723
 velocity_verlet                     10  3.0    0.002    0.002   64.277   64.295
 scf_env_do_scf_inner_loop           96  6.5    0.005    0.008   54.987   54.987
 rebuild_ks_matrix                  107  8.3    0.001    0.001   25.790   25.794
 qs_ks_build_kohn_sham_matrix       107  9.3    0.018    0.018   25.789   25.793
 qs_ks_update_qs_env                107  7.6    0.001    0.001   23.302   23.305
 dbcsr_multiply_generic            1966 12.4    0.128    0.128   22.798   22.879
 qs_rho_update_rho_low              107  7.7    0.001    0.001   18.472   18.490
 calculate_rho_elec                 107  8.7    0.810    0.814   18.471   18.489
 qs_scf_new_mos                      96  7.5    0.001    0.001   18.152   18.155
 qs_scf_loop_do_ot                   96  8.5    0.001    0.001   18.151   18.154
 ot_scf_mini                         96  9.5    0.003    0.003   16.438   16.440
 sum_up_and_integrate               107 10.3    0.002    0.002   14.398   14.482
 fft_wrap_pw1pw2                   1081 11.6    0.022    0.022   14.360   14.398
 integrate_v_rspace                 107 11.3    0.331    0.332   14.312   14.396
 fft_wrap_pw1pw2_140                439 12.2    0.003    0.003   12.321   12.391
 multiply_cannon                   1966 13.4    0.299    0.306   11.597   11.605
 multiply_cannon_loop              1966 14.4    0.232    0.232   10.666   10.673
 init_scf_loop                       11  6.9    0.000    0.000   10.658   10.658
 init_scf_run                        11  5.9    0.000    0.000   10.581   10.581
 scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.580   10.580
 make_m2s                          3932 13.4    0.038    0.038    9.685    9.687
 ot_mini                             96 10.5    0.001    0.001    9.629    9.632
 make_images                       3932 14.4    1.093    1.102    9.527    9.529
 density_rs2pw                      107  9.7    0.007    0.007    9.327    9.488
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.646    8.646
 grid_integrate_task_list           107 12.3    8.502    8.585    8.502    8.585
 grid_collocate_task_list           107  9.7    8.299    8.420    8.299    8.420
 build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    8.104    8.220
 wfi_extrapolate                     11  7.9    0.001    0.001    7.591    7.591
 pw_gpu_r3dc1d_3d_ps                546 13.1    2.119    2.147    7.396    7.415
 prepare_preconditioner              11  7.9    0.000    0.000    7.294    7.304
 make_preconditioner                 11  8.9    0.000    0.000    7.294    7.304
 pw_gpu_c1dr3d_3d_ps                535 14.2    2.056    2.080    6.936    6.956
 make_full_inverse_cholesky          11  9.9    0.000    0.000    6.127    6.367
 multiply_cannon_multrec           3932 15.4    1.894    1.903    6.051    6.073
 qs_ot_get_derivative                96 11.5    0.001    0.001    5.826    5.829
 parallel_gemm_fm_cosma              81  9.0    5.649    5.650    5.649    5.650
 hybrid_alltoall_any               4079 16.3    4.350    4.360    5.567    5.580
 potential_pw2rs                    107 12.3    0.033    0.034    5.479    5.480
 make_images_data                  3932 15.4    0.048    0.048    5.452    5.461
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.220    4.403
 build_core_ppl_forces               11  5.9    4.132    4.233    4.132    4.233
 build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.971    4.030
 dbcsr_mm_accdrv_process           8450 16.1    0.689    0.904    3.800    3.818
 ot_diis_step                        96 11.5    0.005    0.005    3.781    3.781
 dbcsr_complete_redistribute        317 12.2    1.424    1.426    3.476    3.739
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.511    3.511
 qs_create_task_list                 11  7.9    0.000    0.000    3.375    3.500
 generate_qs_task_list               11  8.9    1.463    1.479    3.375    3.500
 apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.342    3.343
 apply_single                       107 13.6    0.001    0.001    3.341    3.343
 calculate_dm_sparse                107  9.5    0.001    0.001    3.090    3.093
 mp_alltoall_z22v                  1081 15.6    2.821    2.928    2.821    2.928
 cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.002    2.826    2.827
 qs_ot_get_p                        107 10.4    0.001    0.001    2.817    2.817
 multiply_cannon_sync_h2d          3932 15.4    2.779    2.794    2.779    2.794
 copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.725    2.753
 jit_kernel_multiply                 12 15.7    2.548    2.747    2.548    2.747
 calculate_first_density_matrix       1  7.0    0.000    0.000    2.534    2.535
 mp_waitall_1                     55487 16.8    2.441    2.503    2.441    2.503
 transfer_rs2pw                     439 10.6    0.007    0.008    2.290    2.487
 pw_poisson_solve                   107 10.3    0.002    0.003    2.443    2.444
 cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.314    2.316
 transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.174    2.199
 pw_gpu_fg                          546 14.1    2.133    2.165    2.133    2.165
 transfer_rs2pw_140                 118 11.5    1.418    1.432    1.925    2.130
 build_core_ppl                      11  7.9    2.051    2.103    2.051    2.103
 cp_fm_cholesky_invert               11 10.9    2.085    2.085    2.085    2.085
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=99.267, yerr=0.0
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=68.335, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.502, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.299, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.649, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.35, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.132, yerr=0.0
Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.019    0.020  174.368  174.369
 qs_energies                          1  2.0    0.000    0.000  174.040  174.043
 mp2_main                             1  3.0    0.000    0.000  167.165  167.168
 mp2_gpw_main                         1  4.0    0.000    0.000  165.315  165.318
 rpa_ri_compute_en                    1  5.0    0.000    0.000  155.123  155.125
 rpa_num_int                          1  6.0    0.001    0.001  155.114  155.117
 parallel_gemm_fm_cosma             105  8.4   72.208   72.215   72.208   72.215
 compute_mat_P_omega                  1  7.0    0.001    0.002   69.163   69.163
 dbt_total                         2336  9.6    0.021    0.021   68.744   68.744
 compute_mat_P_omega_contract        10  8.0    5.342    5.348   68.457   68.472
 compute_W_cubic_GW                  10  7.0    0.004    0.004   46.946   46.948
 dbt_contract                       787 11.0    0.048    0.049   46.176   46.176
 dbt_tas_total                     1149 12.2    0.132    0.133   36.031   36.031
 dbt_tas_multiply                   807 12.1    0.003    0.003   35.345   35.345
 dbt_tas_dbm                        807 14.1    0.005    0.005   27.787   27.787
 dbm_multiply                       807 16.1   26.538   26.752   26.538   26.752
 rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   24.323   24.323
 compute_mat_P_omega_calc_M_occ     250  9.0    5.339    5.348   24.290   24.290
 contract_P_omega_with_mat_L         10  8.0    0.000    0.000   23.999   24.000
 dbt_copy                          1107 10.7    0.073    0.073   22.810   22.952
 dbt_tas_mm_1N                      524 15.1    0.003    0.003   17.947   18.234
 compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   15.343   15.343
 dbt_reshape                        594 11.8    6.587    6.751   14.636   14.711
 compute_QP_energies                  1  7.0    0.000    0.000   12.348   12.348
 compute_self_energy_cubic_gw         1  8.0    0.122    0.123   12.347   12.347
 dbt_tas_reserve_blocks_index      3266 14.3    0.679    0.689   10.840   10.883
 dbm_reserve_blocks                3634 15.3   10.501   10.554   10.501   10.554
 mp2_ri_gpw_compute_in                1  5.0    0.001    0.001   10.181   10.181
 dbt_crop                          1042 12.0    6.660    6.721    8.970    9.066
 dbt_reserve_blocks_index          2347 13.0    0.337    0.339    9.015    9.019
 compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    8.937    8.937
 dbt_reserve_blocks_index_array    2289 12.1    0.012    0.012    8.815    8.840
 dbt_tas_mm_2                       251 15.0    0.002    0.003    7.647    7.647
 scf_env_do_scf                       1  3.0    0.000    0.000    6.312    6.312
 scf_env_do_scf_inner_loop           17  4.0    0.001    0.001    6.311    6.311
 mp_waitall_2                      2656 15.9    5.909    5.909    5.909    5.909
 get_2c_integrals                     1  6.0    0.000    0.000    5.786    5.786
 contract_cubic_gw                   21  9.0    0.000    0.000    5.629    5.629
 dbt_communicate_buffer             594 12.8    0.011    0.012    5.405    5.408
 dbcsr_multiply_generic              30  8.1    0.002    0.003    5.130    5.164
 multiply_cannon                     30  9.1    0.007    0.010    4.944    4.975
 compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    4.925    4.932
 multiply_cannon_loop                30 10.1    0.004    0.005    4.888    4.920
 compute_mat_P_omega_copy_M_occ     250  9.0    0.001    0.002    4.828    4.860
 dbt_tas_copy                       511 11.5    2.561    2.602    4.519    4.595
 multiply_cannon_multrec             60 11.1    0.151    0.158    4.315    4.315
 dbcsr_mm_accdrv_process            328 12.3    0.042    0.042    3.999    3.999
 jit_kernel_multiply                 18 11.7    3.950    3.951    3.950    3.951
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=174.368, yerr=0.0
Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=51.873999999999995, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=72.208, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=26.538, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.501, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.66, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.587, yerr=0.0
Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.022    0.024  196.001  196.001
 qs_forces                            1  2.0    0.000    0.000  195.499  195.500
 rebuild_ks_matrix                    7  6.6    0.000    0.000  190.901  190.901
 qs_ks_build_kohn_sham_matrix         7  7.6    0.002    0.002  190.901  190.901
 hfx_ks_matrix                        7  8.6    0.000    0.000  187.036  187.036
 dbt_total                          849 11.0    0.009    0.009  139.163  139.163
 hfx_ri_update_ks                     7  9.6    0.000    0.000  107.123  107.123
 hfx_ri_update_ks_Pmat                7 10.6   22.236   22.260  107.118  107.118
 qs_energies                          1  3.0    0.000    0.000  102.431  102.431
 scf_env_do_scf                       1  4.0    0.000    0.000  100.180  100.180
 qs_ks_update_qs_env                  8  6.0    0.000    0.000   97.882   97.883
 qs_ks_update_qs_env_forces           1  3.0    0.000    0.000   93.026   93.027
 dbt_contract                       207 12.4    0.048    0.048   81.658   81.658
 hfx_ri_update_forces                 1  7.0    1.110    1.116   79.911   79.911
 dbt_tas_total                      369 13.4    0.074    0.074   67.746   67.746
 dbt_tas_multiply                   216 13.5    0.001    0.001   65.002   65.002
 scf_env_do_scf_inner_loop            6  5.0    0.000    0.001   53.810   53.810
 dbt_copy                           423 11.8    0.046    0.046   53.259   53.480
 dbt_tas_dbm                        216 15.5    0.002    0.002   51.516   51.516
 dbm_multiply                       216 17.5   48.753   48.755   48.753   48.755
 hfx_ri_forces_Pmat_3c                1  8.0    3.255    3.256   47.170   47.174
 init_scf_loop                        2  5.0    0.000    0.000   46.369   46.369
 dbt_reshape                        175 13.2   18.387   18.472   39.845   40.044
 hfx_ri_update_ks_Pmat_KS            63 11.6    0.001    0.001   30.812   30.812
 precalc_derivatives                  1  8.0    1.885    1.906   26.726   26.726
 dbt_tas_mm_2                        91 16.5    0.001    0.001   21.446   21.446
 mp_waitall_2                      1022 16.5   18.751   18.820   18.751   18.820
 dbt_tas_reserve_blocks_index      1323 15.4    1.789    1.801   18.531   18.608
 dbm_reserve_blocks                1491 16.3   17.481   17.547   17.481   17.547
 dbt_crop                           372 13.7   12.803   12.921   16.643   16.738
 dbt_tas_mm_3T                       77 17.1    0.001    0.001   16.357   16.581
 hfx_ri_pre_scf_Pmat                  1 12.0    0.000    0.000   16.410   16.410
 dbt_communicate_buffer             175 14.2    0.004    0.004   15.560   15.608
 dbt_reserve_blocks_index           889 14.5    0.645    0.648   15.079   15.196
 dbt_reserve_blocks_index_array     859 13.5    0.007    0.007   14.797   14.908
 hfx_ri_update_ks_Pmat_Px3C          63 11.6    0.000    0.000   14.592   14.592
 hfx_ri_update_ks_Pmat_copy_2        63 11.6    0.000    0.000   14.221   14.221
 build_3c_derivatives                 3  9.0    2.331    2.350   14.092   14.096
 dbt_tas_mm_3N                       37 15.4    0.000    0.000   11.441   11.475
 dbt_tas_copy                       248 12.5    4.214    4.240    8.018    8.040
 mp_sync                           2901 12.8    5.756    6.085    5.756    6.085
 hfx_ri_pre_scf_Pmat_int              1 13.0    0.000    0.000    5.388    5.388
 dbt_tas_replicate                  168 15.1    2.282    2.302    4.725    4.761
 hfx_ri_pre_scf_calc_tensors          1 14.0    0.004    0.004    4.640    4.646
 hfx_ri_pre_scf_Pmat_copy_2           9 13.0    1.747    1.783    4.448    4.484
 hfx_ri_pre_scf_Pmat_RIx3C            9 13.0    0.000    0.000    3.946    3.993
 dbt_tas_reserve_blocks_templat     266 13.6    0.110    0.111    3.878    3.920
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=196.001, yerr=0.0
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=70.393, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=48.753, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=22.236, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=18.751, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=18.387, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=17.481, yerr=0.0
Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.010    0.010  106.932  106.932
 qs_energies                          1  2.0    0.000    0.000  106.746  106.746
 mp2_main                             1  3.0    0.000    0.000   98.890   98.890
 mp2_gpw_main                         1  4.0    0.001    0.001   98.512   98.512
 mp2_ri_gpw_compute_in                1  5.0    0.576    0.577   55.331   55.356
 mp2_ri_gpw_compute_in_loop           1  6.0    0.013    0.013   46.876   46.901
 mp2_ri_gpw_compute_en                1  5.0    0.094    0.095   43.117   43.142
 mp2_ri_gpw_compute_en_RI_loop        1  6.0   12.836   12.839   40.432   40.433
 dbcsr_multiply_generic            2666  8.0    0.160    0.162   23.277   23.569
 ao_to_mo_and_store_B_mult_1       1328  7.0    0.014    0.014   21.869   22.161
 mp2_eri_3c_integrate_gpw          1328  7.0    0.018    0.018   19.216   19.483
 mp2_ri_gpw_compute_en_expansio    1040  7.0    0.735    0.738   16.556   16.595
 local_gemm                        1040  8.0   15.821   15.857   15.821   15.857
 make_m2s                          5332  9.0    0.053    0.055   12.586   12.759
 make_images                       5332 10.0    2.383    2.394   12.397   12.572
 integrate_v_rspace                1338  8.0    1.102    1.116   10.618   10.697
 multiply_cannon                   2666  9.0    0.403    0.409   10.009   10.126
 multiply_cannon_loop              2666 10.0    0.195    0.196    8.821    8.875
 hybrid_alltoall_any               6683 11.6    8.157    8.319    8.415    8.580
 make_images_data                  5332 11.0    0.064    0.065    8.326    8.489
 fft_wrap_pw1pw2                  26668 10.4    0.141    0.143    8.086    8.475
 grid_integrate_task_list          1338  9.0    8.178    8.247    8.178    8.247
 get_2c_integrals                     1  6.0    0.004    0.005    7.876    7.879
 collocate_function                1328  8.0    5.245    5.288    7.426    7.774
 compute_2c_integrals                 1  7.0    0.007    0.007    7.294    7.295
 compute_2c_integrals_loop_lm         1  8.0    0.023    0.023    7.187    7.198
 mp2_eri_2c_integrate_gpw             1  9.0    2.092    2.095    7.164    7.175
 scf_env_do_scf                       1  3.0    0.000    0.000    6.936    6.937
 scf_env_do_scf_inner_loop           10  4.0    0.001    0.001    6.936    6.937
 ao_to_mo_and_store_B_E_Ex_1       1328  7.0    3.551    3.554    5.497    5.499
 qs_scf_new_mos                      10  5.0    0.000    0.000    5.363    5.369
 multiply_cannon_multrec           2676 11.0    2.343    2.444    5.145    5.235
 mp2_ri_gpw_compute_en_ener        1040  7.0    5.186    5.207    5.186    5.207
 fft_wrap_pw1pw2_20               10647 11.4    0.020    0.021    4.617    4.990
 mp2_ri_gpw_compute_en_comm         221  7.0    1.034    1.036    4.641    4.656
 pw_gpu_r3dc1d_3d                 13282 12.2    4.092    4.444    4.092    4.444
 eigensolver                         11  5.8    0.002    0.002    3.008    3.010
 potential_pw2rs                   2666 10.0    0.099    0.100    2.811    2.898
 pw_gpu_c1dr3d_3d                 13280 12.7    2.781    2.819    2.781    2.819
 fft_wrap_pw1pw2_10               15957 11.5    0.021    0.021    2.547    2.562
 dbcsr_mm_accdrv_process           5392 12.0    0.863    1.483    2.549    2.561
 mp_sendrecv_dm3                    442  8.0    2.505    2.523    2.505    2.523
 collocate_single_gaussian         1328 10.0    0.093    0.095    2.416    2.506
 cp_fm_diag_elpa                     11  6.8    0.000    0.000    2.408    2.409
 cp_fm_diag_elpa_base                11  7.8    2.328    2.343    2.407    2.407
 copy_dbcsr_to_fm                  1351  8.0    0.032    0.033    2.361    2.368
 mp2_eri_2c_integrate_gpw_pot_l    1328 10.0    0.004    0.004    2.270    2.364
 fill_local_i_aL                    884  7.5    2.315    2.317    2.315    2.317
 replicate_iaK_2intgroup              1  6.0    2.146    2.147    2.287    2.287
 jit_kernel_multiply                  8 13.0    1.570    2.197    1.570    2.197
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=106.932, yerr=0.0
Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=56.695, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.821, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.836, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.178, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=8.157, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="collocate_function", label="collocate_function", y=5.245, yerr=0.0
Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.079    0.081  209.507  209.515
 qs_energies                          1  2.0    0.000    0.000  208.376  208.384
 scf_env_do_scf                       1  3.0    0.000    0.000  194.094  194.102
 scf_env_do_scf_inner_loop           15  4.0    0.001    0.002  194.094  194.102
 qs_ks_update_qs_env                 15  5.0    0.000    0.000   96.887   96.923
 rebuild_ks_matrix                   15  6.0    0.000    0.000   96.675   96.711
 qs_ks_build_kohn_sham_matrix        15  7.0    0.003    0.003   96.675   96.711
 qs_vxc_create                       15  8.0    0.000    0.000   59.854   59.878
 qs_scf_new_mos                      15  5.0    0.000    0.000   54.373   54.436
 fft_wrap_pw1pw2                   1086 10.0    0.029    0.029   51.893   51.983
 calculate_dispersion_nonloc         15  9.0   11.167   11.194   51.633   51.662
 eigensolver                         15  6.0    0.002    0.002   44.692   44.734
 qs_rho_update_rho_low               16  5.0    0.000    0.000   40.906   40.907
 calculate_rho_elec                  16  6.0    0.188    0.188   40.905   40.907
 sum_up_and_integrate                15  8.0    0.000    0.000   35.330   35.394
 integrate_v_rspace                  15  9.0    0.049    0.050   35.305   35.370
 grid_collocate_task_list            16  7.0   29.044   29.066   29.044   29.066
 grid_integrate_task_list            15 10.0   28.055   28.088   28.055   28.088
 cp_fm_diag_elpa                     15  7.0    0.000    0.000   27.315   27.321
 cp_fm_diag_elpa_base                15  8.0   25.491   26.075   27.309   27.310
 pw_gpu_c1dr3d_3d_ps                585 12.1    5.770    5.783   27.091   27.109
 fft_wrap_pw1pw2_150                765 11.0    0.005    0.005   26.574   26.651
 pw_gpu_r3dc1d_3d_ps                501 11.9    4.871    5.081   24.766   24.874
 cp_fm_cholesky_restore              45  7.0   15.467   16.195   15.467   16.195
 fft_wrap_pw1pw2_200                197 11.3    0.001    0.001   12.504   12.537
 density_rs2pw                       16  7.0    0.002    0.002   11.665   11.683
 qs_energies_init_hamiltonians        1  3.0    0.000    0.000   10.346   10.347
 vdW_energy                          15 10.0    9.753    9.789    9.753    9.789
 pw_gpu_ffc                         585 13.1    9.394    9.395    9.394    9.395
 build_core_hamiltonian_matrix        1  4.0    0.000    0.000    8.858    8.945
 pw_gpu_cff                         501 12.9    8.808    8.819    8.808    8.819
 xc_vxc_pw_create                    15  9.0    0.190    0.193    8.221    8.226
 mp_alltoall_z22v                  1086 14.0    7.000    7.345    7.000    7.345
 potential_pw2rs                     15 10.0    0.007    0.007    7.201    7.233
 pw_gpu_sf                          585 13.1    7.147    7.156    7.147    7.156
 pw_gpu_fg                          501 12.9    6.914    6.944    6.914    6.944
 copy_dbcsr_to_fm                    16  5.9    0.001    0.001    6.706    6.782
 dbcsr_complete_redistribute         46  8.3    1.829    1.838    5.778    5.874
 fft_wrap_pw1pw2_10                  62 10.5    0.000    0.000    5.788    5.788
 cp_fm_uplo_to_full                  30  8.0    3.725    4.988    3.725    4.988
 build_core_ppnl                      1  5.0    4.940    4.950    4.940    4.950
 xc_rho_set_and_dset_create          15 10.0    0.134    0.135    4.887    4.905
 x_to_yz                            585 13.1    1.005    1.019    4.746    4.761
 xc_pw_derive                        90 11.0    0.001    0.001    4.687    4.694
 yz_to_x                            501 12.9    0.856    0.864    4.115    4.454
 gspace_mixing                       14  5.0    0.134    0.135    4.228    4.228
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=209.507, yerr=0.0
Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=100.283, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=29.044, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=28.055, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=25.491, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=15.467, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=11.167, yerr=0.0
Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.045    0.046  276.393  276.394
 qs_energies                          1  2.0    0.000    0.000  276.251  276.253
 ls_scf                               1  3.0    0.000    0.000  275.347  275.349
 ls_scf_main                          1  4.0    0.001    0.002  265.170  265.171
 density_matrix_trs4                 11  5.0    0.008    0.008  220.286  220.350
 dbcsr_multiply_generic             185  6.1    0.357    0.380  180.795  180.871
 multiply_cannon                    185  7.1    2.159    2.418  125.899  125.993
 multiply_cannon_loop               185  8.1    0.351    0.354  111.285  111.645
 multiply_cannon_multrec            370  9.1   84.687   85.133   94.407   94.799
 make_m2s                           370  7.1    0.030    0.031   46.104   46.138
 make_images                        370  8.1   11.211   11.429   45.039   45.070
 ls_scf_dm_to_ks                     11  5.0    0.000    0.000   40.329   40.389
 matrix_ls_to_qs                     11  6.0    0.000    0.000   37.128   37.412
 dbcsr_complete_redistribute         23  7.5   22.952   23.208   31.587   31.891
 matrix_decluster                    11  7.0    0.000    0.000   28.645   28.938
 arnoldi_extremal                    12  6.1    0.000    0.000   23.920   23.921
 arnoldi_normal_ev                   12  7.1    0.009    0.009   23.919   23.921
 build_subspace                      23  8.1    0.064    0.065   23.433   23.433
 dbcsr_matrix_vector_mult           652  9.0    0.156    0.158   21.816   21.824
 dbcsr_matrix_vector_mult_local     652 10.0   20.800   20.808   20.808   20.815
 make_images_data                   370  9.1    0.012    0.013   17.481   17.505
 hybrid_alltoall_any                393  9.9   11.962   12.061   16.968   16.979
 calculate_norms                    740  9.1   15.656   15.713   15.656   15.713
 dbcsr_finalize                     559  7.6    0.228    0.234   14.304   14.357
 dbcsr_merge_all                    510  8.6    2.424    2.534   13.049   13.117
 dbcsr_copy                         761  7.5    1.659    1.714   10.161   10.226
 dbcsr_special_finalize             555  9.1    0.010    0.010    9.674    9.677
 setup_rec_index_2d                 370  8.1    9.536    9.548    9.536    9.548
 dbcsr_sort_indices                1283 10.0    9.066    9.071    9.066    9.071
 ls_scf_init_scf                      1  4.0    0.000    0.000    8.648    8.648
 dbcsr_dot                          144  6.3    7.800    7.804    8.467    8.646
 dbcsr_add_d                        280  6.0    0.001    0.001    8.502    8.624
 dbcsr_add_anytype                  280  7.0    3.861    3.868    8.501    8.623
 dbcsr_copy_into_existing            11  8.0    8.481    8.490    8.482    8.491
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000    8.154    8.158
 dbcsr_mm_accdrv_process          14501 10.0    0.828    0.841    7.691    7.745
 matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.000    7.364    7.366
 tree_to_linear_d                    23 10.5    7.120    7.129    7.120    7.129
 dbcsr_mm_accdrv_process_sort     14501 11.0    6.863    6.904    6.863    6.904
 dbcsr_merge_single_wm              370 10.1    0.554    0.561    6.238    6.247
 mp_waitall_1                      5192 10.5    5.547    5.661    5.547    5.661
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=276.393, yerr=0.0
Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=120.33599999999998, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=84.687, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=22.952, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=20.8, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=15.656, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=11.962, yerr=0.0
Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.004    0.005   51.193   51.193
 lib_test                             1  2.0    0.000    0.000   51.187   51.188
 dbcsr_run_tests                      3  3.0    0.000    0.000   51.186   51.187
 test_multiplies_multiproc            3  4.0    0.001    0.001   39.418   39.610
 dbcsr_multiply_generic               9  5.0    0.002    0.002   30.629   30.630
 multiply_cannon                      9  6.0    0.211    0.223   20.214   21.032
 multiply_cannon_loop                 9  7.0    0.003    0.003   18.599   19.150
 multiply_cannon_multrec             18  8.0    9.921   10.368   17.405   17.945
 dbcsr_make_random_matrix             9  4.0    7.941    8.095   11.619   11.812
 dbcsr_finalize                      27  5.7    0.001    0.001    7.912    7.936
 dbcsr_merge_all                     18  6.5    3.747    3.766    7.788    7.808
 dbcsr_mm_accdrv_process           8199  9.0    1.265    1.360    7.248    7.338
 dbcsr_redistribute                   9  5.0    3.691    3.734    5.988    6.017
 make_m2s                            18  6.0    0.001    0.001    5.110    5.116
 make_images                         18  7.0    0.382    0.386    5.075    5.081
 dbcsr_mm_accdrv_process_sort      8199 10.0    4.935    5.021    4.935    5.021
 make_images_data                    18  8.0    0.001    0.001    2.919    2.920
 hybrid_alltoall_any                 18  9.0    2.524    2.532    2.886    2.888
 mp_alltoall_d11v                    27  6.0    1.981    2.002    1.981    2.002
 dbcsr_data_copy_aa2                 18  7.5    1.982    1.999    1.982    1.999
 tree_to_linear_d                     9  7.0    1.915    1.938    1.915    1.938
 mp_sum_l                            61  4.9    0.832    1.631    0.832    1.631
 dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.832    1.630
 dbcsr_data_release                 507  7.7    1.417    1.418    1.417    1.418
 dbcsr_data_new                     354  7.4    1.128    1.167    1.128    1.167
 jit_kernel_multiply                  6 10.0    1.048    1.148    1.048    1.148
 dbcsr_checksum                       6  5.0    1.084    1.086    1.100    1.100
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=51.193, yerr=0.0
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=20.958, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.921, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.941, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.935, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.747, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.691, yerr=0.0
Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.046    0.048  207.590  207.593
 qs_mol_dyn_low                       1  2.0    0.004    0.005  205.941  205.978
 qs_forces                            6  3.8    0.001    0.001  126.309  126.311
 qs_energies                          6  4.8    0.000    0.000  119.244  119.246
 scf_env_do_scf                       6  5.8    0.000    0.000  111.348  111.350
 scf_env_do_scf_inner_loop          113  6.2    0.005    0.008  103.221  103.222
 velocity_verlet                      5  3.0    0.003    0.003   99.563   99.614
 rebuild_ks_matrix                  119  8.1    0.000    0.001   84.497   84.501
 qs_ks_build_kohn_sham_matrix       119  9.1    0.019    0.020   84.497   84.500
 qs_ks_update_qs_env                119  7.3    0.001    0.001   79.760   79.763
 fft_wrap_pw1pw2                   2059 12.4    0.046    0.047   66.065   66.096
 fft_wrap_pw1pw2_150               1321 13.9    0.009    0.009   63.246   63.252
 qs_vxc_create                      119 10.1    0.002    0.002   54.171   54.172
 xc_vxc_pw_create                   119 11.1    1.592    1.595   54.169   54.170
 qmmm_el_coupling                     6  3.8    0.000    0.000   42.523   42.534
 qmmm_elec_with_gaussian              6  4.8    0.021    0.021   42.517   42.528
 qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   40.822   41.124
 xc_pw_derive                       714 13.1    0.010    0.011   37.176   37.194
 qmmm_elec_gaussian_low_G             6  6.8   35.690   36.091   35.690   36.091
 pw_gpu_c1dr3d_3d_ps               1095 14.8   10.837   10.953   35.600   35.650
 qmmm_forces                          6  3.8    0.001    0.001   34.330   34.331
 qmmm_forces_with_gaussian            6  4.8    0.024    0.024   33.534   33.955
 qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   32.167   32.588
 pw_gpu_r3dc1d_3d_ps                964 14.0    9.555    9.694   30.407   30.426
 xc_rho_set_and_dset_create         119 12.1    2.543    2.551   27.421   27.448
 qmmm_forces_gaussian_low_G           6  6.8   26.874   27.336   26.874   27.336
 xc_pw_divergence                   119 12.1    0.006    0.006   24.742   24.761
 qs_rho_update_rho_low              119  7.3    0.001    0.001   22.425   22.491
 calculate_rho_elec                 119  8.3    1.151    1.151   22.424   22.490
 density_rs2pw                      119  9.3    0.008    0.008   16.158   16.260
 dbcsr_multiply_generic            2598 12.3    0.099    0.101   14.070   14.070
 sum_up_and_integrate               119 10.1    0.002    0.002   13.940   14.009
 integrate_v_rspace                 119 11.1    0.022    0.022   13.753   13.822
 mp_alltoall_z22v                  2059 16.4   13.034   13.336   13.034   13.336
 multiply_cannon                   2598 13.3    0.226    0.229   12.400   12.459
 multiply_cannon_loop              2598 14.3    0.263    0.270   11.914   11.975
 multiply_cannon_multrec           5196 15.3    4.055    4.179    9.756    9.916
 potential_pw2rs                    119 12.1    0.034    0.034    9.800    9.800
 x_to_yz                           1095 15.8    2.274    2.278    9.401    9.535
 qs_ks_ddapc                        119 10.1    0.002    0.002    9.056    9.068
 pw_gpu_sf                         1095 15.8    8.766    8.775    8.766    8.775
 pw_gpu_fg                          964 15.0    8.126    8.197    8.126    8.197
 init_scf_loop                        6  6.8    0.000    0.000    8.125    8.125
 yz_to_x                            964 15.0    1.790    1.797    7.697    7.861
 qs_scf_new_mos                     113  7.2    0.001    0.001    7.360    7.360
 qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.359    7.359
 ot_scf_mini                        113  9.2    0.002    0.002    7.069    7.069
 pw_gpu_ffc                        1095 15.8    6.576    6.599    6.576    6.599
 dbcsr_mm_accdrv_process          13992 16.0    0.562    0.566    5.632    5.666
 init_scf_run                         6  5.8    0.000    0.000    5.647    5.648
 scf_env_initial_rho_setup            6  6.8    0.000    0.000    5.647    5.647
 xc_functional_eval                 238 13.1    0.003    0.003    5.521    5.537
 qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    5.292    5.333
 qmmm_forces_with_gaussian_LG         6  7.8    5.292    5.333    5.292    5.333
 qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    5.133    5.232
 qmmm_elec_with_gaussian_LG           6  7.8    5.132    5.232    5.132    5.232
 grid_collocate_task_list           119  9.3    5.087    5.130    5.087    5.130
 jit_kernel_multiply                 24 14.7    5.021    5.051    5.021    5.051
 ot_mini                            113 10.2    0.001    0.001    5.048    5.050
 pw_gpu_cff                         964 15.0    4.959    4.984    4.959    4.984
 pw_poisson_solve                   125  9.9    0.003    0.003    4.828    4.828
 qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    4.769    4.769
 pw_derive                         1089 13.4    4.147    4.157    4.147    4.157
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=207.59, yerr=0.0
Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=111.60000000000001, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=35.69, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=26.874, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=13.034, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.837, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.555, yerr=0.0
Summary: Performance test took 23 minutes.
Status: OK
 ---> Removed intermediate container 1d20be4b78cf
 ---> a4a450544ca4
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in 284ac6c98072
 ---> Removed intermediate container 284ac6c98072
 ---> 29e353e42d44
Step 46/46 : ENTRYPOINT []
 ---> Running in f71176656917
 ---> Removed intermediate container f71176656917
 ---> ded44e42faa1
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built ded44e42faa1
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.
#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done
EndDate: 2026-04-09 07:27:34+00:00
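In the plot data of every benchmark in this log, the per-routine y values match the AVERAGE self-time column of the corresponding timing report, and each "rest" value equals the CP2K total time minus the five routines plotted individually. For dbcsr, for example, 51.193 - (9.921 + 7.941 + 4.935 + 3.747 + 3.691) = 20.958. The Python sketch below reproduces that arithmetic from the report rows. It assumes, without confirmation from the log itself, that the five routines are the ones with the largest average self time; the function names are hypothetical and not part of CP2K or its CI scripts.

```python
# Hypothetical helper, not part of CP2K or its CI scripts. Assumes each plot
# shows the five routines with the largest AVERAGE self time and that
# rest = CP2K total time (AVERAGE) - sum of those five self times.


def parse_timing_rows(text):
    """Parse CP2K timing-report rows into (name, self_avg, total_avg) tuples."""
    rows = []
    for line in text.strip().splitlines():
        # Columns: SUBROUTINE CALLS ASD SELF_AVG SELF_MAX TOTAL_AVG TOTAL_MAX
        name, _calls, _asd, self_avg, _self_max, total_avg, _total_max = line.split()
        rows.append((name, float(self_avg), float(total_avg)))
    return rows


def plot_points(text, n=5):
    """Return (total, top-n (name, self_avg) pairs, rest) for one report."""
    rows = parse_timing_rows(text)
    total = rows[0][2]  # the first row is the CP2K root timer
    top = sorted(rows[1:], key=lambda r: r[1], reverse=True)[:n]
    rest = total - sum(self_avg for _, self_avg, _ in top)
    return total, [(name, self_avg) for name, self_avg, _ in top], rest


# A subset of rows copied from the dbcsr timing report above.
DBCSR_ROWS = """
CP2K 1 1.0 0.004 0.005 51.193 51.193
dbcsr_multiply_generic 9 5.0 0.002 0.002 30.629 30.630
multiply_cannon_multrec 18 8.0 9.921 10.368 17.405 17.945
dbcsr_make_random_matrix 9 4.0 7.941 8.095 11.619 11.812
dbcsr_merge_all 18 6.5 3.747 3.766 7.788 7.808
dbcsr_redistribute 9 5.0 3.691 3.734 5.988 6.017
dbcsr_mm_accdrv_process_sort 8199 10.0 4.935 5.021 4.935 5.021
hybrid_alltoall_any 18 9.0 2.524 2.532 2.886 2.888
"""

total, top, rest = plot_points(DBCSR_ROWS)
print(total)  # 51.193
print([name for name, _ in top])
print(round(rest, 3))  # 20.958
```

The same function applied to the other reports should give their "rest" values up to floating-point noise, which is consistent with unrounded entries such as y=51.873999999999995 in the plot data.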