StartDate: 2026-01-09 06:06:13+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 477b1f15e7d9b79889d901a2caca263c459105a1
CommitTime: 2026-01-08 22:27:41 +0100
CommitAuthor: Matthias Krack
CommitSubject: Update Spack configuration files
#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=477b1f15e7d9b79889d901a2caca263c459105a1 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.
Sending build context to Docker daemon 408.9MB
Step 1/47 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
7f7602a82106: Waiting
5a2aba542b08: Waiting
6cb9b761b877: Waiting
9c3d619183d2: Waiting
6e8af4fd0a07: Waiting
15a17189b2df: Waiting
02cb0e091e33: Waiting
2cd52cbb1ebe: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/47 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/47 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/47 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/47 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/47 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/47 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 852ff7058318
Step 8/47 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> 3cc2e0ec6ea3
Step 9/47 : RUN mkdir scripts
 ---> Using cache
 ---> 9264fff48632
Step 10/47 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> 94eaf24213f0
Step 11/47 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> 7e5ef29eeea0
Step 12/47 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run
 ---> Using cache
 ---> 4940ae3b8d72
Step 13/47 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> a858e4ab62d2
Step 14/47 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 5c91d3ddd6af
Step 15/47 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 32c866fb1eff
Step 16/47 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> af4360843d07
Step 17/47 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 607f13b74bd4
Step 18/47 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> bcfa76127bf3
Step 19/47 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> a6fd19eb59ef
Step 20/47 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 6c7fb00375da
Step 21/47 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> 512fe0ec1dbe
Step 22/47 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> d6f0752ae4f0
Step 23/47 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> d99b15c81b72
Step 24/47 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> c1aef704603d
Step 25/47 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 1268b070e7e6
Step 26/47 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> 9abe2366d295
Step 27/47 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> 38c11788c2bb
Step 28/47 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> 76cb6c895640
Step 29/47 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> 55b2b9ffe6ea
Step 30/47 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> c07e96d2703d
Step 31/47 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 5cbdea565e52
Step 32/47 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> f2fefd69812a
Step 33/47 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> 2c1c209ad735
Step 34/47 : COPY ./src ./src
 ---> 90d89172ab60
Step 35/47 : COPY ./data ./data
 ---> 61a0ad8e69a4
Step 36/47 : COPY ./tests ./tests
 ---> 3073480c7504
Step 37/47 : COPY ./tools/build_utils ./tools/build_utils
 ---> 11eb68c97119
Step 38/47 : COPY ./cmake ./cmake
 ---> dc45781d44b9
Step 39/47 : COPY ./CMakeLists.txt .
 ---> 287d05e64569
Step 40/47 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> bff02432e827
Step 41/47 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in 971100180cd5
==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so (found version "4.1")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so (found version "4.1")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpifort.so (found version "4.1")
-- Found MPI: TRUE (found version "4.1") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "4.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
-- Found openblas, version 0.3.30
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
-- Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
-- Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
-- Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
-- Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
-- Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-----------------------------------------------------------
- CUDA -
-----------------------------------------------------------
-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found
------------------------------------------------------------
- OPENMP -
------------------------------------------------------------
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX
------------------------------------------------------------
- DBCSR -
------------------------------------------------------------
-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for module 'libxsmmf'
-- Found libxsmmf, version 1.17.0
-- Checking for module 'libxsmmext'
-- Found libxsmmext, version 1.17.0
------------------------------------------------------------
- Other dependencies -
------------------------------------------------------------
-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "1.14.6") found components: C Fortran
-- Found MPI: TRUE (found version "4.1") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Checking for module 'libint2'
-- Found libint2, version 2.6.0
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include;/opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include/libint2
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.5.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "1.14.6") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'libvdwxc>=0.3.0'
-- Looking for vdwxc_init_mpi
-- Looking for vdwxc_init_mpi - not found
-- Found LibVDWXC: /opt/cp2k-toolchain/install/libvdwxc-0.4.0/lib/libvdwxc.a (Required is at least version "0.3.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------
  - BLAS
     - vendor: OpenBLAS
     - include directories: /opt/cp2k-toolchain/install/openblas-0.3.30/include
     - libraries: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
  - LAPACK
     - include directories: /opt/cp2k-toolchain/install/openblas-0.3.30/include
     - libraries: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
  - MPI
     - include directories: /opt/cp2k-toolchain/install/mpich-4.3.2/include
     - libraries: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so
     - MPI_F08: ON
  - ScaLAPACK
     - vendor: auto
     - include directories:
     - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
  - Hardware Acceleration:
     - CUDA:
        - GPU architecture number: 70
        - GPU profiling enabled:
        - GPU accelerated modules
           - PW module: ON
           - GRID module: ON
           - DBM module: ON
  - LibXC
     - version: 7.0.0
     - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
     - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
  - HDF5
     - version: 1.14.6
     - include directories: /opt/cp2k-toolchain/install/hdf5-1.14.6/include
     - libraries: hdf5-shared
  - FFTW3
     - include directories: /opt/cp2k-toolchain/install/fftw-3.3.10/include
     - libraries: /opt/cp2k-toolchain/install/fftw-3.3.10/lib/libfftw3.a
  - LIBXSMM
     - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
     - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so
  - SpLA
     - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
     - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
     - SpLA GEMM offloading
  - SIRIUS
     - include directories:
     - libraries:
  - COSMA
     - include directories: /opt/cp2k-toolchain/install/COSMA-2.7.0/include
     - libraries: MPI::MPI_CXX;costa::costa;$;$;cosma::BLAS::blas;cosma::scalapack::scalapack
  - Libint2
     - include directories: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include;/opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include/libint2
     - libraries: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/lib/libint2.a
  - ELPA
     - include directories: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/include/elpa_openmp-2024.05.001
     - libraries: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a
--------------------------------------------------------------------
-                                                                  -
-         List of dependencies not included in this build          -
-                                                                  -
--------------------------------------------------------------------
  - DFTD4
  - DeePMD
  - PEXSI
  - ACE (libpace)
  - TBLITE
  - Spglib
  - LibSMEAGOL
  - MiMiC
  - openPMD
  - DLA-Future
  - PLUMED
  - Libvori
  - LibTorch
  - TREXIO
  - GreenX
After building CP2K the regtests can be run with the following command:
./tests/do_regtest.py /opt/cp2k/build/bin psmp
-- Configuring done (11.9s)
-- Generating done (0.4s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container 971100180cd5
 ---> 7be7beec713f
Step 42/47 : COPY ./benchmarks ./benchmarks
 ---> abb688c87e8d
Step 43/47 : COPY ./tools/regtesting ./tools/regtesting
 ---> 86817b8139ab
Step 44/47 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> 4b924f5eeb82
Step 45/47 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in 0502f53953a7
============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5
========== Checking Benchmark Inputs =========
Found 77 input files and 0 errors.
========== Running Performance Test ==========
Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"
Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.026    0.027  100.252  100.254
qs_mol_dyn_low                       1  2.0    0.004    0.004   99.849   99.850
qs_forces                           11  3.9    0.002    0.002   99.803   99.805
qs_energies                         11  4.9    0.001    0.001   88.853   88.854
scf_env_do_scf                      11  5.9    0.001    0.001   68.862   68.863
velocity_verlet                     10  3.0    0.001    0.002   63.893   63.911
scf_env_do_scf_inner_loop          108  6.5    0.006    0.008   58.868   58.869
rebuild_ks_matrix                  119  8.3    0.001    0.001   26.112   26.114
qs_ks_build_kohn_sham_matrix       119  9.3    0.019    0.020   26.111   26.113
dbcsr_multiply_generic            2286 12.5    0.139    0.140   24.387   24.436
qs_ks_update_qs_env                119  7.6    0.001    0.001   23.926   23.927
qs_rho_update_rho_low              119  7.7    0.001    0.001   19.836   19.842
calculate_rho_elec                 119  8.7    0.840    0.847   19.835   19.841
qs_scf_new_mos                     108  7.5    0.001    0.001   19.755   19.758
qs_scf_loop_do_ot                  108  8.5    0.001    0.001   19.754   19.757
ot_scf_mini                        108  9.5    0.003    0.003   17.849   17.849
fft_wrap_pw1pw2                   1201 11.6    0.023    0.024   15.550   15.598
sum_up_and_integrate               119 10.3    0.002    0.003   13.759   13.801
integrate_v_rspace                 119 11.3    0.349    0.353   13.669   13.712
fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   13.350   13.402
multiply_cannon                   2286 13.5    0.313    0.319   12.323   12.337
multiply_cannon_loop              2286 14.5    0.253    0.259   11.280   11.301
make_m2s                          4572 13.5    0.040    0.041   10.469   10.483
ot_mini                            108 10.5    0.001    0.001   10.435   10.438
make_images                       4572 14.5    1.134    1.139   10.300   10.313
density_rs2pw                      119  9.7    0.008    0.008   10.028   10.118
init_scf_run                        11  5.9    0.000    0.000   10.103   10.103
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.102   10.102
init_scf_loop                       11  6.9    0.000    0.001    9.919    9.920
grid_collocate_task_list           119  9.7    8.928    8.987    8.928    8.987
pw_gpu_r3dc1d_3d_ps                606 13.1    2.281    2.311    7.960    7.967
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    7.918    7.918
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.575    7.684
pw_gpu_c1dr3d_3d_ps                595 14.2    2.232    2.254    7.560    7.602
grid_integrate_task_list           119 12.3    7.370    7.417    7.370    7.417
wfi_extrapolate                     11  7.9    0.001    0.001    7.301    7.301
prepare_preconditioner              11  7.9    0.000    0.000    6.761    6.762
make_preconditioner                 11  8.9    0.000    0.000    6.761    6.762
qs_ot_get_derivative               108 11.5    0.001    0.001    6.239    6.239
multiply_cannon_multrec           4572 15.5    2.154    2.196    6.152    6.168
hybrid_alltoall_any               4725 16.4    4.757    4.775    6.082    6.110
make_images_data                  4572 15.5    0.051    0.052    5.968    5.989
potential_pw2rs                    119 12.3    0.036    0.036    5.949    5.949
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.655    5.888
parallel_gemm_fm_cosma              81  9.0    5.350    5.351    5.350    5.351
ot_diis_step                       108 11.5    0.005    0.005    4.170    4.170
build_core_ppl_forces               11  5.9    3.863    3.949    3.863    3.949
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.788    3.817
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.728    3.744
apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.642    3.645
apply_single                       119 13.6    0.001    0.001    3.642    3.644
dbcsr_mm_accdrv_process           9594 16.2    0.813    0.892    3.580    3.583
dbcsr_complete_redistribute        329 12.2    1.329    1.333    3.165    3.411
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.262    3.262
calculate_dm_sparse                119  9.5    0.001    0.001    3.198    3.201
multiply_cannon_sync_h2d          4572 15.5    3.140    3.170    3.140    3.170
mp_alltoall_z22v                  1201 15.6    3.060    3.130    3.060    3.130
qs_create_task_list                 11  7.9    0.000    0.000    3.009    3.051
generate_qs_task_list               11  8.9    1.117    1.128    3.009    3.051
qs_ot_get_p                        119 10.4    0.001    0.001    3.023    3.023
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.718    2.719
mp_waitall_1                     64495 16.9    2.679    2.686    2.679    2.686
pw_poisson_solve                   119 10.3    0.003    0.003    2.671    2.672
copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.448    2.462
transfer_rs2pw                     487 10.6    0.008    0.008    2.318    2.397
calculate_first_density_matrix       1  7.0    0.000    0.000    2.363    2.364
jit_kernel_multiply                 11 15.7    2.193    2.272    2.193    2.272
pw_gpu_fg                          606 14.1    2.255    2.259    2.255    2.259
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.230    2.230
qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    2.076    2.077
yz_to_x                            606 14.1    0.459    0.461    2.015    2.047
dbcsr_special_finalize            6858 15.5    0.041    0.041    2.038    2.039
cp_fm_cholesky_invert               11 10.9    2.024    2.024    2.024    2.024
x_to_yz                            595 15.2    0.476    0.484    1.980    2.008
transfer_rs2pw_140                 130 11.5    1.477    1.486    1.918    2.006
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=100.252, yerr=0.0
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=69.984, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.928, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.37, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.35, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.757, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.863, yerr=0.0
Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.025    0.026   96.474   96.474
qs_mol_dyn_low                       1  2.0    0.004    0.004   96.058   96.061
qs_forces                           11  3.9    0.002    0.002   96.014   96.014
qs_energies                         11  4.9    0.001    0.001   84.693   84.694
scf_env_do_scf                      11  5.9    0.001    0.001   64.201   64.202
velocity_verlet                     10  3.0    0.001    0.002   62.777   62.793
scf_env_do_scf_inner_loop           96  6.5    0.005    0.007   53.776   53.776
rebuild_ks_matrix                  107  8.3    0.001    0.001   25.449   25.449
qs_ks_build_kohn_sham_matrix       107  9.3    0.017    0.017   25.448   25.449
qs_ks_update_qs_env                107  7.6    0.001    0.001   22.944   22.945
dbcsr_multiply_generic            1966 12.4    0.122    0.123   22.071   22.162
qs_rho_update_rho_low              107  7.7    0.001    0.001   18.213   18.217
calculate_rho_elec                 107  8.7    0.766    0.766   18.213   18.216
qs_scf_new_mos                      96  7.5    0.001    0.001   17.486   17.500
qs_scf_loop_do_ot                   96  8.5    0.001    0.001   17.485   17.499
ot_scf_mini                         96  9.5    0.002    0.003   15.802   15.804
sum_up_and_integrate               107 10.3    0.002    0.002   14.266   14.338
integrate_v_rspace                 107 11.3    0.318    0.320   14.183   14.254
fft_wrap_pw1pw2                   1081 11.6    0.021    0.021   14.128   14.179
fft_wrap_pw1pw2_140                439 12.2    0.003    0.003   12.139   12.207
multiply_cannon                   1966 13.4    0.282    0.286   11.171   11.196
init_scf_loop                       11  6.9    0.000    0.000   10.349   10.350
multiply_cannon_loop              1966 14.4    0.222    0.227   10.286   10.304
init_scf_run                        11  5.9    0.000    0.000   10.163   10.164
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.163   10.163
make_m2s                          3932 13.4    0.036    0.036    9.432    9.434
ot_mini                             96 10.5    0.001    0.001    9.308    9.308
make_images                       3932 14.4    1.032    1.053    9.281    9.283
density_rs2pw                      107  9.7    0.007    0.007    9.119    9.264
grid_integrate_task_list           107 12.3    8.480    8.552    8.480    8.552
grid_collocate_task_list           107  9.7    8.303    8.428    8.303    8.428
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.307    8.307
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.655    7.833
wfi_extrapolate                     11  7.9    0.001    0.001    7.385    7.385
pw_gpu_r3dc1d_3d_ps                546 13.1    2.080    2.099    7.251    7.263
prepare_preconditioner              11  7.9    0.000    0.000    7.039    7.048
make_preconditioner                 11  8.9    0.000    0.000    7.039    7.048
pw_gpu_c1dr3d_3d_ps                535 14.2    2.009    2.022    6.851    6.890
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.910    6.152
multiply_cannon_multrec           3932 15.4    1.878    1.930    5.738    5.781
qs_ot_get_derivative                96 11.5    0.001    0.001    5.609    5.611
hybrid_alltoall_any               4079 16.3    4.275    4.288    5.512    5.552
parallel_gemm_fm_cosma              81  9.0    5.459    5.460    5.459    5.460
potential_pw2rs                    107 12.3    0.032    0.033    5.384    5.387
make_images_data                  3932 15.4    0.045    0.046    5.354    5.377
qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.103    4.298
build_core_ppl_forces               11  5.9    3.926    4.065    3.926    4.065
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.779    3.856
ot_diis_step                        96 11.5    0.005    0.005    3.676    3.676
dbcsr_complete_redistribute        317 12.2    1.339    1.353    3.332    3.582
dbcsr_mm_accdrv_process           8450 16.1    0.704    0.708    3.506    3.515
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.484    3.484
qs_create_task_list                 11  7.9    0.000    0.000    3.251    3.363
generate_qs_task_list               11  8.9    1.408    1.420    3.251    3.363
apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.277    3.278
apply_single                       107 13.6    0.001    0.001    3.277    3.277
calculate_dm_sparse                107  9.5    0.001    0.001    2.958    2.973
multiply_cannon_sync_h2d          3932 15.4    2.800    2.810    2.800    2.810
mp_alltoall_z22v                  1081 15.6    2.752    2.789    2.752    2.789
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.692    2.693
copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.630    2.645
qs_ot_get_p                        107 10.4    0.001    0.001    2.613    2.613
transfer_rs2pw                     439 10.6    0.007    0.007    2.233    2.447
pw_poisson_solve                   107 10.3    0.003    0.003    2.409    2.414
mp_waitall_1                     55487 16.8    2.365    2.376    2.365    2.376
calculate_first_density_matrix       1  7.0    0.000    0.000    2.335    2.336
jit_kernel_multiply                 11 15.7    2.286    2.290    2.286    2.290
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.204    2.205
transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.108    2.124
transfer_rs2pw_140                 118 11.5    1.346    1.349    1.873    2.088
pw_gpu_fg                          546 14.1    2.070    2.078    2.070    2.078
cp_fm_cholesky_invert               11 10.9    2.069    2.069    2.069    2.069
build_core_ppl                      11  7.9    1.952    2.009    1.952    2.009
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=96.474, yerr=0.0
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=66.031, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.48, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.303, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.459, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.275, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.926, yerr=0.0
Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.018    0.019  164.439  164.441
qs_energies                          1  2.0    0.000    0.000  164.113  164.114
mp2_main                             1  3.0    0.000    0.000  157.329  157.329
mp2_gpw_main                         1  4.0    0.000    0.000  155.500  155.500
rpa_ri_compute_en                    1  5.0    0.000    0.000  145.829  145.829
rpa_num_int                          1  6.0    0.001    0.001  145.820  145.820
compute_mat_P_omega                  1  7.0    0.001    0.002   66.766   66.767
dbt_total                         2336  9.6    0.021    0.021   66.583   66.584
parallel_gemm_fm_cosma             105  8.4   66.233   66.299   66.233   66.299
compute_mat_P_omega_contract        10  8.0    5.076    5.155   66.103   66.117
dbt_contract                       787 11.0    0.049    0.050   45.033   45.035
compute_W_cubic_GW                  10  7.0    0.003    0.004   43.139   43.140
dbt_tas_total                     1149 12.2    0.132    0.134   35.309   35.309
dbt_tas_multiply                   807 12.1    0.003    0.003   34.648   34.649
dbt_tas_dbm                        807 14.1    0.006    0.006   27.295   27.295
dbm_multiply                       807 16.1   26.041   26.222   26.041   26.222
compute_mat_P_omega_calc_M_occ     250  9.0    5.052    5.131   23.410   23.411
rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   22.033   22.033
dbt_copy                          1107 10.7    0.071    0.072   21.800   21.981
contract_P_omega_with_mat_L         10  8.0    0.000    0.000   21.735   21.735
dbt_tas_mm_1N                      524 15.1    0.003    0.003   17.567   17.809
compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   14.908   14.908
dbt_reshape                        594 11.8    6.280    6.462   14.005   14.115
compute_QP_energies                  1  7.0    0.000    0.000   11.607   11.607
compute_self_energy_cubic_gw         1  8.0    0.117    0.120   11.607   11.607
dbt_tas_reserve_blocks_index      3266 14.3    0.624    0.625   10.113   10.125
dbm_reserve_blocks                3634 15.3    9.816    9.829    9.816    9.829
mp2_ri_gpw_compute_in                1  5.0    0.001    0.001    9.660    9.660
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    8.837    8.838
dbt_crop                          1042 12.0    6.428    6.533    8.652    8.789
dbt_reserve_blocks_index          2347 13.0    0.305    0.307    8.298    8.304
dbt_reserve_blocks_index_array    2289 12.1    0.012    0.012    8.096    8.122
dbt_tas_mm_2                       251 15.0    0.002    0.003    7.565    7.565
scf_env_do_scf                       1  3.0    0.000    0.000    6.202    6.202
scf_env_do_scf_inner_loop           17  4.0    0.001    0.001    6.202    6.202
mp_waitall_2                      2656 15.9    5.692    5.704    5.692    5.704
get_2c_integrals                     1  6.0    0.000    0.000    5.489    5.490
contract_cubic_gw                   21  9.0    0.000    0.000    5.389    5.389
dbt_communicate_buffer             594 12.8    0.011    0.012    5.197    5.211
dbcsr_multiply_generic              30  8.1    0.003    0.003    5.067    5.092
multiply_cannon                     30  9.1    0.008    0.009    4.889    4.912
multiply_cannon_loop                30 10.1    0.004    0.004    4.836    4.860
compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    4.808    4.811
compute_mat_P_omega_copy_M_occ     250  9.0    0.002    0.002    4.675    4.683
dbt_tas_copy                       511 11.5    2.478    2.528    4.400    4.452
multiply_cannon_multrec             60 11.1    0.149    0.154    4.233    4.289
dbcsr_mm_accdrv_process            328 12.3    0.477    0.911    3.933    3.989
jit_kernel_multiply                 18 11.5    3.451    3.829    3.451    3.829
qs_scf_new_mos                      17  5.0    0.000    0.000    3.403    3.435
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=164.439, yerr=0.0
Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=49.64099999999999, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=66.233, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=26.041, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=9.816, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.428, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.28, yerr=0.0
Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.019    0.020  187.044  187.046
 qs_forces                            1  2.0    0.000    0.000  186.619  186.622
 rebuild_ks_matrix                    7  6.6    0.000    0.000  182.367  182.369
 qs_ks_build_kohn_sham_matrix         7  7.6    0.002    0.002  182.367  182.369
 hfx_ks_matrix                        7  8.6    0.000    0.000  178.682  178.684
 dbt_total                          849 11.0    0.009    0.009  133.831  133.833
 hfx_ri_update_ks                     7  9.6    0.000    0.000  101.918  101.919
 hfx_ri_update_ks_Pmat                7 10.6   20.877   20.880  101.913  101.914
 qs_energies                          1  3.0    0.000    0.000   97.533   97.534
 scf_env_do_scf                       1  4.0    0.000    0.000   95.478   95.479
 qs_ks_update_qs_env                  8  6.0    0.000    0.000   93.325   93.326
 qs_ks_update_qs_env_forces           1  3.0    0.000    0.000   89.048   89.049
 dbt_contract                       207 12.4    0.049    0.049   78.509   78.509
 hfx_ri_update_forces                 1  7.0    1.046    1.055   76.762   76.763
 dbt_tas_total                      369 13.4    0.073    0.074   65.290   65.291
 dbt_tas_multiply                   216 13.5    0.001    0.001   62.626   62.627
 scf_env_do_scf_inner_loop            6  5.0    0.000    0.000   51.513   51.513
 dbt_copy                           423 11.8    0.045    0.045   51.057   51.313
 dbt_tas_dbm                        216 15.5    0.002    0.002   49.685   49.686
 dbm_multiply                       216 17.5   46.865   46.909   46.865   46.909
 hfx_ri_forces_Pmat_3c                1  8.0    2.946    2.963   45.455   45.455
 init_scf_loop                        2  5.0    0.000    0.000   43.964   43.964
 dbt_reshape                        175 13.2   17.520   17.576   38.283   38.517
 hfx_ri_update_ks_Pmat_KS            63 11.6    0.001    0.001   29.474   29.474
 precalc_derivatives                  1  8.0    1.774    1.801   25.736   25.736
 dbt_tas_mm_2                        91 16.5    0.001    0.001   20.670   20.670
 mp_waitall_2                      1022 16.5   18.210   18.215   18.210   18.215
 dbt_tas_reserve_blocks_index      1323 15.4    1.567    1.573   17.622   17.638
 dbm_reserve_blocks                1491 16.3   16.749   16.771   16.749   16.771
 dbt_crop                           372 13.7   12.141   12.201   15.824   15.985
 dbt_tas_mm_3T                       77 17.1    0.001    0.001   15.544   15.718
 hfx_ri_pre_scf_Pmat                  1 12.0    0.000    0.000   15.617   15.617
 dbt_communicate_buffer             175 14.2    0.004    0.004   15.097   15.108
 dbt_reserve_blocks_index           889 14.5    0.572    0.574   14.364   14.412
 dbt_reserve_blocks_index_array     859 13.5    0.007    0.008   14.093   14.135
 hfx_ri_update_ks_Pmat_Px3C          63 11.6    0.000    0.000   13.945   13.945
 build_3c_derivatives                 3  9.0    2.246    2.344   13.611   13.614
 hfx_ri_update_ks_Pmat_copy_2        63 11.6    0.000    0.000   13.422   13.422
 dbt_tas_mm_3N                       37 15.4    0.000    0.000   11.306   11.452
 dbt_tas_copy                       248 12.5    4.084    4.115    7.640    7.675
 mp_sync                           2901 12.8    5.853    6.393    5.853    6.393
 hfx_ri_pre_scf_Pmat_int              1 13.0    0.000    0.000    5.064    5.064
 dbt_tas_replicate                  168 15.1    2.151    2.190    4.493    4.507
 hfx_ri_pre_scf_calc_tensors          1 14.0    0.003    0.003    4.372    4.373
 hfx_ri_pre_scf_Pmat_copy_2           9 13.0    1.631    1.646    4.221    4.235
 hfx_ri_pre_scf_Pmat_RIx3C            9 13.0    0.000    0.000    3.860    3.899
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=187.044, yerr=0.0
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=66.82300000000001, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=46.865, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=20.877, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=18.21, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=17.52, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=16.749, yerr=0.0
Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.010    0.011  103.414  103.414
 qs_energies                          1  2.0    0.000    0.000  103.239  103.239
 mp2_main                             1  3.0    0.000    0.000   95.699   95.699
 mp2_gpw_main                         1  4.0    0.001    0.001   95.309   95.310
 mp2_ri_gpw_compute_in                1  5.0    0.543    0.546   53.216   53.219
 mp2_ri_gpw_compute_in_loop           1  6.0    0.013    0.013   45.031   45.032
 mp2_ri_gpw_compute_en                1  5.0    0.092    0.093   42.025   42.029
 mp2_ri_gpw_compute_en_RI_loop        1  6.0   12.706   12.723   39.485   39.486
 dbcsr_multiply_generic            2666  8.0    0.150    0.152   22.279   22.666
 ao_to_mo_and_store_B_mult_1       1328  7.0    0.012    0.012   20.919   21.307
 mp2_eri_3c_integrate_gpw          1328  7.0    0.017    0.018   18.473   18.927
 mp2_ri_gpw_compute_en_expansio    1040  7.0    0.707    0.720   16.483   16.536
 local_gemm                        1040  8.0   15.777   15.816   15.777   15.816
 make_m2s                          5332  9.0    0.047    0.047   11.737   12.068
 make_images                       5332 10.0    2.205    2.226   11.566   11.898
 integrate_v_rspace                1338  8.0    1.045    1.046   10.501   10.833
 multiply_cannon                   2666  9.0    0.373    0.389    9.909   10.619
 multiply_cannon_loop              2666 10.0    0.178    0.183    8.863    9.557
 grid_integrate_task_list          1338  9.0    8.169    8.502    8.169    8.502
 hybrid_alltoall_any               6683 11.6    7.631    7.959    7.876    8.202
 fft_wrap_pw1pw2                  26668 10.4    0.137    0.143    7.973    8.137
 make_images_data                  5332 11.0    0.059    0.059    7.792    8.120
 get_2c_integrals                     1  6.0    0.004    0.004    7.640    7.641
 compute_2c_integrals                 1  7.0    0.007    0.007    7.102    7.103
 compute_2c_integrals_loop_lm         1  8.0    0.023    0.024    6.939    7.011
 mp2_eri_2c_integrate_gpw             1  9.0    1.915    1.950    6.916    6.987
 collocate_function                1328  8.0    4.654    4.693    6.787    6.901
 scf_env_do_scf                       1  3.0    0.000    0.000    6.690    6.691
 scf_env_do_scf_inner_loop           10  4.0    0.001    0.001    6.690    6.691
 multiply_cannon_multrec           2676 11.0    2.596    3.132    5.238    5.729
 ao_to_mo_and_store_B_E_Ex_1       1328  7.0    3.546    3.585    5.361    5.425
 qs_scf_new_mos                      10  5.0    0.000    0.000    5.158    5.165
 fft_wrap_pw1pw2_20               10647 11.4    0.022    0.022    4.596    4.796
 mp2_ri_gpw_compute_en_ener        1040  7.0    4.714    4.723    4.714    4.723
 mp2_ri_gpw_compute_en_comm         221  7.0    0.993    0.995    4.430    4.496
 pw_gpu_r3dc1d_3d                 13282 12.2    4.042    4.224    4.042    4.224
 eigensolver                         11  5.8    0.001    0.001    2.924    2.925
 pw_gpu_c1dr3d_3d                 13280 12.7    2.766    2.789    2.766    2.789
 potential_pw2rs                   2666 10.0    0.094    0.094    2.744    2.749
 fft_wrap_pw1pw2_10               15957 11.5    0.019    0.019    2.483    2.523
 mp_sendrecv_dm3                    442  8.0    2.411    2.495    2.411    2.495
 dbcsr_mm_accdrv_process           5392 12.0    0.234    0.241    2.402    2.451
 collocate_single_gaussian         1328 10.0    0.089    0.093    2.384    2.411
 cp_fm_diag_elpa                     11  6.8    0.000    0.000    2.344    2.346
 cp_fm_diag_elpa_base                11  7.8    2.269    2.284    2.343    2.344
 mp2_eri_2c_integrate_gpw_pot_l    1328 10.0    0.004    0.004    2.241    2.243
 copy_dbcsr_to_fm                  1351  8.0    0.031    0.032    2.202    2.239
 fill_local_i_aL                    884  7.5    2.177    2.200    2.177    2.200
 replicate_iaK_2intgroup              1  6.0    2.038    2.041    2.176    2.179
 jit_kernel_multiply                  8 13.0    2.065    2.096    2.065    2.096
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=103.414, yerr=0.0
Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=54.417, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.777, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.706, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.169, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=7.631, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_ener", label="mp2_ri_gpw_compute_en_ener", y=4.714, yerr=0.0
Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.078    0.081  203.496  203.498
 qs_energies                          1  2.0    0.000    0.000  202.422  202.424
 scf_env_do_scf                       1  3.0    0.000    0.000  189.022  189.024
 scf_env_do_scf_inner_loop           15  4.0    0.001    0.002  189.022  189.024
 qs_ks_update_qs_env                 15  5.0    0.000    0.000   93.987   93.988
 rebuild_ks_matrix                   15  6.0    0.000    0.000   93.788   93.790
 qs_ks_build_kohn_sham_matrix        15  7.0    0.003    0.003   93.788   93.789
 qs_vxc_create                       15  8.0    0.056    0.112   57.094   57.156
 qs_scf_new_mos                      15  5.0    0.000    0.000   53.131   53.178
 fft_wrap_pw1pw2                   1086 10.0    0.030    0.031   50.049   50.074
 calculate_dispersion_nonloc         15  9.0   10.629   10.629   49.025   49.029
 eigensolver                         15  6.0    0.002    0.002   44.494   44.525
 qs_rho_update_rho_low               16  5.0    0.000    0.000   40.148   40.148
 calculate_rho_elec                  16  6.0    0.178    0.178   40.148   40.148
 sum_up_and_integrate                15  8.0    0.000    0.000   35.214   35.276
 integrate_v_rspace                  15  9.0    0.047    0.047   35.191   35.253
 grid_collocate_task_list            16  7.0   28.955   28.981   28.955   28.981
 grid_integrate_task_list            15 10.0   28.078   28.078   28.078   28.078
 cp_fm_diag_elpa                     15  7.0    0.000    0.000   27.881   27.884
 cp_fm_diag_elpa_base                15  8.0   26.096   26.657   27.875   27.876
 pw_gpu_c1dr3d_3d_ps                585 12.1    5.584    5.609   26.022   26.041
 fft_wrap_pw1pw2_150                765 11.0    0.005    0.005   25.552   25.564
 pw_gpu_r3dc1d_3d_ps                501 11.9    4.967    5.451   23.990   23.996
 cp_fm_cholesky_restore              45  7.0   14.740   15.440   14.740   15.440
 fft_wrap_pw1pw2_200                197 11.3    0.001    0.001   12.684   12.699
 density_rs2pw                       16  7.0    0.001    0.002   11.009   11.038
 qs_energies_init_hamiltonians        1  3.0    0.000    0.000    9.578    9.578
 vdW_energy                          15 10.0    9.078    9.085    9.078    9.085
 pw_gpu_ffc                         585 13.1    8.875    8.890    8.875    8.890
 pw_gpu_cff                         501 12.9    8.296    8.296    8.296    8.296
 build_core_hamiltonian_matrix        1  4.0    0.000    0.000    8.193    8.296
 xc_vxc_pw_create                    15  9.0    0.174    0.175    8.013    8.015
 mp_alltoall_z22v                  1086 14.0    6.794    7.397    6.794    7.397
 potential_pw2rs                     15 10.0    0.007    0.007    7.065    7.128
 pw_gpu_sf                          585 13.1    6.936    6.947    6.936    6.947
 pw_gpu_fg                          501 12.9    6.607    6.643    6.607    6.643
 copy_dbcsr_to_fm                    16  5.9    0.001    0.001    5.835    5.887
 dbcsr_complete_redistribute         46  8.3    1.668    1.673    5.422    5.486
 fft_wrap_pw1pw2_10                  62 10.5    0.000    0.000    5.293    5.299
 cp_fm_uplo_to_full                  30  8.0    3.651    4.875    3.651    4.875
 xc_pw_derive                        90 11.0    0.001    0.001    4.678    4.711
 build_core_ppnl                      1  5.0    4.656    4.707    4.656    4.707
 xc_rho_set_and_dset_create          15 10.0    0.130    0.133    4.647    4.657
 x_to_yz                            585 13.1    1.003    1.020    4.595    4.643
 yz_to_x                            501 12.9    0.863    0.876    4.066    4.591
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=203.496, yerr=0.0
Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=94.99800000000002, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=28.955, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=28.078, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=26.096, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=14.74, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.629, yerr=0.0
Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.043    0.044  259.489  259.489
 qs_energies                          1  2.0    0.000    0.000  259.362  259.362
 ls_scf                               1  3.0    0.000    0.000  258.536  258.536
 ls_scf_main                          1  4.0    0.001    0.001  249.080  249.080
 density_matrix_trs4                 11  5.0    0.008    0.008  207.144  207.210
 dbcsr_multiply_generic             185  6.1    0.310    0.310  168.885  168.902
 multiply_cannon                    185  7.1    1.710    1.908  116.788  116.862
 multiply_cannon_loop               185  8.1    0.327    0.331  102.574  103.080
 multiply_cannon_multrec            370  9.1   77.256   77.632   86.236   86.662
 make_m2s                           370  7.1    0.028    0.028   43.937   44.099
 make_images                        370  8.1   10.709   11.225   42.901   43.054
 ls_scf_dm_to_ks                     11  5.0    0.000    0.000   37.610   37.675
 matrix_ls_to_qs                     11  6.0    0.000    0.000   34.698   34.836
 dbcsr_complete_redistribute         23  7.5   21.484   21.730   29.790   29.934
 matrix_decluster                    11  7.0    0.000    0.000   26.999   27.142
 arnoldi_extremal                    12  6.1    0.000    0.000   22.692   22.694
 arnoldi_normal_ev                   12  7.1    0.009    0.009   22.691   22.694
 build_subspace                      23  8.1    0.061    0.062   22.238   22.239
 dbcsr_matrix_vector_mult           652  9.0    0.147    0.147   20.863   21.189
 dbcsr_matrix_vector_mult_local     652 10.0   19.900   20.225   19.907   20.232
 make_images_data                   370  9.1    0.012    0.012   16.681   17.178
 hybrid_alltoall_any                393  9.9   11.549   11.709   16.167   16.654
 calculate_norms                    740  9.1   15.389   15.451   15.389   15.451
 dbcsr_finalize                     559  7.6    0.166    0.169   13.960   14.332
 dbcsr_merge_all                    510  8.6    2.513    2.804   12.803   13.167
 dbcsr_copy                         761  7.5    1.684    1.693    9.403    9.417
 setup_rec_index_2d                 370  8.1    9.348    9.394    9.348    9.394
 dbcsr_special_finalize             555  9.1    0.010    0.010    9.133    9.139
 dbcsr_add_d                        280  6.0    0.001    0.001    8.245    8.600
 dbcsr_add_anytype                  280  7.0    3.601    3.622    8.244    8.599
 dbcsr_sort_indices                1283 10.0    8.541    8.576    8.541    8.576
 dbcsr_dot                          144  6.3    7.572    7.600    8.173    8.493
 ls_scf_init_scf                      1  4.0    0.000    0.000    8.009    8.009
 dbcsr_copy_into_existing            11  8.0    7.698    7.703    7.698    7.703
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000    7.594    7.594
 dbcsr_mm_accdrv_process          14501 10.0    0.784    0.798    7.048    7.076
 tree_to_linear_d                    23 10.5    6.855    6.882    6.855    6.882
 matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.000    6.832    6.833
 dbcsr_mm_accdrv_process_sort     14501 11.0    6.264    6.277    6.264    6.277
 dbcsr_merge_single_wm              370 10.1    0.547    0.548    5.919    5.927
 mp_waitall_1                      5192 10.5    4.942    5.589    4.942    5.589
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=259.489, yerr=0.0
Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=113.91099999999997, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=77.256, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=21.484, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=19.9, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=15.389, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=11.549, yerr=0.0
Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.003    0.004   46.259   46.259
 lib_test                             1  2.0    0.000    0.000   46.247   46.254
 dbcsr_run_tests                      3  3.0    0.000    0.000   46.247   46.253
 test_multiplies_multiproc            3  4.0    0.001    0.001   35.590   35.690
 dbcsr_multiply_generic               9  5.0    0.002    0.002   27.536   27.537
 multiply_cannon                      9  6.0    0.435    0.528   18.203   18.400
 multiply_cannon_loop                 9  7.0    0.002    0.002   16.842   16.919
 multiply_cannon_multrec             18  8.0    8.801    8.857   15.633   15.697
 dbcsr_make_random_matrix             9  4.0    7.178    7.239   10.522   10.621
 dbcsr_finalize                      27  5.7    0.001    0.001    7.331    7.497
 dbcsr_merge_all                     18  6.5    3.515    3.515    7.214    7.369
 dbcsr_mm_accdrv_process           8199  9.0    1.262    1.425    6.594    6.602
 dbcsr_redistribute                   9  5.0    3.429    3.435    5.565    5.566
 make_m2s                            18  6.0    0.001    0.001    4.858    4.859
 make_images                         18  7.0    0.356    0.363    4.823    4.823
 dbcsr_mm_accdrv_process_sort      8199 10.0    4.450    4.459    4.450    4.459
 make_images_data                    18  8.0    0.001    0.001    2.785    2.792
 hybrid_alltoall_any                 18  9.0    2.399    2.400    2.754    2.760
 dbcsr_data_copy_aa2                 18  7.5    1.746    1.893    1.746    1.893
 mp_alltoall_d11v                    27  6.0    1.873    1.874    1.873    1.874
 tree_to_linear_d                     9  7.0    1.822    1.830    1.822    1.830
 dbcsr_data_release                 507  7.7    1.339    1.354    1.339    1.354
 jit_kernel_multiply                  6 10.0    0.882    1.045    0.882    1.045
 dbcsr_checksum                       6  5.0    0.961    0.978    0.979    0.979
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=46.259, yerr=0.0
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=18.886, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=8.801, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.178, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.45, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.515, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.429, yerr=0.0
Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.046    0.048  197.428  197.428
 qs_mol_dyn_low                       1  2.0    0.004    0.004  195.960  195.994
 qs_forces                            6  3.8    0.001    0.001  123.549  123.549
 qs_energies                          6  4.8    0.000    0.000  116.711  116.712
 scf_env_do_scf                       6  5.8    0.000    0.000  109.153  109.154
 scf_env_do_scf_inner_loop          113  6.2    0.005    0.007  101.268  101.268
 velocity_verlet                      5  3.0    0.003    0.003   93.210   93.258
 rebuild_ks_matrix                  119  8.1    0.000    0.001   82.482   82.482
 qs_ks_build_kohn_sham_matrix       119  9.1    0.019    0.019   82.481   82.482
 qs_ks_update_qs_env                119  7.3    0.001    0.001   77.814   77.815
 fft_wrap_pw1pw2                   2059 12.4    0.045    0.046   64.710   64.752
 fft_wrap_pw1pw2_150               1321 13.9    0.009    0.009   61.969   62.064
 qs_vxc_create                      119 10.1    0.002    0.002   52.764   52.764
 xc_vxc_pw_create                   119 11.1    1.485    1.491   52.762   52.762
 qmmm_el_coupling                     6  3.8    0.000    0.000   37.766   37.768
 qmmm_elec_with_gaussian              6  4.8    0.019    0.019   37.759   37.761
 xc_pw_derive                       714 13.1    0.010    0.011   36.578   36.618
 qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   36.259   36.581
 pw_gpu_c1dr3d_3d_ps               1095 14.8   10.394   10.434   34.971   34.985
 qmmm_elec_gaussian_low_G             6  6.8   31.603   31.958   31.603   31.958
 qmmm_forces                          6  3.8    0.001    0.001   31.845   31.845
 qmmm_forces_with_gaussian            6  4.8    0.023    0.023   30.925   31.011
 qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   29.656   29.741
 pw_gpu_r3dc1d_3d_ps                964 14.0    9.120    9.190   29.681   29.710
 xc_rho_set_and_dset_create         119 12.1    2.386    2.390   26.601   26.626
 qmmm_forces_gaussian_low_G           6  6.8   24.727   24.866   24.727   24.866
 xc_pw_divergence                   119 12.1    0.006    0.006   24.277   24.293
 qs_rho_update_rho_low              119  7.3    0.001    0.001   21.941   22.001
 calculate_rho_elec                 119  8.3    1.068    1.069   21.940   22.000
 density_rs2pw                      119  9.3    0.008    0.008   15.778   15.907
 sum_up_and_integrate               119 10.1    0.002    0.002   13.740   13.742
 dbcsr_multiply_generic            2598 12.3    0.095    0.098   13.586   13.657
 integrate_v_rspace                 119 11.1    0.020    0.020   13.566   13.569
 mp_alltoall_z22v                  2059 16.4   12.689   12.974   12.689   12.974
 multiply_cannon                   2598 13.3    0.207    0.209   12.026   12.051
 multiply_cannon_loop              2598 14.3    0.249    0.252   11.567   11.593
 potential_pw2rs                    119 12.1    0.033    0.034    9.568    9.569
 multiply_cannon_multrec           5196 15.3    4.028    4.097    9.467    9.554
 x_to_yz                           1095 15.8    2.313    2.349    9.276    9.400
 qs_ks_ddapc                        119 10.1    0.002    0.002    8.756    8.781
 pw_gpu_sf                         1095 15.8    8.723    8.753    8.723    8.753
 pw_gpu_fg                          964 15.0    8.033    8.163    8.033    8.163
 init_scf_loop                        6  6.8    0.000    0.000    7.882    7.882
 yz_to_x                            964 15.0    1.811    1.816    7.536    7.655
 qs_scf_new_mos                     113  7.2    0.001    0.001    7.645    7.645
 qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.644    7.644
 ot_scf_mini                        113  9.2    0.002    0.002    7.360    7.362
 pw_gpu_ffc                        1095 15.8    6.559    6.627    6.559    6.627
 init_scf_run                         6  5.8    0.000    0.000    5.427    5.427
 scf_env_initial_rho_setup            6  6.8    0.000    0.000    5.426    5.426
 dbcsr_mm_accdrv_process          13992 16.0    0.533    0.536    5.372    5.387
 grid_collocate_task_list           119  9.3    5.065    5.118    5.065    5.118
 xc_functional_eval                 238 13.1    0.003    0.003    5.073    5.076
 qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    4.928    4.982
 qmmm_forces_with_gaussian_LG         6  7.8    4.928    4.982    4.928    4.982
 pw_gpu_cff                         964 15.0    4.925    4.977    4.925    4.977
 ot_mini                            113 10.2    0.001    0.001    4.941    4.942
 jit_kernel_multiply                 24 14.7    4.800    4.812    4.800    4.812
 pw_poisson_solve                   125  9.9    0.003    0.004    4.806    4.809
 qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    4.698    4.698
 qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    4.656    4.690
 qmmm_elec_with_gaussian_LG           6  7.8    4.656    4.690    4.656    4.690
 pw_derive                         1089 13.4    4.228    4.259    4.228    4.259
 qs_ot_get_derivative               113 11.2    0.001    0.001    4.045    4.046
 grid_integrate_task_list           119 12.1    3.977    3.978    3.977    3.978
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=197.428, yerr=0.0
Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=108.895, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=31.603, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=24.727, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=12.689, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.394, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.12, yerr=0.0
Summary: Performance test took 22 minutes.
Status: OK
 ---> Removed intermediate container 0502f53953a7
 ---> 62bbe369b5b6
Step 46/47 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in 3c8de2a06872
 ---> Removed intermediate container 3c8de2a06872
 ---> 978c5ec59b50
Step 47/47 : ENTRYPOINT []
 ---> Running in efbfe7e2b8ca
 ---> Removed intermediate container efbfe7e2b8ca
 ---> 151d3a871a59
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 151d3a871a59
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.
#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done
EndDate: 2026-01-09 06:47:57+00:00
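The per-benchmark PlotPoint values in this log appear to follow a simple rule: the plot lists the five routines with the largest average SELF TIME, and "rest" is the CP2K row's average TOTAL TIME minus their sum. For GW_PBE_4benzene, 164.439 - (66.233 + 26.041 + 9.816 + 6.428 + 6.280) = 49.641 s, and the same arithmetic reproduces the other "rest" values. This rule is inferred from the numbers, not taken from the CI scripts. The following minimal sketch reproduces it from whitespace-separated table rows; `TimingRow`, `parse_row`, and `plot_points` are hypothetical helpers, not part of CP2K or its CI tooling.

```python
"""Recompute per-benchmark PlotPoint values from a CP2K TIMING table.

Assumption (inferred from this log, not from the CI scripts): the plot shows
the five routines with the largest average SELF TIME, and "rest" is the CP2K
row's average TOTAL TIME minus the sum of those five self times.
"""
from typing import List, NamedTuple, Tuple


class TimingRow(NamedTuple):
    name: str
    calls: int
    asd: float
    self_avg: float
    self_max: float
    total_avg: float
    total_max: float


def parse_row(line: str) -> TimingRow:
    """Split one whitespace-separated TIMING row into typed fields."""
    name, calls, asd, s_avg, s_max, t_avg, t_max = line.split()
    return TimingRow(name, int(calls), float(asd), float(s_avg),
                     float(s_max), float(t_avg), float(t_max))


def plot_points(rows: List[TimingRow], top_n: int = 5) -> List[Tuple[str, float]]:
    """Return ("rest", seconds) followed by the top_n routines by self time."""
    total = next(r.total_avg for r in rows if r.name == "CP2K")
    top = sorted(rows, key=lambda r: r.self_avg, reverse=True)[:top_n]
    rest = total - sum(r.self_avg for r in top)
    return [("rest", rest)] + [(r.name, r.self_avg) for r in top]


# A subset of the GW_PBE_4benzene rows (it contains that benchmark's top five).
GW_SAMPLE = [
    "CP2K 1 1.0 0.018 0.019 164.439 164.441",
    "parallel_gemm_fm_cosma 105 8.4 66.233 66.299 66.233 66.299",
    "dbm_multiply 807 16.1 26.041 26.222 26.041 26.222",
    "dbm_reserve_blocks 3634 15.3 9.816 9.829 9.816 9.829",
    "dbt_crop 1042 12.0 6.428 6.533 8.652 8.789",
    "dbt_reshape 594 11.8 6.280 6.462 14.005 14.115",
    "mp_waitall_2 2656 15.9 5.692 5.704 5.692 5.704",
]

if __name__ == "__main__":
    for name, seconds in plot_points([parse_row(s) for s in GW_SAMPLE]):
        print(f"{name:24s} {seconds:8.3f}")
```

With this subset, the sketch selects the same five routines as the GW_PBE_4benzene plot and computes rest as about 49.641 s. The log's 49.64099999999999 is the same value with ordinary floating-point noise.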