StartDate: 2026-05-15 20:05:35+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: d43dca4c757fce3ab5e1f10939987612a9525bb2
CommitTime: 2026-05-15 20:38:26 +0200
CommitAuthor: Dynamics of Condensed Matter
CommitSubject: Add finite-volume Kubo transport property (E. Prodan) (#5209)

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=d43dca4c757fce3ab5e1f10939987612a9525bb2 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.

Sending build context to Docker daemon 417.7MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> d8bfc1674c90
Step 8/46 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> de928c312410
Step 9/46 : RUN mkdir scripts
 ---> Using cache
 ---> 4aed4b85b643
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> ce9efe84db60
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> b6277fd0f5d6
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run
 ---> Using cache
 ---> dc9d9c9cec02
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> 33d924df7e2c
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 1f1b396ca359
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 48e32731327f
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> de7e4137e2e7
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 7e6dac44dc05
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> b1d950667934
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 2ae460a79c5b
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 9162e9cff8b8
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> b2d58e4bc4bd
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 1b123dba6269
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> aac7b756ebc0
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> 40cf4af23da4
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 2fa2556f8070
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> 08efdbb17bd8
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> 31b3064d7339
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> cdb353c57325
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> 67329b526e18
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> b40ae9a6ca99
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 1d377e548d92
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> abb7f7fdc159
Step 33/46 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> 611010d2d478
Step 34/46 : COPY ./src ./src
 ---> cfb87e0a6557
Step 35/46 : COPY ./data ./data
 ---> 27c3bc1012d3
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
 ---> 4170448dc4b8
Step 37/46 : COPY ./cmake ./cmake
 ---> 609df2e9fccf
Step 38/46 : COPY ./CMakeLists.txt .
 ---> 94ae044bfeac
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> faf4fbc087e3
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in d02d17ae417e

==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so (found version "5.0")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so (found version "5.0")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpifort.so (found version "5.0")
-- Found MPI: TRUE (found version "5.0") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "5.0") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
--   Found openblas, version 0.3.33
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
--   Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
--   Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
--   Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
--   Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
--   Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-----------------------------------------------------------
-                           CUDA                          -
-----------------------------------------------------------
-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found
------------------------------------------------------------
-                           OPENMP                         -
------------------------------------------------------------
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX
------------------------------------------------------------
-                           DBCSR                          -
------------------------------------------------------------
-- Found MPI: TRUE (found version "5.0")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for module 'libxsmmf'
--   Found libxsmmf, version 1.17.0
-- Checking for module 'libxsmmext'
--   Found libxsmmext, version 1.17.0
------------------------------------------------------------
-                     Other dependencies                   -
------------------------------------------------------------
-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran
-- Found MPI: TRUE (found version "5.0") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.11/include
-- Checking for module 'libint2'
--   Package 'libint2', required by 'virtual:world', not found
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "2.1.1") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.11/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'elpa;elpa_openmp;elpa-openmp-2019.05.001;elpa_openmp-2019.11.001;elpa_openmp-2020.05.001;elpa-2019.05.001;elpa-2019.11.001;elpa-2020.05.001'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so
-- Checking for module 'libvdwxc>=0.5.0'
--   Found libvdwxc, version 0.5.0
-- Checking for module 'fftw3'
--   Found fftw3, version 3.3.11
-- Found LibVDWXC: vdwxc;fftw3 (Required is at least version "0.5.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------
  - BLAS
    - vendor: OpenBLAS
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
  - LAPACK
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
  - MPI
    - include directories: /opt/cp2k-toolchain/install/mpich-5.0.1/include
    - libraries: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so
    - MPI_F08: ON
  - ScaLAPACK
    - vendor: auto
    - include directories:
    - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
  - Hardware Acceleration:
    - CUDA:
      - GPU architecture number: 70
      - GPU profiling enabled:
    - GPU accelerated modules
      - ELPA module: ON
      - GRID module: ON
      - DBM module: ON
      - PW module: ON
  - LibXC
    - version: 7.0.0
    - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
    - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
  - HDF5
    - version: 2.1.1
    - include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include
    - libraries: hdf5-shared
  - FFTW3
    - include directories: /opt/cp2k-toolchain/install/fftw-3.3.11/include
    - libraries: /opt/cp2k-toolchain/install/fftw-3.3.11/lib/libfftw3.a
  - LIBXSMM
    - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
    - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so
  - SpLA
    - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
    - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
    - SpLA GEMM offloading
  - SIRIUS
    - include directories:
    - libraries:
  - COSMA
    - include directories: /opt/cp2k-toolchain/install/COSMA-2.8.4/include
    - libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack>
  - Libint2
    - include directories: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
    - libraries: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/lib/libint2.a
  - ELPA
    - include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001
    - libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
--------------------------------------------------------------------
-                                                                  -
-           List of dependencies not included in this build        -
-                                                                  -
--------------------------------------------------------------------
  - DFTD4
  - DeePMD
  - PEXSI
  - ACE (libpace)
  - TBLITE
  - Spglib
  - LibSMEAGOL
  - MiMiC
  - openPMD
  - DLA-Future
  - PLUMED
  - LibFCI
  - Libvori
  - LibTorch
  - TREXIO
  - GreenX

After building and installing CP2K the regtests can be run with the following command:
/opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp

-- Configuring done (14.9s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container d02d17ae417e
 ---> a44b78e8aaba
Step 41/46 : COPY ./benchmarks ./benchmarks
 ---> 0b6ef1c10473
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
 ---> 297700a758b5
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> 6f96794b840c
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in 96722a6ebba0

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 83 input files and 0 errors.

========== Running Performance Test ==========
Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"

Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                        CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                  1  1.0    0.028    0.029  101.163  101.163
qs_mol_dyn_low                        1  2.0    0.006    0.007  100.743  100.746
qs_forces                            11  3.9    0.002    0.002  100.695  100.696
qs_energies                          11  4.9    0.001    0.001   89.270   89.272
scf_env_do_scf                       11  5.9    0.001    0.001   73.424   73.424
velocity_verlet                      10  3.0    0.001    0.002   62.969   62.986
scf_env_do_scf_inner_loop           108  6.5    0.006    0.008   62.361   62.361
rebuild_ks_matrix                   119  8.3    0.001    0.001   27.441   27.441
qs_ks_build_kohn_sham_matrix        119  9.3    0.019    0.019   27.440   27.441
dbcsr_multiply_generic             2286 12.5    0.145    0.146   26.096   26.150
qs_ks_update_qs_env                 119  7.6    0.001    0.001   25.167   25.168
qs_scf_new_mos                      108  7.5    0.001    0.001   21.261   21.263
qs_scf_loop_do_ot                   108  8.5    0.001    0.001   21.260   21.262
qs_rho_update_rho_low               119  7.7    0.001    0.001   20.809   20.824
calculate_rho_elec                  119  8.7    0.885    0.890   20.808   20.823
ot_scf_mini                         108  9.5    0.003    0.003   19.285   19.286
fft_wrap_pw1pw2                    1201 11.6    0.023    0.024   16.975   16.988
fft_wrap_pw1pw2_140                 487 12.2    0.003    0.003   14.632   14.644
sum_up_and_integrate                119 10.3    0.002    0.003   14.401   14.451
integrate_v_rspace                  119 11.3    0.361    0.364   14.308   14.358
multiply_cannon                    2286 13.5    0.347    0.353   13.039   13.127
multiply_cannon_loop               2286 14.5    0.262    0.263   11.863   11.866
make_m2s                           4572 13.5    0.044    0.044   11.412   11.504
make_images                        4572 14.5    1.266    1.322   11.232   11.324
ot_mini                             108 10.5    0.001    0.001   11.267   11.268
density_rs2pw                       119  9.7    0.008    0.008   11.028   11.110
init_scf_loop                        11  6.9    0.000    0.000   10.974   10.975
grid_collocate_task_list            119  9.7    8.865    8.922    8.865    8.922
pw_gpu_r3dc1d_3d_ps                 606 13.1    2.385    2.396    8.678    8.678
qs_energies_init_hamiltonians        11  5.9    0.000    0.000    8.331    8.332
pw_gpu_c1dr3d_3d_ps                 595 14.2    2.274    2.298    8.268    8.281
build_core_hamiltonian_matrix_       11  4.9    0.001    0.001    7.883    8.024
prepare_preconditioner               11  7.9    0.000    0.000    7.602    7.611
make_preconditioner                  11  8.9    0.000    0.000    7.602    7.611
grid_integrate_task_list            119 12.3    7.372    7.427    7.372    7.427
init_scf_run                         11  5.9    0.000    0.000    6.846    6.846
scf_env_initial_rho_setup            11  6.9    0.000    0.001    6.845    6.845
hybrid_alltoall_any                4725 16.4    4.880    4.900    6.728    6.798
qs_ot_get_derivative                108 11.5    0.002    0.002    6.779    6.781
make_images_data                   4572 15.5    0.054    0.054    6.593    6.640
make_full_inverse_cholesky           11  9.9    0.000    0.000    6.335    6.599
potential_pw2rs                     119 12.3    0.036    0.037    6.575    6.576
multiply_cannon_multrec            4572 15.5    2.150    2.154    6.317    6.330
ot_diis_step                        108 11.5    0.005    0.006    4.462    4.462
mp_alltoall_z22v                   1201 15.6    4.328    4.360    4.328    4.360
build_core_ppl_forces                11  5.9    4.048    4.152    4.048    4.152
qs_env_update_s_mstruct              11  6.9    0.000    0.000    4.044    4.100
wfi_extrapolate                      11  7.9    0.001    0.001    3.988    3.988
build_core_hamiltonian_matrix        11  6.9    0.001    0.001    3.874    3.925
apply_preconditioner_dbcsr          119 12.6    0.000    0.000    3.920    3.923
apply_single                        119 13.6    0.001    0.001    3.919    3.923
mp_waitall_1                      64495 16.9    3.808    3.897    3.808    3.897
dbcsr_complete_redistribute         329 12.2    1.378    1.429    3.531    3.817
dbcsr_mm_accdrv_process            9594 16.2    0.860    0.941    3.780    3.789
qs_ks_update_qs_env_forces           11  4.9    0.000    0.000    3.398    3.398
calculate_dm_sparse                 119  9.5    0.001    0.001    3.335    3.337
qs_ot_get_p                         119 10.4    0.001    0.001    3.310    3.310
qs_create_task_list                  11  7.9    0.000    0.000    3.172    3.277
generate_qs_task_list                11  8.9    1.165    1.173    3.172    3.277
multiply_cannon_sync_h2d           4572 15.5    3.109    3.118    3.109    3.118
cp_dbcsr_sm_fm_multiply              37  9.5    0.001    0.001    2.918    2.918
transfer_rs2pw                      487 10.6    0.008    0.008    2.656    2.779
pw_poisson_solve                    119 10.3    0.003    0.003    2.745    2.746
copy_dbcsr_to_fm                    153 11.3    0.004    0.004    2.685    2.699
yz_to_x                             606 14.1    0.479    0.481    2.683    2.698
x_to_yz                             595 15.2    0.533    0.535    2.657    2.674
calculate_first_density_matrix        1  7.0    0.000    0.000    2.407    2.407
jit_kernel_multiply                  11 15.6    2.312    2.400    2.312    2.400
cp_dbcsr_sm_fm_multiply_core         37 10.5    0.000    0.000    2.366    2.367
transfer_rs2pw_140                  130 11.5    1.589    1.602    2.207    2.338
cp_fm_cholesky_invert                11 10.9    2.298    2.298    2.298    2.298
qs_ot_get_derivative_taylor          59 13.0    0.003    0.003    2.262    2.263
pw_gpu_fg                           606 14.1    2.191    2.192    2.191    2.192
copy_fm_to_dbcsr                    176 11.2    0.002    0.002    1.852    2.124
qs_ot_p2m_diag                       50 11.0    0.087    0.088    2.112    2.113
transfer_dbcsr_to_fm                 11 10.9    0.001    0.001    2.072    2.086
dbcsr_special_finalize             6858 15.5    0.039    0.039    2.072    2.078
build_core_ppl                       11  7.9    2.015    2.053    2.015    2.053
-------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=101.163, yerr=0.0
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=71.66999999999999, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.865, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.372, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.88, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.328, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.048, yerr=0.0

Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                        CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                  1  1.0    0.028    0.029   96.485   96.486
qs_mol_dyn_low                        1  2.0    0.004    0.005   96.024   96.027
qs_forces                            11  3.9    0.002    0.002   95.976   95.976
qs_energies                          11  4.9    0.001    0.001   84.330   84.332
scf_env_do_scf                       11  5.9    0.001    0.001   68.208   68.208
velocity_verlet                      10  3.0    0.001    0.002   61.175   61.192
scf_env_do_scf_inner_loop            96  6.5    0.005    0.007   56.844   56.844
rebuild_ks_matrix                   107  8.3    0.001    0.001   26.578   26.579
qs_ks_build_kohn_sham_matrix        107  9.3    0.018    0.018   26.578   26.579
qs_ks_update_qs_env                 107  7.6    0.001    0.001   23.969   23.970
dbcsr_multiply_generic             1966 12.4    0.126    0.126   23.670   23.733
qs_rho_update_rho_low               107  7.7    0.001    0.001   19.223   19.242
calculate_rho_elec                  107  8.7    0.788    0.794   19.222   19.241
qs_scf_new_mos                       96  7.5    0.001    0.001   18.775   18.783
qs_scf_loop_do_ot                    96  8.5    0.001    0.001   18.774   18.782
ot_scf_mini                          96  9.5    0.003    0.003   17.024   17.026
fft_wrap_pw1pw2                    1081 11.6    0.022    0.022   15.366   15.384
sum_up_and_integrate                107 10.3    0.002    0.002   14.801   14.873
integrate_v_rspace                  107 11.3    0.321    0.323   14.716   14.788
fft_wrap_pw1pw2_140                 439 12.2    0.003    0.003   13.232   13.264
multiply_cannon                    1966 13.4    0.303    0.305   11.858   11.973
init_scf_loop                        11  6.9    0.000    0.000   11.276   11.276
multiply_cannon_loop               1966 14.4    0.228    0.229   10.835   10.866
make_m2s                           3932 13.4    0.038    0.039   10.323   10.424
make_images                        3932 14.4    1.131    1.173   10.163   10.263
density_rs2pw                       107  9.7    0.007    0.007    9.985   10.082
ot_mini                              96 10.5    0.001    0.001    9.918    9.918
grid_integrate_task_list            107 12.3    8.503    8.576    8.503    8.576
qs_energies_init_hamiltonians        11  5.9    0.000    0.000    8.554    8.554
grid_collocate_task_list            107  9.7    8.418    8.492    8.418    8.492
build_core_hamiltonian_matrix_       11  4.9    0.001    0.001    7.909    8.037
pw_gpu_r3dc1d_3d_ps                 546 13.1    2.205    2.269    7.936    7.940
prepare_preconditioner               11  7.9    0.000    0.000    7.766    7.772
make_preconditioner                  11  8.9    0.000    0.000    7.766    7.772
pw_gpu_c1dr3d_3d_ps                 535 14.2    2.049    2.076    7.403    7.425
init_scf_run                         11  5.9    0.000    0.000    6.917    6.917
scf_env_initial_rho_setup            11  6.9    0.000    0.001    6.917    6.917
make_full_inverse_cholesky           11  9.9    0.000    0.000    6.524    6.780
hybrid_alltoall_any                4079 16.3    4.461    4.549    6.180    6.189
make_images_data                   3932 15.4    0.047    0.047    6.015    6.018
qs_ot_get_derivative                 96 11.5    0.001    0.001    5.979    5.981
multiply_cannon_multrec            3932 15.4    1.897    1.916    5.890    5.901
potential_pw2rs                     107 12.3    0.033    0.033    5.891    5.892
qs_env_update_s_mstruct              11  6.9    0.000    0.000    4.251    4.440
build_core_ppl_forces                11  5.9    4.061    4.163    4.061    4.163
wfi_extrapolate                      11  7.9    0.001    0.001    4.005    4.005
mp_alltoall_z22v                   1081 15.6    3.878    3.974    3.878    3.974
ot_diis_step                         96 11.5    0.005    0.005    3.917    3.917
build_core_hamiltonian_matrix        11  6.9    0.001    0.001    3.844    3.893
dbcsr_complete_redistribute         317 12.2    1.380    1.399    3.628    3.890
dbcsr_mm_accdrv_process            8450 16.1    0.802    0.887    3.652    3.653
qs_ks_update_qs_env_forces           11  4.9    0.000    0.000    3.602    3.602
apply_preconditioner_dbcsr          107 12.6    0.000    0.000    3.535    3.536
apply_single                        107 13.6    0.001    0.001    3.535    3.536
qs_create_task_list                  11  7.9    0.000    0.000    3.367    3.506
generate_qs_task_list                11  8.9    1.451    1.461    3.367    3.506
mp_waitall_1                      55487 16.8    3.369    3.483    3.369    3.483
calculate_dm_sparse                 107  9.5    0.001    0.001    3.101    3.110
cp_dbcsr_sm_fm_multiply              37  9.5    0.001    0.001    2.922    2.922
copy_dbcsr_to_fm                    147 11.2    0.004    0.004    2.872    2.895
qs_ot_get_p                         107 10.4    0.001    0.001    2.884    2.886
multiply_cannon_sync_h2d           3932 15.4    2.811    2.826    2.811    2.826
transfer_rs2pw                      439 10.6    0.008    0.008    2.398    2.541
yz_to_x                             546 14.1    0.430    0.434    2.433    2.503
pw_poisson_solve                    107 10.3    0.002    0.002    2.483    2.488
calculate_first_density_matrix        1  7.0    0.000    0.000    2.432    2.432
jit_kernel_multiply                  11 15.6    2.309    2.394    2.309    2.394
cp_dbcsr_sm_fm_multiply_core         37 10.5    0.000    0.000    2.369    2.369
x_to_yz                             535 15.2    0.471    0.471    2.347    2.368
cp_fm_cholesky_invert                11 10.9    2.333    2.333    2.333    2.333
transfer_dbcsr_to_fm                 11 10.9    0.001    0.001    2.266    2.288
transfer_rs2pw_140                  118 11.5    1.424    1.436    2.000    2.149
copy_fm_to_dbcsr                    170 11.1    0.001    0.001    1.821    2.076
build_core_ppl                       11  7.9    1.988    2.022    1.988    2.022
pw_gpu_fg                           546 14.1    1.995    1.997    1.995    1.997
-------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=96.485, yerr=0.0
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=67.164, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.503, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.418, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.461, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.061, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=3.878, yerr=0.0

Running w64PBE.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/w64PBE_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.046 0.048 238.330 238.330 qs_mol_dyn_low 1 2.0 0.005 0.007 237.593 237.596 qs_forces 11 3.9 0.002 0.002 237.545 237.546 qs_energies 11 4.9 0.001 0.001 206.325 206.325 velocity_verlet 10 3.0 0.001 0.002 186.418 186.435 scf_env_do_scf 11 5.9 0.001 0.002 184.421 184.421 scf_env_do_scf_inner_loop 106 6.8 0.006 0.008 160.504 160.504 rebuild_ks_matrix 117 8.5 0.001 0.001 119.849 119.851 qs_ks_build_kohn_sham_matrix 117 9.5 0.019 0.019 119.848 119.850 qs_ks_update_qs_env 120 7.8 0.001 0.001 106.154 106.157 fft_wrap_pw1pw2 2000 12.9 0.048 0.048 69.848 69.867 qs_vxc_create 117 10.5 0.002 0.002 66.417 66.424 xc_vxc_pw_create 117 11.5 1.460 1.477 66.415 66.422 fft_wrap_pw1pw2_200 1298 14.3 0.008 0.008 66.227 66.241 qs_rho_update_rho_low 117 7.9 0.001 0.001 61.381 61.391 calculate_rho_elec 117 8.9 1.228 1.231 61.381 61.390 grid_collocate_task_list 117 9.9 41.766 41.817 41.766 41.817 xc_pw_derive 702 13.5 0.010 0.010 39.025 39.084 sum_up_and_integrate 117 10.5 0.003 0.003 38.992 39.021 integrate_v_rspace 117 11.5 0.211 0.212 38.800 38.830 xc_rho_set_and_dset_create 117 12.5 0.945 0.946 38.637 38.661 pw_gpu_c1dr3d_3d_ps 1053 15.2 10.731 10.771 37.521 37.523 pw_gpu_r3dc1d_3d_ps 947 14.5 9.599 9.723 32.268 32.290 grid_integrate_task_list 117 12.5 27.628 27.668 27.628 27.668 xc_pw_divergence 117 12.5 0.005 0.005 25.913 25.945 init_scf_loop 14 6.8 0.001 0.001 23.849 23.849 dbcsr_multiply_generic 2035 12.5 0.136 0.137 19.207 19.242 mp_alltoall_z22v 2000 16.9 18.501 18.681 18.501 18.681 density_rs2pw 117 9.9 0.009 0.009 18.361 18.422 xc_functional_eval 117 13.5 0.002 0.002 17.342 17.384 pbe_lda_eval 117 14.5 17.340 17.382 17.340 17.382 
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 16.604 16.747 qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 14.467 14.467 qs_scf_new_mos 106 7.8 0.001 0.001 13.256 13.257 qs_scf_loop_do_ot 106 8.8 0.001 0.001 13.255 13.257 x_to_yz 1053 16.2 2.628 2.633 12.427 12.478 ot_scf_mini 106 9.8 0.003 0.003 11.861 11.865 qs_energies_init_hamiltonians 11 5.9 0.000 0.000 11.694 11.696 potential_pw2rs 117 12.5 0.057 0.058 10.961 10.970 yz_to_x 947 15.5 1.847 1.849 10.548 10.685 multiply_cannon 2035 13.5 0.299 0.299 10.235 10.242 init_scf_run 11 5.9 0.000 0.000 9.720 9.721 scf_env_initial_rho_setup 11 6.9 0.000 0.000 9.720 9.720 multiply_cannon_loop 2035 14.5 0.230 0.231 9.255 9.265 prepare_preconditioner 14 7.8 0.000 0.000 8.917 8.923 make_preconditioner 14 8.8 0.000 0.000 8.917 8.923 build_core_ppl_forces 11 5.9 8.498 8.657 8.498 8.657 pw_gpu_sf 1053 16.2 8.416 8.422 8.416 8.422 make_m2s 4070 13.5 0.041 0.042 7.506 7.513 pw_gpu_fg 947 15.5 7.442 7.452 7.442 7.452 make_images 4070 14.5 1.048 1.051 7.328 7.336 ot_mini 106 10.8 0.001 0.001 7.195 7.198 build_core_hamiltonian_matrix 11 6.9 0.001 0.001 7.146 7.184 wfi_extrapolate 11 7.9 0.001 0.001 6.957 6.957 pw_gpu_ffc 1053 16.2 5.927 5.947 5.927 5.947 multiply_cannon_multrec 4070 15.5 1.755 1.761 5.378 5.387 build_kinetic_matrix_low 22 6.9 5.156 5.157 5.242 5.242 build_overlap_matrix_low 22 6.9 5.028 5.050 5.103 5.125 pw_poisson_solve 117 10.5 0.003 0.003 4.815 4.823 make_full_single_inverse 14 9.8 0.002 0.002 4.628 4.629 pw_gpu_cff 947 15.5 4.614 4.616 4.614 4.616 qs_ot_get_derivative 106 11.8 0.001 0.001 4.460 4.464 transfer_rs2pw 479 10.8 0.010 0.010 4.241 4.341 qs_env_update_s_mstruct 11 6.9 0.000 0.000 4.188 4.219 pw_derive 1053 13.8 4.177 4.198 4.177 4.198 make_images_data 4070 15.5 0.049 0.049 3.913 3.916 hybrid_alltoall_any 4213 16.4 2.755 2.762 3.909 3.913 make_full_inverse_cholesky 14 9.8 0.000 0.000 3.525 3.673 transfer_rs2pw_200 128 11.7 2.607 2.641 3.519 3.624 mp_waitall_1 57459 16.9 3.407 3.484 3.407 3.484 
 dbcsr_mm_accdrv_process           9388 16.2    1.051    1.619    3.365    3.382
 build_core_ppl                      11  7.9    3.258    3.287    3.258    3.287
 transfer_pw2rs                     479 13.4    0.006    0.006    3.109    3.115
 qs_create_task_list                 11  7.9    0.000    0.000    2.899    2.908
 generate_qs_task_list               11  8.9    1.353    1.362    2.899    2.908
 pw_copy                           1755 13.0    2.797    2.805    2.797    2.805
 ot_diis_step                       106 11.8    0.005    0.005    2.712    2.712
 fft_wrap_pw1pw2_70                 234 13.2    0.001    0.002    2.636    2.651
 arnoldi_generalized_ev              14 10.8    0.000    0.000    2.577    2.578
 dbcsr_sym_matrix_vector_mult      1269 12.5    0.033    0.034    2.539    2.539
 transfer_pw2rs_200                 128 14.1    1.617    1.634    2.494    2.501
 cp_dbcsr_sm_fm_multiply             46  9.3    0.002    0.002    2.437    2.439
 calculate_dm_sparse                117  9.7    0.001    0.001    2.405    2.408
 dbcsr_complete_redistribute        323 11.8    1.008    1.009    2.230    2.399
 gev_build_subspace                  23 11.5    0.009    0.009    2.376    2.376
 jit_kernel_multiply                 12 15.0    1.812    2.362    1.812    2.362
 apply_preconditioner_dbcsr         120 12.8    0.000    0.000    2.336    2.338
 apply_single                       120 13.8    0.001    0.001    2.336    2.338
 dbcsr_sym_matrix_vector_mult_l    1269 13.5    2.222    2.223    2.227    2.228
 pw_poisson_set                     118 11.5    0.004    0.004    2.190    2.198
 qs_ot_get_derivative_taylor         89 12.9    0.003    0.003    2.017    2.020
 calculate_first_density_matrix       1  7.0    0.000    0.000    1.952    1.952
 cp_dbcsr_sm_fm_multiply_core        46 10.3    0.000    0.000    1.923    1.926
 multiply_cannon_sync_h2d          4070 15.5    1.799    1.845    1.799    1.845
 pw_integral_ab_c1d_c1d_gs          117 11.5    1.813    1.817    1.839    1.841
 pw_axpy                           1170 12.0    1.636    1.637    1.636    1.637
 qs_ot_get_p                        120 10.5    0.001    0.001    1.629    1.632
 copy_dbcsr_to_fm                   143 10.8    0.004    0.004    1.518    1.560
 copy_fm_to_dbcsr                   180 10.8    0.002    0.002    1.344    1.490
 dbcsr_special_finalize            6105 15.5    0.032    0.032    1.441    1.443
 cp_fm_cholesky_invert               14 10.8    1.353    1.353    1.353    1.353
 multiply_cannon_metrocomm1        4070 15.5    0.012    0.012    1.293    1.338
 dbcsr_merge_single_wm             4070 16.5    0.127    0.129    1.334    1.337
 calculate_rho_core                  11  7.9    0.165    0.165    1.238    1.278
 cp_dbcsr_plus_fm_fm_t               22  8.9    0.001    0.001    1.244    1.245
 grid_create_task_list               11  9.9    1.224    1.233    1.224    1.233
 dbcsr_dot                         1125 12.2    1.155    1.156    1.226    1.227
 mp_sendrecv_dv                     479 12.8    1.129    1.195    1.129    1.195
 transfer_dbcsr_to_fm                14 10.8    0.001    0.001    0.993    1.027
 dbcsr_sort_data                   4070 17.5    0.941    0.941    0.941    0.941
 dbcsr_finalize                    4628 13.9    0.058    0.058    0.886    0.908
 transfer_fm_to_dbcsr                14  9.8    0.000    0.000    0.764    0.907
 qs_ot_get_orbitals                 106 10.8    0.001    0.001    0.826    0.828
 dbcsr_merge_all                   4098 15.1    0.177    0.179    0.781    0.803
 build_core_ppnl_forces              11  5.9    0.784    0.796    0.784    0.796
 dbcsr_copy                        7812 13.3    0.199    0.201    0.784    0.787
 evaluate_core_matrix_traces        117  8.5    0.001    0.001    0.771    0.772
 calculate_ptrace_kp                234  9.5    0.001    0.001    0.770    0.771
 mp_alltoall_d11v                  1899 13.8    0.752    0.756    0.752    0.756
 cp_fm_cholesky_decompose            28 10.5    0.708    0.742    0.708    0.742
 qs_ot_p2m_diag                      19 11.0    0.034    0.035    0.740    0.741
 fft_wrap_pw1pw2_30                 234 13.2    0.001    0.001    0.679    0.692
 make_images_pack                  4070 15.5    0.667    0.668    0.680    0.680
 cp_fm_uplo_to_full                  47 13.4    0.500    0.653    0.500    0.653
 qs_init_subsys                       1  2.0    0.001    0.001    0.649    0.649
 qs_env_setup                         1  3.0    0.000    0.000    0.640    0.641
 qs_env_rebuild_pw_env               23  5.3    0.000    0.000    0.639    0.640
 pw_env_rebuild                       1  5.0    0.000    0.000    0.639    0.640
 cp_dbcsr_syevd                      19 12.0    0.002    0.002    0.618    0.618
 pw_grid_setup                        4  6.0    0.000    0.000    0.613    0.614
 pw_grid_setup_internal               4  7.0    0.007    0.007    0.602    0.603
 cp_fm_diag_elpa                     19 13.0    0.000    0.000    0.585    0.585
 cp_fm_diag_elpa_base                19 14.0    0.575    0.577    0.585    0.585
 transfer_rs2pw_70                  117 11.9    0.390    0.392    0.558    0.563
 qs_ot_get_derivative_diag           17 12.0    0.001    0.001    0.559    0.559
 mp_sum_d                          3821 11.6    0.385    0.556    0.385    0.556
 pw_zero                            585 13.0    0.553    0.555    0.553    0.555
 make_basis_sm                       14  9.3    0.001    0.001    0.545    0.545
 dbcsr_copy_into_existing            22  7.9    0.533    0.534    0.534    0.534
 acc_transpose_blocks              4070 15.5    0.019    0.020    0.506    0.507
 dbcsr_mm_accdrv_process_sort      9388 17.2    0.502    0.503    0.502    0.503
 pw_grid_sort                         4  8.0    0.353    0.355    0.480    0.483
 transfer_pw2rs_70                  117 14.5    0.314    0.314    0.478    0.478
 dbcsr_sort_indices               10929 16.5    0.423    0.423    0.423    0.423
 ot_scf_init                         14  7.8    0.002    0.002    0.393    0.397
 compute_matrix_w                    11  5.9    0.000    0.000    0.396    0.397
 calculate_w_matrix_ot               11  6.9    0.003    0.003    0.396    0.397
 dbcsr_data_copy_aa2               2343 15.5    0.377    0.394    0.377    0.394
 reorthogonalize_vectors             10  9.0    0.000    0.000    0.391    0.391
 mp_sum_l                          6134 13.5    0.326    0.370    0.326    0.370
 dbcsr_desymmetrize_deep            143 11.8    0.122    0.123    0.367    0.369
 parallel_gemm_fm_cosma              96  8.9    0.361    0.363    0.361    0.363
 mp_alltoall_i22                    633 13.6    0.203    0.350    0.203    0.350
 cp_dbcsr_alloc_block_from_nbl       88  7.7    0.221    0.223    0.338    0.339
 calculate_ecore_overlap             22  5.9    0.001    0.001    0.182    0.333
 build_qs_neighbor_lists             11  6.9    0.001    0.001    0.322    0.322
 dbcsr_add_d                       1795 13.1    0.003    0.003    0.312    0.312
 pw_scale                           468 12.0    0.303    0.311    0.303    0.311
 distribute_tasks                    11  9.9    0.307    0.310    0.307    0.310
 integrate_v_core_rspace             11  7.9    0.068    0.068    0.304    0.310
 dbcsr_add_anytype                 1795 14.1    0.169    0.171    0.309    0.309
 setup_rec_index_2d                4070 14.5    0.278    0.281    0.278    0.281
 fft_wrap_pw1pw2_10                 234 13.2    0.001    0.001    0.258    0.265
 pw_multiply_with                   117 11.5    0.261    0.262    0.261    0.262
 multiply_cannon_multrec_finali    2035 16.5    0.004    0.004    0.258    0.260
 dbcsr_mm_multrec_finalize         2035 17.5    0.021    0.021    0.254    0.256
 dbcsr_make_untransposed_blocks    2481 13.4    0.237    0.237    0.248    0.248
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="w64PBE", label="w64PBE", y=238.33, yerr=0.0
Plot: name="w64PBE_timings_6cpu_1gpu", title="Timings of w64PBE with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="rest", label="rest", y=122.36400000000002, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=41.766, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.628, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=18.501, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="pbe_lda_eval", label="pbe_lda_eval", y=17.34, yerr=0.0
PlotPoint: plot="w64PBE_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.731, yerr=0.0
Running w64SCAN.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/w64SCAN_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.190    0.192  954.876  954.882
 qs_mol_dyn_low                       1  2.0    0.004    0.004  952.607  952.609
 qs_forces                           11  3.9    0.002    0.002  952.558  952.564
 qs_energies                         11  4.9    0.001    0.001  821.320  821.325
 scf_env_do_scf                      11  5.9    0.001    0.002  780.749  780.754
 velocity_verlet                     10  3.0    0.002    0.002  763.562  763.583
 scf_env_do_scf_inner_loop          106  6.8    0.006    0.009  703.753  703.757
 rebuild_ks_matrix                  117  8.5    0.001    0.001  659.710  659.720
 qs_ks_build_kohn_sham_matrix       117  9.5    0.021    0.021  659.709  659.719
 qs_ks_update_qs_env                119  7.8    0.001    0.001  542.582  542.592
 fft_wrap_pw1pw2                   3053 12.6    0.074    0.075  433.052  433.216
 fft_wrap_pw1pw2_400               1649 13.9    0.010    0.011  415.328  415.428
 qs_vxc_create                      117 10.5    0.002    0.002  388.123  388.142
 xc_vxc_pw_create                   117 11.5    4.605    4.615  388.121  388.140
 xc_rho_set_and_dset_create         117 12.5    5.975    5.988  259.640  259.757
 qs_rho_update_rho_low              117  7.9    0.001    0.001  238.246  238.247
 calculate_rho_elec                 234  8.9    6.758    6.763  238.245  238.246
 sum_up_and_integrate               117 10.5    0.005    0.005  217.168  217.615
 pw_gpu_c1dr3d_3d_ps               1521 15.1  121.077  121.291  217.223  217.284
 integrate_v_rspace                 234 11.5    0.426    0.430  216.301  216.745
 pw_gpu_r3dc1d_3d_ps               1532 14.1  122.855  122.968  215.736  215.961
 xc_pw_derive                       702 13.5    0.012    0.012  186.179  186.289
 density_rs2pw                      234  9.9    0.020    0.020  165.359  165.708
 xc_functional_eval                 234 13.5    0.003    0.003  156.615  156.708
 libxc_lda_eval                     234 14.5  156.605  156.698  156.612  156.705
 xc_pw_divergence                   117 12.5    0.006    0.006  122.592  122.660
 grid_integrate_task_list           234 12.5  118.050  118.629  118.050  118.629
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000  117.886  117.887
 potential_pw2rs                    234 12.5    0.285    0.288   97.824   97.964
 init_scf_loop                       13  6.8    0.000    0.000   76.928   76.928
 mp_alltoall_z22v                  3053 16.6   72.482   73.290   72.482   73.290
 grid_collocate_task_list           234  9.9   65.939   66.324   65.939   66.324
 x_to_yz                           1521 16.1    9.521    9.522   44.996   45.500
 yz_to_x                           1532 15.1    7.726    7.744   44.732   45.056
 transfer_rs2pw                     947 10.9    0.021    0.022   36.065   36.468
 transfer_rs2pw_400                 245 11.8   25.871   25.973   31.552   31.939
 pw_gpu_sf                         1521 16.1   31.034   31.384   31.034   31.384
 pw_gpu_fg                         1532 15.1   30.676   30.683   30.676   30.683
 transfer_pw2rs                     947 13.5    0.017    0.017   29.725   29.725
 init_scf_run                        11  5.9    0.000    0.000   26.518   26.518
 scf_env_initial_rho_setup           11  6.9    0.000    0.001   26.518   26.518
 transfer_pw2rs_400                 245 14.3   21.118   21.232   26.444   26.450
 wfi_extrapolate                     11  7.9    0.002    0.002   22.115   22.115
 dbcsr_multiply_generic            2100 12.6    0.137    0.140   19.843   20.284
 pw_gpu_ffc                        1521 16.1   20.088   20.090   20.088   20.090
 pw_poisson_solve                   117 10.5    0.003    0.003   17.917   17.925
 pw_gpu_cff                        1532 15.1   17.314   17.319   17.314   17.319
 fft_wrap_pw1pw2_140                468 13.2    0.003    0.003   13.936   14.072
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000   13.569   13.569
 qs_scf_new_mos                     106  7.8    0.001    0.001   13.438   13.439
 qs_scf_loop_do_ot                  106  8.8    0.001    0.001   13.437   13.439
 build_core_hamiltonian_matrix_      11  4.9    0.001    0.001   13.222   13.334
 pw_derive                         1053 13.8   12.576   12.598   12.576   12.598
 ot_scf_mini                        106  9.8    0.003    0.003   12.040   12.043
 multiply_cannon                   2100 13.6    0.302    0.305   10.238   10.259
 pw_copy                           2223 13.1    9.494    9.497    9.494    9.497
 multiply_cannon_loop              2100 14.6    0.239    0.241    9.246    9.278
 mp_waitall_1                     59747 17.0    8.780    8.816    8.780    8.816
 prepare_preconditioner              13  7.8    0.000    0.000    8.682    8.685
 make_preconditioner                 13  8.8    0.000    0.000    8.682    8.685
 pw_integral_ab_c1d_c1d_gs          117 11.5    8.155    8.226    8.529    8.548
 make_m2s                          4200 13.6    0.041    0.042    7.538    7.542
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    7.307    7.371
 mp_sendrecv_dv                     947 12.9    6.887    7.367    6.887    7.367
 make_images                       4200 14.6    1.054    1.058    7.359    7.365
 ot_mini                            106 10.8    0.001    0.001    7.291    7.294
 pw_poisson_set                     118 11.5    0.006    0.006    6.947    6.955
 build_core_ppl_forces               11  5.9    6.224    6.334    6.224    6.334
 pw_axpy                           1638 11.7    6.056    6.065    6.056    6.065
 build_core_hamiltonian_matrix       11  6.9    0.001    0.001    5.889    5.914
 multiply_cannon_multrec           4200 15.6    1.795    1.803    5.371    5.376
 calculate_rho_core                  11  7.9    0.438    0.440    4.967    4.991
 qs_ot_get_derivative               106 11.8    0.001    0.001    4.544    4.547
 build_kinetic_matrix_low            22  6.9    4.374    4.380    4.455    4.462
 build_overlap_matrix_low            22  6.9    4.386    4.391    4.455    4.461
 make_full_single_inverse            13  9.8    0.002    0.002    4.363    4.365
 hybrid_alltoall_any               4338 16.5    2.754    2.756    3.936    3.940
 make_images_data                  4200 15.6    0.049    0.049    3.922    3.924
 transfer_rs2pw_140                 234 11.9    2.820    2.842    3.791    3.819
 make_full_inverse_cholesky          13  9.8    0.000    0.000    3.512    3.651
 dbcsr_mm_accdrv_process           9484 16.3    0.482    0.483    3.317    3.319
 fft_wrap_pw1pw2_50                 468 13.2    0.003    0.003    2.823    2.894
 ot_diis_step                       106 11.8    0.005    0.005    2.725    2.725
 transfer_pw2rs_140                 234 14.5    1.703    1.713    2.648    2.654
 build_core_ppl                      11  7.9    2.494    2.528    2.494    2.528
 dbcsr_complete_redistribute        312 11.8    1.079    1.100    2.366    2.522
 arnoldi_generalized_ev              13 10.8    0.000    0.000    2.438    2.438
 cp_dbcsr_sm_fm_multiply             45  9.4    0.002    0.002    2.409    2.410
 dbcsr_sym_matrix_vector_mult      1206 12.5    0.031    0.032    2.404    2.405
 calculate_dm_sparse                117  9.7    0.001    0.001    2.356    2.359
 jit_kernel_multiply                 12 15.0    2.320    2.321    2.320    2.321
 apply_preconditioner_dbcsr         119 12.8    0.000    0.000    2.310    2.314
 apply_single                       119 13.8    0.001    0.001    2.310    2.314
 gev_build_subspace                  22 11.5    0.009    0.009    2.249    2.249
 pw_zero                            702 12.6    2.225    2.228    2.225    2.228
 qs_create_task_list                 11  7.9    0.000    0.000    2.111    2.204
 generate_qs_task_list               11  8.9    0.895    0.902    2.111    2.204
 qs_ot_get_derivative_taylor         89 12.9    0.003    0.003    2.148    2.149
 dbcsr_sym_matrix_vector_mult_l    1206 13.5    2.096    2.116    2.101    2.121
 qs_init_subsys                       1  2.0    0.001    0.001    2.000    2.000
 qs_env_setup                         1  3.0    0.000    0.000    1.992    1.993
 qs_env_rebuild_pw_env               23  5.3    0.000    0.000    1.991    1.992
 pw_env_rebuild                       1  5.0    0.000    0.000    1.991    1.992
 pw_grid_setup                        4  6.0    0.000    0.000    1.923    1.924
 cp_dbcsr_sm_fm_multiply_core        45 10.4    0.000    0.000    1.910    1.913
 pw_grid_setup_internal               4  7.0    0.019    0.019    1.892    1.893
 calculate_first_density_matrix       1  7.0    0.000    0.000    1.890    1.891
 multiply_cannon_sync_h2d          4200 15.6    1.801    1.821    1.801    1.821
 copy_dbcsr_to_fm                   138 10.8    0.004    0.004    1.732    1.757
 qs_ot_get_p                        119 10.6    0.001    0.001    1.688    1.690
 pw_grid_sort                         4  8.0    1.145    1.146    1.554    1.554
 mp_sum_d                          3885 11.5    1.174    1.516    1.174    1.516
 copy_fm_to_dbcsr                   174 10.8    0.002    0.002    1.373    1.515
 dbcsr_special_finalize            6300 15.6    0.034    0.034    1.439    1.444
 mp_sum_l                          6329 13.5    0.931    1.388    0.931    1.388
 integrate_v_core_rspace             11  7.9    0.153    0.154    1.346    1.348
 dbcsr_merge_single_wm             4200 16.6    0.125    0.126    1.329    1.332
 multiply_cannon_metrocomm1        4200 15.6    0.012    0.012    1.267    1.330
 transfer_dbcsr_to_fm                13 10.8    0.001    0.001    1.226    1.243
 cp_fm_cholesky_invert               13 10.8    1.241    1.242    1.241    1.242
 dbcsr_dot                         1134 12.2    1.147    1.149    1.221    1.222
 cp_dbcsr_plus_fm_fm_t               22  8.9    0.001    0.001    1.197    1.197
 pw_scale                           585 11.9    1.102    1.102    1.102    1.102
 grid_create_task_list               11  9.9    0.981    1.046    0.981    1.046
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="w64SCAN", label="w64SCAN", y=954.876, yerr=0.0
Plot: name="w64SCAN_timings_6cpu_1gpu", title="Timings of w64SCAN with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="rest", label="rest", y=363.807, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="libxc_lda_eval", label="libxc_lda_eval", y=156.605, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=122.855, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=121.077, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=118.05, yerr=0.0
PlotPoint: plot="w64SCAN_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=72.482, yerr=0.0
Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.018    0.020  104.471  104.472
 qs_energies                          1  2.0    0.000    0.000  104.139  104.139
 mp2_main                             1  3.0    0.000    0.000   97.359   97.360
 mp2_gpw_main                         1  4.0    0.000    0.000   95.547   95.548
 rpa_ri_compute_en                    1  5.0    0.000    0.000   87.696   87.697
 rpa_num_int                          1  6.0    0.001    0.001   87.687   87.688
 dbt_total                         2336  9.6    0.020    0.020   69.257   69.257
 compute_mat_P_omega                  1  7.0    0.001    0.002   69.010   69.011
 compute_mat_P_omega_contract        10  8.0    5.209    5.272   68.694   68.714
 dbt_contract                       787 11.0    0.048    0.049   45.668   45.670
 dbt_tas_total                     1149 12.2    0.136    0.139   35.799   35.800
 dbt_tas_multiply                   807 12.1    0.003    0.003   35.087   35.087
 dbt_tas_dbm                        807 14.1    0.006    0.006   27.494   27.494
 dbm_multiply                       807 16.1   25.603   26.366   25.603   26.366
 dbt_copy                          1107 10.7    0.073    0.075   23.907   24.116
 compute_mat_P_omega_calc_M_occ     250  9.0    5.192    5.254   23.995   23.995
 dbt_tas_mm_1N                      524 15.1    0.003    0.003   17.103   17.933
 dbt_reshape                        594 11.8    6.460    6.628   15.918   16.013
 compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   15.031   15.031
 compute_QP_energies                  1  7.0    0.000    0.000   11.903   11.903
 compute_self_energy_cubic_gw         1  8.0    0.114    0.116   11.903   11.903
 dbt_tas_reserve_blocks_index      3266 14.3    0.639    0.640   10.654   10.743
 dbm_reserve_blocks                3634 15.3   10.346   10.436   10.346   10.436
 dbt_reserve_blocks_index          2347 13.0    0.317    0.319    8.837    8.895
 compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    8.874    8.874
 dbt_crop                          1042 12.0    6.449    6.538    8.706    8.836
 dbt_reserve_blocks_index_array    2289 12.1    0.011    0.012    8.633    8.704
 mp2_ri_gpw_compute_in                1  5.0    0.001    0.001    7.841    7.841
 mp_waitall_2                      2656 15.9    7.561    7.576    7.561    7.576
 dbt_tas_mm_2                       251 15.0    0.003    0.003    7.573    7.573
 dbt_communicate_buffer             594 12.8    0.011    0.011    6.897    6.908
 scf_env_do_scf                       1  3.0    0.000    0.000    6.234    6.234
 scf_env_do_scf_inner_loop           17  4.0    0.001    0.001    6.234    6.234
 contract_cubic_gw                   21  9.0    0.000    0.000    5.653    5.653
 compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    5.557    5.568
 compute_mat_P_omega_copy_M_occ     250  9.0    0.001    0.002    5.319    5.336
 dbcsr_multiply_generic              30  8.1    0.003    0.003    5.047    5.093
 multiply_cannon                     30  9.1    0.007    0.008    4.853    4.896
 multiply_cannon_loop                30 10.1    0.004    0.004    4.800    4.843
 dbt_tas_copy                       511 11.5    2.536    2.571    4.470    4.531
 multiply_cannon_multrec             60 11.1    0.136    0.142    4.224    4.235
 mp_sync                           8688 11.6    2.983    3.947    2.983    3.947
 dbcsr_mm_accdrv_process            328 12.3    0.042    0.042    3.928    3.939
 jit_kernel_multiply                 18 11.7    3.880    3.891    3.880    3.891
 get_2c_integrals                     1  6.0    0.000    0.000    3.573    3.574
 qs_scf_new_mos                      17  5.0    0.000    0.000    3.274    3.308
 compute_2c_integrals                 1  7.0    0.000    0.000    2.785    2.785
 trace_sigma_gw                      21  9.0    0.356    0.359    2.616    2.616
 fft_wrap_pw1pw2                    301 10.2    0.005    0.005    2.460    2.466
 qs_ks_build_kohn_sham_matrix        18  6.9    0.002    0.002    2.443    2.452
 qs_ks_update_qs_env                 17  5.0    0.000    0.000    2.415    2.423
 rebuild_ks_matrix                   17  6.0    0.000    0.000    2.407    2.416
 mp2_ri_gpw_compute_in_copy_3c        6  6.0    0.218    0.219    2.268    2.407
 convert_to_new_pgrid              2421 14.1    0.035    0.035    2.381    2.384
 compute_W_cubic_GW                  10  7.0    0.004    0.004    2.362    2.365
 dbm_copy                          1614 15.1    2.347    2.350    2.347    2.350
 parallel_gemm_fm_cosma             105  8.4    2.299    2.309    2.299    2.309
 fill_fm_L_from_L_loc_non_block       1  8.0    0.000    0.000    2.256    2.276
 dbt_split_copyback                  70 10.6    0.860    0.862    2.182    2.207
 fill_fm_L_from_L_loc_non_block       1  9.0    2.163    2.182    2.163    2.182
 build_3c_integrals                   5  6.0    1.406    1.445    1.990    2.129
 rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000    2.093    2.093
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=104.471, yerr=0.0
Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=48.052, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=25.603, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.346, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=7.561, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.46, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.449, yerr=0.0
Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.023    0.025  191.988  191.989
 qs_forces                            1  2.0    0.000    0.000  191.541  191.542
 rebuild_ks_matrix                    7  6.6    0.000    0.000  187.303  187.304
 qs_ks_build_kohn_sham_matrix         7  7.6    0.002    0.002  187.303  187.304
 hfx_ks_matrix                        7  8.6    0.000    0.000  183.456  183.464
 dbt_total                          849 11.0    0.009    0.009  137.017  137.018
 hfx_ri_update_ks                     7  9.6    0.000    0.000  104.587  104.587
 hfx_ri_update_ks_Pmat                7 10.6   21.220   21.273  104.582  104.583
 qs_energies                          1  3.0    0.000    0.000  100.084  100.085
 scf_env_do_scf                       1  4.0    0.000    0.000   98.003   98.004
 qs_ks_update_qs_env                  8  6.0    0.000    0.000   95.891   95.892
 qs_ks_update_qs_env_forces           1  3.0    0.000    0.000   91.419   91.420
 dbt_contract                       207 12.4    0.048    0.048   79.333   79.333
 hfx_ri_update_forces                 1  7.0    1.043    1.056   78.867   78.876
 dbt_tas_total                      369 13.4    0.073    0.073   66.107   66.107
 dbt_tas_multiply                   216 13.5    0.001    0.001   63.324   63.324
 dbt_copy                           423 11.8    0.045    0.046   52.966   53.175
 scf_env_do_scf_inner_loop            6  5.0    0.000    0.001   52.713   52.713
 dbt_tas_dbm                        216 15.5    0.002    0.002   50.259   50.259
 dbm_multiply                       216 17.5   47.371   47.418   47.371   47.418
 hfx_ri_forces_Pmat_3c                1  8.0    3.660    3.667   46.700   46.738
 init_scf_loop                        2  5.0    0.000    0.000   45.289   45.289
 dbt_reshape                        175 13.2   17.544   17.644   40.501   40.791
 hfx_ri_update_ks_Pmat_KS            63 11.6    0.001    0.001   30.190   30.190
 precalc_derivatives                  1  8.0    1.798    1.810   26.187   26.187
 dbt_tas_mm_2                        91 16.5    0.001    0.001   21.010   21.010
 mp_waitall_2                      1022 16.5   20.958   20.976   20.958   20.976
 dbt_tas_reserve_blocks_index      1323 15.4    1.607    1.610   17.388   17.692
 dbt_communicate_buffer             175 14.2    0.004    0.004   17.337   17.365
 dbm_reserve_blocks                1491 16.3   16.471   16.771   16.471   16.771
 dbt_tas_mm_3T                       77 17.1    0.001    0.001   15.935   16.254
 hfx_ri_pre_scf_Pmat                  1 12.0    0.000    0.000   16.012   16.012
 dbt_crop                           372 13.7   12.164   12.246   15.823   16.003
 hfx_ri_update_ks_Pmat_copy_2        63 11.6    0.000    0.000   14.540   14.540
 dbt_reserve_blocks_index           889 14.5    0.598    0.601   14.337   14.423
 hfx_ri_update_ks_Pmat_Px3C          63 11.6    0.000    0.000   14.237   14.237
 dbt_reserve_blocks_index_array     859 13.5    0.007    0.007   14.068   14.147
 build_3c_derivatives                 3  9.0    2.151    2.164   13.690   13.691
 dbt_tas_mm_3N                       37 15.4    0.000    0.000   11.027   11.149
 dbt_tas_copy                       248 12.5    4.026    4.211    7.408    7.984
 mp_sync                           2901 12.8    6.217    6.532    6.217    6.532
 hfx_ri_pre_scf_Pmat_int              1 13.0    0.000    0.000    5.083    5.084
 dbt_tas_replicate                  168 15.1    2.145    2.164    4.745    4.759
 hfx_ri_pre_scf_Pmat_copy_2           9 13.0    1.641    1.642    4.470    4.471
 hfx_ri_pre_scf_calc_tensors          1 14.0    0.003    0.003    4.364    4.365
 hfx_ri_pre_scf_Pmat_RIx3C            9 13.0    0.000    0.000    3.812    3.844
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=191.988, yerr=0.0
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=68.424, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=47.371, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=21.22, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=20.958, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=17.544, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=16.471, yerr=0.0
Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.010    0.011  105.866  105.866
 qs_energies                          1  2.0    0.000    0.000  105.680  105.680
 mp2_main                             1  3.0    0.000    0.000   98.281   98.281
 mp2_gpw_main                         1  4.0    0.001    0.001   97.866   97.867
 mp2_ri_gpw_compute_in                1  5.0    0.564    0.567   54.490   54.527
 mp2_ri_gpw_compute_in_loop           1  6.0    0.014    0.014   46.385   46.423
 mp2_ri_gpw_compute_en                1  5.0    0.095    0.095   43.312   43.347
 mp2_ri_gpw_compute_en_RI_loop        1  6.0   12.824   12.855   40.588   40.589
 dbcsr_multiply_generic            2666  8.0    0.161    0.162   23.333   23.370
 ao_to_mo_and_store_B_mult_1       1328  7.0    0.014    0.014   21.901   21.937
 mp2_eri_3c_integrate_gpw          1328  7.0    0.017    0.018   18.813   18.821
 mp2_ri_gpw_compute_en_expansio    1040  7.0    0.720    0.731   16.095   16.117
 local_gemm                        1040  8.0   15.375   15.386   15.375   15.386
 make_m2s                          5332  9.0    0.053    0.053   12.734   12.842
 make_images                       5332 10.0    2.353    2.358   12.546   12.654
 integrate_v_rspace                1338  8.0    1.061    1.072   10.483   10.573
 multiply_cannon                   2666  9.0    0.407    0.412    9.926   10.074
 multiply_cannon_loop              2666 10.0    0.191    0.193    8.753    8.962
 hybrid_alltoall_any               6683 11.6    8.344    8.429    8.621    8.704
 make_images_data                  5332 11.0    0.066    0.067    8.526    8.612
 grid_integrate_task_list          1338  9.0    8.135    8.233    8.135    8.233
 fft_wrap_pw1pw2                  26668 10.4    0.141    0.141    7.663    7.912
 get_2c_integrals                     1  6.0    0.004    0.004    7.540    7.541
 collocate_function                1328  8.0    5.136    5.273    7.194    7.312
 compute_2c_integrals                 1  7.0    0.007    0.007    6.998    6.998
 compute_2c_integrals_loop_lm         1  8.0    0.021    0.021    6.848    6.891
 mp2_eri_2c_integrate_gpw             1  9.0    2.028    2.058    6.826    6.870
 scf_env_do_scf                       1  3.0    0.000    0.000    6.485    6.486
 scf_env_do_scf_inner_loop           10  4.0    0.001    0.001    6.485    6.486
 mp2_ri_gpw_compute_en_comm         221  7.0    1.023    1.025    5.659    5.719
 ao_to_mo_and_store_B_E_Ex_1       1328  7.0    3.503    3.505    5.377    5.378
 multiply_cannon_multrec           2676 11.0    2.190    2.223    4.949    5.053
 qs_scf_new_mos                      10  5.0    0.000    0.000    4.903    4.903
 mp2_ri_gpw_compute_en_ener        1040  7.0    4.845    4.851    4.845    4.851
 fft_wrap_pw1pw2_20               10647 11.4    0.021    0.022    4.341    4.586
 pw_gpu_r3dc1d_3d                 13282 12.2    3.797    4.046    3.797    4.046
 mp_sendrecv_dm3                    442  8.0    3.607    3.674    3.607    3.674
 eigensolver                         11  5.8    0.001    0.001    3.101    3.102
 potential_pw2rs                   2666 10.0    0.096    0.096    2.644    2.684
 pw_gpu_c1dr3d_3d                 13280 12.7    2.621    2.622    2.621    2.622
 dbcsr_mm_accdrv_process           5392 12.0    0.882    1.525    2.528    2.606
 cp_fm_diag_elpa                     11  6.8    0.000    0.000    2.442    2.442
 cp_fm_diag_elpa_base                11  7.8    2.353    2.372    2.440    2.440
 fft_wrap_pw1pw2_10               15957 11.5    0.019    0.019    2.368    2.374
 replicate_iaK_2intgroup              1  6.0    2.171    2.173    2.312    2.313
 collocate_single_gaussian         1328 10.0    0.090    0.093    2.275    2.310
 copy_dbcsr_to_fm                  1351  8.0    0.033    0.033    2.304    2.309
 fill_local_i_aL                    884  7.5    2.194    2.201    2.194    2.201
 mp2_eri_2c_integrate_gpw_pot_l    1328 10.0    0.004    0.004    2.145    2.187
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=105.866, yerr=0.0
Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=56.052, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.375, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.824, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=8.344, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.135, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="collocate_function", label="collocate_function", y=5.136, yerr=0.0
Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.077    0.078  207.273  207.274
 qs_energies                          1  2.0    0.000    0.000  206.175  206.175
 scf_env_do_scf                       1  3.0    0.000    0.000  192.357  192.358
 scf_env_do_scf_inner_loop           15  4.0    0.001    0.002  192.357  192.357
 qs_ks_update_qs_env                 15  5.0    0.000    0.000   97.402   97.479
 rebuild_ks_matrix                   15  6.0    0.000    0.000   97.196   97.274
 qs_ks_build_kohn_sham_matrix        15  7.0    0.003    0.003   97.196   97.274
 qs_vxc_create                       15  8.0    0.111    0.128   60.579   60.593
 qs_scf_new_mos                      15  5.0    0.000    0.000   54.723   54.824
 fft_wrap_pw1pw2                   1086 10.0    0.029    0.029   53.332   53.365
 calculate_dispersion_nonloc         15  9.0   10.920   11.002   51.995   51.996
 eigensolver                         15  6.0    0.002    0.002   44.754   44.765
 qs_rho_update_rho_low               16  5.0    0.000    0.000   38.260   38.260
 calculate_rho_elec                  16  6.0    0.181    0.181   38.259   38.260
 sum_up_and_integrate                15  8.0    0.000    0.000   35.087   35.183
 integrate_v_rspace                  15  9.0    0.047    0.047   35.062   35.156
 grid_integrate_task_list            15 10.0   27.739   27.817   27.739   27.817
 fft_wrap_pw1pw2_150                765 11.0    0.005    0.005   27.735   27.798
 pw_gpu_c1dr3d_3d_ps                585 12.1    5.625    5.637   27.691   27.726
 cp_fm_diag_elpa                     15  7.0    0.000    0.000   27.392   27.396
 cp_fm_diag_elpa_base                15  8.0   25.506   26.099   27.387   27.388
 grid_collocate_task_list            16  7.0   26.618   26.706   26.618   26.706
 pw_gpu_r3dc1d_3d_ps                501 11.9    5.477    5.668   25.606   25.676
 cp_fm_cholesky_restore              45  7.0   15.355   16.103   15.355   16.103
 fft_wrap_pw1pw2_200                197 11.3    0.001    0.001   13.472   13.519
 density_rs2pw                       16  7.0    0.001    0.002   11.453   11.539
 qs_energies_init_hamiltonians        1  3.0    0.000    0.000    9.921    9.921
 vdW_energy                          15 10.0    9.425    9.457    9.425    9.457
 mp_alltoall_z22v                  1086 14.0    9.279    9.383    9.279    9.383
 pw_gpu_ffc                         585 13.1    9.011    9.016    9.011    9.016
 build_core_hamiltonian_matrix        1  4.0    0.000    0.000    8.514    8.580
 xc_vxc_pw_create                    15  9.0    0.180    0.182    8.473    8.475
 pw_gpu_cff                         501 12.9    8.395    8.401    8.395    8.401
 potential_pw2rs                     15 10.0    0.007    0.007    7.276    7.293
 copy_dbcsr_to_fm                    16  5.9    0.001    0.001    6.869    7.021
 pw_gpu_sf                          585 13.1    6.960    6.981    6.960    6.981
 pw_gpu_fg                          501 12.9    6.508    6.535    6.508    6.535
 x_to_yz                            585 13.1    1.063    1.073    6.059    6.062
 dbcsr_complete_redistribute         46  8.3    1.858    1.896    5.893    5.923
 fft_wrap_pw1pw2_10                  62 10.5    0.000    0.000    5.435    5.438
 yz_to_x                            501 12.9    0.887    0.894    5.170    5.258
 cp_fm_uplo_to_full                  30  8.0    3.886    5.210    3.886    5.210
 xc_pw_derive                        90 11.0    0.001    0.001    4.993    5.015
 xc_rho_set_and_dset_create          15 10.0    0.131    0.134    4.902    4.912
 build_core_ppnl                      1  5.0    4.819    4.823    4.819    4.823
 gspace_mixing                       14  5.0    0.128    0.128    4.153    4.153
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=207.273, yerr=0.0
Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=101.13499999999999, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.739, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=26.618, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=25.506, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=15.355, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.92, yerr=0.0
Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    2.017    2.046  162.008  162.009
 qs_energies                          1  2.0    0.000    0.000  159.890  159.890
 ls_scf                               1  3.0    0.000    0.000  152.908  152.910
 ls_scf_main                          1  4.0    0.001    0.001  141.451  141.452
 density_matrix_trs4                  5  5.0    0.003    0.003  112.220  112.250
 dbcsr_multiply_generic              95  6.2    0.156    0.156   97.092   97.175
 multiply_cannon                     95  7.2    2.032    2.115   67.758   67.915
 multiply_cannon_loop                95  8.2    0.167    0.168   56.733   56.881
 multiply_cannon_multrec            190  9.2   43.510   43.636   48.694   48.831
 ls_scf_dm_to_ks                      5  5.0    0.000    0.000   27.086   27.118
 make_m2s                           190  7.2    0.014    0.014   24.808   24.824
 make_images                        190  8.2    5.672    5.843   24.263   24.277
 matrix_ls_to_qs                      5  6.0    0.000    0.000   18.201   18.329
 dbcsr_complete_redistribute         11  7.5   11.495   11.666   15.919   16.135
 matrix_decluster                     5  7.0    0.000    0.000   14.407   14.617
 arnoldi_extremal                     6  6.2    0.000    0.000   11.442   11.443
 arnoldi_normal_ev                    6  7.2    0.005    0.005   11.442   11.443
 build_subspace                      12  8.2    0.030    0.031   11.215   11.215
 qs_ks_update_qs_env                  6  6.2    0.000    0.000   10.793   10.890
 rebuild_ks_matrix                    6  7.2    0.000    0.000   10.370   10.376
 build_dftb_ks_matrix                 6  8.2    0.001    0.001   10.370   10.376
 make_images_data                   190  9.2    0.006    0.006   10.152   10.353
 dbcsr_matrix_vector_mult           310  9.0    0.074    0.074   10.193   10.343
 build_dftb_coulomb                   6  9.2    0.772    0.775   10.076   10.083
 hybrid_alltoall_any                201 10.0    6.654    6.752    9.771    9.981
 dbcsr_matrix_vector_mult_local     310 10.0    9.721    9.871    9.725    9.875
 ls_scf_init_scf                      1  4.0    0.000    0.000    9.738    9.740
 tb_ewald_overlap                     6 10.2    9.026    9.044    9.026    9.044
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000    7.792    7.795
 dbcsr_finalize                     277  7.6    0.102    0.106    7.573    7.760
 calculate_norms                    380  9.2    7.485    7.490    7.485    7.490
 dbcsr_merge_all                    247  8.6    1.462    1.609    6.950    7.134
 matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.000    7.046    7.047
 qs_energies_init_hamiltonians        1  3.0    0.000    0.000    6.920    6.921
 build_qs_neighbor_lists              1  4.0    0.000    0.000    6.346    6.380
 build_neighbor_lists_sab_tbe         1  5.0    6.174    6.209    6.174    6.209
 dbcsr_copy                         443  8.0    0.933    0.958    4.739    4.797
 setup_rec_index_2d                 190  8.2    4.714    4.735    4.714    4.735
 dbcsr_special_finalize             285  9.2    0.005    0.005    4.645    4.655
 dbcsr_add_d                        130  6.0    0.001    0.001    4.352    4.497
 dbcsr_add_anytype                  130  7.0    1.828    1.829    4.351    4.497
 dbcsr_dot                           66  6.3    3.781    3.784    4.180    4.469
 dbcsr_sort_indices                 643 10.1    4.377    4.387    4.377    4.387
 dbcsr_mm_accdrv_process           8119 10.0    0.422    0.502    4.113    4.150
 dbcsr_data_new                    3509  9.3    4.008    4.099    4.008    4.099
 dbcsr_copy_into_existing             5  8.0    3.794    3.876    3.794    3.876
 mp_waitall_1                      2666 10.6    3.447    3.760    3.447    3.760
 dbcsr_mm_accdrv_process_sort      8119 11.0    3.613    3.648    3.613    3.648
 tree_to_linear_d                    11 10.5    3.557    3.558    3.557    3.558
 dbcsr_mm_multrec_init               95  8.2    0.000    0.000    3.263    3.295
 dbcsr_mm_csr_init                   95  9.2    0.006    0.006    3.263    3.294
 dbcsr_mm_sched_init                 95 10.2    0.000    0.000    3.234    3.265
 dbcsr_mm_accdrv_init                95 11.2    0.429    0.484    3.234    3.264
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=162.008, yerr=0.0
Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=80.77100000000002, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=43.51, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=11.495, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=9.721, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="tb_ewald_overlap", label="tb_ewald_overlap", y=9.026, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=7.485, yerr=0.0
Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.004    0.004   48.899   48.900
 lib_test                             1  2.0    0.000    0.000   48.886   48.892
 dbcsr_run_tests                      3  3.0    0.000    0.000   48.886   48.892
 test_multiplies_multiproc            3  4.0    0.001    0.001   37.804   37.844
 dbcsr_multiply_generic               9  5.0    0.002    0.002   29.280   29.287
 multiply_cannon                      9  6.0    0.021    0.021   19.174   19.685
 multiply_cannon_loop                 9  7.0    0.003    0.003   17.716   18.094
 multiply_cannon_multrec             18  8.0    9.503    9.783   16.530   16.902
 dbcsr_make_random_matrix             9  4.0    7.516    7.629   10.942   10.981
 dbcsr_finalize                      27  5.7    0.001    0.001    7.593    7.727
 dbcsr_merge_all                     18  6.5    3.690    3.710    7.478    7.610
 dbcsr_mm_accdrv_process           8199  9.0    1.121    1.235    6.801    6.883
 dbcsr_redistribute                   9  5.0    3.573    3.576    5.994    6.001
 make_m2s                            18  6.0    0.001    0.001    5.208    5.211
 make_images                         18  7.0    0.373    0.377    5.175    5.178
 dbcsr_mm_accdrv_process_sort      8199 10.0    4.671    4.674    4.671    4.674
 make_images_data                    18  8.0    0.001    0.001    3.039    3.043
 hybrid_alltoall_any                 18  9.0    2.509    2.516    2.993    2.996
 mp_alltoall_d11v                    27  6.0    2.156    2.156    2.156    2.156
 dbcsr_data_copy_aa2                 18  7.5    1.786    1.945    1.786    1.945
 tree_to_linear_d                     9  7.0    1.869    1.876    1.869    1.876
 dbcsr_data_release                 507  7.7    1.383    1.398    1.383    1.398
 jit_kernel_multiply                  7 10.0    1.009    1.202    1.009    1.202
 dbcsr_data_new                     354  7.4    1.017    1.142    1.017    1.142
 dbcsr_checksum                       6  5.0    0.999    1.005    1.009    1.009
 mp_sum_l                            61  4.9    0.497    0.985    0.497    0.985
 dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.496    0.984
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=48.899, yerr=0.0
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=19.946, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.503, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.516, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.671, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.69, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.573, yerr=0.0
Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                 T I M I N G                                 -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.046    0.047  210.458  210.458
 qs_mol_dyn_low                       1  2.0    0.006    0.008  208.876  208.913
 qs_forces                            6  3.8    0.001    0.001  130.330  130.330
 qs_energies                          6  4.8    0.000    0.001  123.127  123.127
 scf_env_do_scf                       6  5.8    0.000    0.000  115.358  115.358
 scf_env_do_scf_inner_loop          113  6.2    0.006    0.008  107.189  107.189
 velocity_verlet                      5  3.0    0.003    0.003  100.283  100.335
 rebuild_ks_matrix                  119  8.1    0.000    0.001   88.042   88.043
 qs_ks_build_kohn_sham_matrix       119  9.1    0.019    0.019   88.041   88.042
 qs_ks_update_qs_env                119  7.3    0.001    0.001   83.107   83.107
 fft_wrap_pw1pw2                   2059 12.4    0.044    0.046   70.502   70.518
 fft_wrap_pw1pw2_150               1321 13.9    0.009    0.009   67.639   67.695
 qs_vxc_create                      119 10.1    0.002    0.002   56.788   56.790
 xc_vxc_pw_create                   119 11.1    1.548    1.558   56.786   56.788
 qmmm_el_coupling                     6  3.8    0.000    0.000   41.440   41.445
 qmmm_elec_with_gaussian              6  4.8    0.020    0.020   41.434   41.438
 qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   40.037   40.200
 xc_pw_derive                       714 13.1    0.009    0.009   39.673   39.688
 pw_gpu_c1dr3d_3d_ps               1095 14.8   10.695   10.819   37.994   38.039
 qmmm_elec_gaussian_low_G             6  6.8   35.219   35.336   35.219   35.336
 qmmm_forces                          6  3.8    0.001    0.001   34.200   34.200
 qmmm_forces_with_gaussian            6  4.8    0.023    0.023   32.864   33.832
 qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   31.561   32.533
 pw_gpu_r3dc1d_3d_ps                964 14.0    9.612    9.719   32.452   32.483
 xc_rho_set_and_dset_create         119 12.1    2.482    2.504   28.408   28.474
 qmmm_forces_gaussian_low_G           6  6.8   26.390   27.415   26.390   27.415
 xc_pw_divergence                   119 12.1    0.006    0.006   26.406   26.438
 qs_rho_update_rho_low              119  7.3    0.001    0.001   23.180   23.249
 calculate_rho_elec                 119  8.3    1.112    1.114   23.179   23.248
 mp_alltoall_z22v                  2059 16.4   18.059   18.529   18.059   18.529
 density_rs2pw                      119  9.3    0.007    0.008   17.205   17.301
 sum_up_and_integrate               119 10.1    0.002    0.002   14.585   14.642
 integrate_v_rspace                 119 11.1    0.021    0.021   14.396   14.453
 dbcsr_multiply_generic            2598 12.3    0.098    0.099   13.969   14.112
 x_to_yz                           1095 15.8    2.424    2.425   12.266   12.490
 multiply_cannon                   2598 13.3    0.226    0.228   12.235   12.265
 multiply_cannon_loop              2598 14.3    0.258    0.259   11.739   11.763
 potential_pw2rs                    119 12.1    0.034    0.034   10.469   10.470
 yz_to_x                            964 15.0    1.833    1.842   10.050   10.285
 multiply_cannon_multrec           5196 15.3    4.052    4.148    9.531    9.615
 qs_ks_ddapc                        119 10.1    0.002    0.002    9.307    9.311
 pw_gpu_sf                         1095 15.8    8.615    8.627    8.615    8.627
 init_scf_loop                        6  6.8    0.000    0.000    8.166    8.166
 pw_gpu_fg                          964 15.0    7.823    7.996    7.823    7.996
 qs_scf_new_mos                     113  7.2    0.001    0.001    7.440    7.443
 qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.439    7.442
 ot_scf_mini                        113  9.2    0.002    0.002    7.146    7.146
 pw_gpu_ffc                        1095 15.8    6.399    6.443    6.399    6.443
 init_scf_run                         6  5.8    0.000    0.000    5.536    5.536
 scf_env_initial_rho_setup            6  6.8    0.000    0.000    5.536    5.536
 dbcsr_mm_accdrv_process          13992 16.0    0.540    0.544    5.412    5.424
 xc_functional_eval                 238 13.1    0.003    0.003    5.305    5.339
 qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    5.171    5.224
 qmmm_forces_with_gaussian_LG         6  7.8    5.171    5.224    5.171    5.224
 ot_mini                            113 10.2    0.001    0.001    5.084    5.085
 qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    4.966    4.966
 pw_gpu_cff                         964 15.0    4.898    4.911    4.898    4.911
 pw_poisson_solve                   125  9.9    0.003    0.003    4.877    4.882
 qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    4.817    4.864
 qmmm_elec_with_gaussian_LG           6  7.8    4.817    4.864    4.817    4.864
 grid_collocate_task_list           119  9.3    4.838    4.861    4.838    4.861
 jit_kernel_multiply                 24 14.7    4.823    4.838    4.823    4.838
 pw_derive                         1089 13.4    4.230    4.252    4.230    4.252
 -------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=210.458, yerr=0.0
Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=110.48299999999999, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=35.219, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=26.39, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=18.059, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.695, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.612, yerr=0.0
Summary: Performance test took 40 minutes.
Status: OK
 ---> Removed intermediate container 96722a6ebba0
 ---> c10ca2d4ee94
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in 36079f2e68da
 ---> Removed intermediate container 36079f2e68da
 ---> 01049da5ccab
Step 46/46 : ENTRYPOINT []
 ---> Running in 9e1e25a61a3e
 ---> Removed intermediate container 9e1e25a61a3e
 ---> 16468fc7ca74
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 16468fc7ca74
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.
#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done
EndDate: 2026-05-15 21:05:12+00:00
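The "rest" bar in each per-benchmark plot matches the CP2K total time minus the five largest SELF TIME AVERAGE values from that benchmark's table. For bench_dftb: 162.008 - (43.510 + 11.495 + 9.721 + 9.026 + 7.485) = 80.771. For dbcsr: 48.899 - (9.503 + 7.516 + 4.671 + 3.690 + 3.573) = 19.946. The Python sketch below recomputes that value from timing rows. The selection rule is an assumption inferred from these numbers, not the CI script's actual code, and parse_row, top_and_rest and BENCH_DFTB_ROWS are hypothetical names.

```python
# Sketch: recompute the "rest" bar of a per-benchmark timing plot from CP2K
# timing rows. ASSUMPTION (inferred from the numbers in this log, not taken
# from the CI scripts): the plot lists the n routines with the largest
# SELF TIME AVERAGE, and "rest" is the CP2K TOTAL TIME AVERAGE minus their sum.


def parse_row(line: str) -> tuple[str, float, float]:
    """Split one timing row into (name, self_time_avg, total_time_avg)."""
    name, _calls, _asd, self_avg, _self_max, total_avg, _total_max = line.split()
    return name, float(self_avg), float(total_avg)


def top_and_rest(rows: list[str], n: int = 5) -> tuple[list[tuple[str, float]], float]:
    """Return the n largest self times (excluding the CP2K row) and the remainder."""
    parsed = [parse_row(row) for row in rows]
    total = next(total_avg for name, _s, total_avg in parsed if name == "CP2K")
    routines = [(name, self_avg) for name, self_avg, _t in parsed if name != "CP2K"]
    top = sorted(routines, key=lambda item: item[1], reverse=True)[:n]
    rest = total - sum(self_avg for _name, self_avg in top)
    return top, rest


# A subset of the bench_dftb rows from this log (enough to contain the top five).
BENCH_DFTB_ROWS = [
    "CP2K 1 1.0 2.017 2.046 162.008 162.009",
    "multiply_cannon_multrec 190 9.2 43.510 43.636 48.694 48.831",
    "dbcsr_complete_redistribute 11 7.5 11.495 11.666 15.919 16.135",
    "dbcsr_matrix_vector_mult_local 310 10.0 9.721 9.871 9.725 9.875",
    "tb_ewald_overlap 6 10.2 9.026 9.044 9.026 9.044",
    "calculate_norms 380 9.2 7.485 7.490 7.485 7.490",
    "hybrid_alltoall_any 201 10.0 6.654 6.752 9.771 9.981",
    "build_neighbor_lists_sab_tbe 1 5.0 6.174 6.209 6.174 6.209",
]

if __name__ == "__main__":
    top, rest = top_and_rest(BENCH_DFTB_ROWS)
    for name, self_avg in top:
        print(f"{name:32s} {self_avg:8.3f}")
    print(f"{'rest':32s} {rest:8.3f}")
```

Excluding the CP2K row from the ranking is a design choice of this sketch; it does not change the result for any of the three benchmarks in this log, because CP2K's own self time is small in each case.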