StartDate: 2026-03-15 06:06:29+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: eeadd9fdbbb203edce77f8d85748c3440d47b5c1
CommitTime: 2026-03-14 18:14:28 +0100
CommitAuthor: Matthias Krack
CommitSubject: Fix occupied orbital printout for GUESS atomic and RESTART off

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=eeadd9fdbbb203edce77f8d85748c3440d47b5c1 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release. BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.

Sending build context to Docker daemon 413.3MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
6e8af4fd0a07: Waiting
15a17189b2df: Waiting
02cb0e091e33: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
6cb9b761b877: Waiting
2cd52cbb1ebe: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Verifying Checksum
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 1ff2ec46e723
Step 8/46 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> bf4865207130
Step 9/46 : RUN mkdir scripts
 ---> Using cache
 ---> 95733bd3ea48
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./tools/toolchain/scripts/generate_cmake_options.sh ./scripts/
 ---> Using cache
 ---> a904da742dac
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> ab9ff0a1b59f
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run --list-cmake-options=no
 ---> Using cache
 ---> 4511b172b39e
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> 22adbf9a925f
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 095f8d5aa7e9
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 544f9254a114
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> b7c417a6e75e
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 0793ea1f42c0
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> d0a4b2d741ae
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> f5abdb49f06a
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 6c2514b11137
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> a31b8b74c01f
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 615341e30de1
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> 6f7a6d138b48
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> 083105451c17
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> e8ce0167285f
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> 3748c3ea8cc1
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> 3cc69bde76fc
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> 53853ef7a9c2
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> fd1ab743be8a
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> 9e6121717693
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 2952f49f6ccc
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> e9b7c8b52f96
Step 33/46 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> 4e8ac2417996
Step 34/46 : COPY ./src ./src
 ---> d631e15658a1
Step 35/46 : COPY ./data ./data
 ---> 1bdda49dc61e
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
 ---> 0416f73d47a2
Step 37/46 : COPY ./cmake ./cmake
 ---> e49d28db10ea
Step 38/46 : COPY ./CMakeLists.txt .
 ---> bae615dd5378
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> 41dd110d9aec
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in 67b2980226d2

==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so (found version "4.1")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so (found version "4.1")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpifort.so (found version "4.1")
-- Found MPI: TRUE (found version "4.1") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "4.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
-- Found openblas, version 0.3.31
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.31/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
-- Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
-- Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
-- Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
-- Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
-- Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")

-----------------------------------------------------------
-                           CUDA                          -
-----------------------------------------------------------

-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found

------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------

-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX

------------------------------------------------------------
-                          DBCSR                           -
------------------------------------------------------------

-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for module 'libxsmmf'
-- Found libxsmmf, version 1.17.0
-- Checking for module 'libxsmmext'
-- Found libxsmmext, version 1.17.0

------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------

-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.0") found components: C Fortran
-- Found MPI: TRUE (found version "4.1") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Checking for module 'libint2'
-- Package 'libint2', required by 'virtual:world', not found
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "2.1.0") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'libvdwxc>=0.3.0'
-- Looking for vdwxc_init_mpi
-- Looking for vdwxc_init_mpi - not found
-- Found LibVDWXC: /opt/cp2k-toolchain/install/libvdwxc-0.4.0/lib/libvdwxc.a (Required is at least version "0.3.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.

--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------

  - BLAS
    - vendor: OpenBLAS
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.31/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so

  - LAPACK
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.31/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so

  - MPI
    - include directories: /opt/cp2k-toolchain/install/mpich-4.3.2/include
    - libraries: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so
    - MPI_F08: ON

  - ScaLAPACK
    - vendor: auto
    - include directories:
    - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a

  - Hardware Acceleration:
    - CUDA:
      - GPU architecture number: 70
      - GPU profiling enabled:
      - GPU accelerated modules
        - ELPA module: ON
        - GRID module: ON
        - DBM module: ON
        - PW module: ON

  - LibXC
    - version: 7.0.0
    - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
    - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a

  - HDF5
    - version: 2.1.0
    - include directories: /opt/cp2k-toolchain/install/hdf5-2.1.0/include
    - libraries: hdf5-shared

  - FFTW3
    - include directories: /opt/cp2k-toolchain/install/fftw-3.3.10/include
    - libraries: /opt/cp2k-toolchain/install/fftw-3.3.10/lib/libfftw3.a

  - LIBXSMM
    - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
    - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so

  - SpLA
    - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
    - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
    - SpLA GEMM offloading

  - SIRIUS
    - include directories:
    - libraries:

  - COSMA
    - include directories: /opt/cp2k-toolchain/install/COSMA-2.8.1/include
    - libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack>

  - Libint2
    - include directories: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
    - libraries: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/lib/libint2.a

  - ELPA
    - include directories: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/include/elpa_openmp-2024.05.001
    - libraries: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a

--------------------------------------------------------------------
-                                                                  -
-          List of dependencies not included in this build         -
-                                                                  -
--------------------------------------------------------------------

  - DFTD4
  - DeePMD
  - PEXSI
  - ACE (libpace)
  - TBLITE
  - Spglib
  - LibSMEAGOL
  - MiMiC
  - openPMD
  - DLA-Future
  - PLUMED
  - Libvori
  - LibTorch
  - TREXIO
  - GreenX

After building CP2K the regtests can be run with the following command:

./tests/do_regtest.py /opt/cp2k/bin psmp

-- Configuring done (14.4s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build

Compiling CP2K ... done
 ---> Removed intermediate container 67b2980226d2
 ---> 1edb01735b1e
Step 41/46 : COPY ./benchmarks ./benchmarks
 ---> d5488740ff8e
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
 ---> bf24a655c4d7
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> dd73ed64049d
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in 1215446aebf3

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 82 input files and 0 errors.

========== Running Performance Test ==========

Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"

Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.029    0.030  108.388  108.388
 qs_mol_dyn_low                       1  2.0    0.004    0.005  107.925  107.928
 qs_forces                           11  3.9    0.002    0.002  107.875  107.875
 qs_energies                         11  4.9    0.001    0.001   96.047   96.048
 scf_env_do_scf                      11  5.9    0.001    0.001   73.960   73.960
 velocity_verlet                     10  3.0    0.001    0.002   69.299   69.316
 scf_env_do_scf_inner_loop          108  6.5    0.006    0.009   62.902   62.902
 rebuild_ks_matrix                  119  8.3    0.001    0.001   27.945   27.945
 qs_ks_build_kohn_sham_matrix       119  9.3    0.022    0.022   27.945   27.945
 dbcsr_multiply_generic            2286 12.5    0.161    0.162   26.475   26.492
 qs_ks_update_qs_env                119  7.6    0.001    0.001   25.651   25.653
 qs_scf_new_mos                     108  7.5    0.001    0.001   21.363   21.379
 qs_scf_loop_do_ot                  108  8.5    0.001    0.001   21.362   21.378
 qs_rho_update_rho_low              119  7.7    0.001    0.001   20.852   20.876
 calculate_rho_elec                 119  8.7    0.924    0.932   20.851   20.875
 ot_scf_mini                        108  9.5    0.003    0.003   19.332   19.332
 fft_wrap_pw1pw2                   1201 11.6    0.026    0.027   16.675   16.713
 fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   14.319   14.402
 sum_up_and_integrate               119 10.3    0.003    0.003   14.341   14.380
 integrate_v_rspace                 119 11.3    0.380    0.382   14.213   14.252
 multiply_cannon                   2286 13.5    0.364    0.367   13.250   13.271
 multiply_cannon_loop              2286 14.5    0.278    0.278   12.098   12.122
 make_m2s                          4572 13.5    0.049    0.050   11.486   11.487
 ot_mini                            108 10.5    0.001    0.001   11.351   11.351
 make_images                       4572 14.5    1.260    1.267   11.299   11.300
 init_scf_run                        11  5.9    0.000    0.000   11.183   11.184
 scf_env_initial_rho_setup           11  6.9    0.000    0.001   11.183   11.183
 density_rs2pw                      119  9.7    0.009    0.009   10.871   10.975
 init_scf_loop                       11  6.9    0.000    0.000   10.975   10.975
 grid_collocate_task_list           119  9.7    9.023    9.089    9.023    9.089
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.688    8.688
 pw_gpu_r3dc1d_3d_ps                606 13.1    2.484    2.509    8.484    8.506
 build_core_hamiltonian_matrix_      11  4.9    0.001    0.002    8.243    8.359
 pw_gpu_c1dr3d_3d_ps                595 14.2    2.415    2.430    8.159    8.175
 wfi_extrapolate                     11  7.9    0.002    0.002    8.120    8.120
 prepare_preconditioner              11  7.9    0.000    0.000    7.582    7.582
 make_preconditioner                 11  8.9    0.000    0.000    7.582    7.582
 grid_integrate_task_list           119 12.3    7.457    7.496    7.457    7.496
 qs_ot_get_derivative               108 11.5    0.002    0.002    6.836    6.838
 make_full_inverse_cholesky          11  9.9    0.000    0.000    6.376    6.636
 multiply_cannon_multrec           4572 15.5    2.195    2.231    6.584    6.634
 hybrid_alltoall_any               4725 16.4    5.083    5.088    6.483    6.506
 potential_pw2rs                    119 12.3    0.040    0.040    6.375    6.376
 make_images_data                  4572 15.5    0.058    0.059    6.364    6.369
 parallel_gemm_fm_cosma              81  9.0    6.079    6.079    6.079    6.079
 ot_diis_step                       108 11.5    0.006    0.006    4.489    4.489
 build_core_ppl_forces               11  5.9    4.193    4.283    4.193    4.283
 build_core_hamiltonian_matrix       11  6.9    0.002    0.002    4.157    4.213
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.156    4.156
 dbcsr_mm_accdrv_process           9594 16.2    0.986    0.990    3.967    3.977
 apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.903    3.906
 apply_single                       119 13.6    0.001    0.001    3.903    3.906
 dbcsr_complete_redistribute        329 12.2    1.507    1.520    3.488    3.761
 calculate_dm_sparse                119  9.5    0.001    0.001    3.465    3.484
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.461    3.462
 mp_alltoall_z22v                  1201 15.6    3.257    3.338    3.257    3.338
 multiply_cannon_sync_h2d          4572 15.5    3.325    3.334    3.325    3.334
 qs_create_task_list                 11  7.9    0.000    0.000    3.270    3.324
 generate_qs_task_list               11  8.9    1.204    1.210    3.270    3.324
 qs_ot_get_p                        119 10.4    0.001    0.002    3.205    3.205
 pw_poisson_solve                   119 10.3    0.003    0.003    3.073    3.078
 cp_dbcsr_sm_fm_multiply             37  9.5    0.002    0.002    2.994    2.995
 mp_waitall_1                     64495 16.9    2.877    2.885    2.877    2.885
 transfer_rs2pw                     487 10.6    0.009    0.010    2.622    2.765
 copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.703    2.715
 calculate_first_density_matrix       1  7.0    0.000    0.000    2.601    2.601
 cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.445    2.446
 pw_gpu_fg                          606 14.1    2.318    2.351    2.318    2.351
 jit_kernel_multiply                 10 15.6    2.343    2.348    2.343    2.348
 transfer_rs2pw_140                 130 11.5    1.665    1.675    2.191    2.339
 dbcsr_special_finalize            6858 15.5    0.046    0.046    2.326    2.332
 cp_fm_cholesky_invert               11 10.9    2.284    2.284    2.284    2.284
 qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    2.266    2.267
 yz_to_x                            606 14.1    0.548    0.552    2.195    2.239
 x_to_yz                            595 15.2    0.591    0.595    2.201    2.231
 dbcsr_merge_single_wm             4572 16.5    0.163    0.167    2.190    2.195
 build_core_ppl                      11  7.9    2.125    2.169    2.125    2.169
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=108.388, yerr=0.0

Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=76.553, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=9.023, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.457, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=6.079, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=5.083, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.193, yerr=0.0

Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.029    0.029  101.538  101.539
 qs_mol_dyn_low                       1  2.0    0.004    0.004  101.096  101.099
 qs_forces                           11  3.9    0.002    0.002  101.049  101.049
 qs_energies                         11  4.9    0.001    0.001   89.181   89.182
 scf_env_do_scf                      11  5.9    0.001    0.001   67.300   67.301
 velocity_verlet                     10  3.0    0.001    0.002   65.273   65.290
 scf_env_do_scf_inner_loop           96  6.5    0.005    0.008   56.381   56.382
 rebuild_ks_matrix                  107  8.3    0.001    0.001   26.562   26.562
 qs_ks_build_kohn_sham_matrix       107  9.3    0.019    0.019   26.561   26.561
 qs_ks_update_qs_env                107  7.6    0.001    0.001   24.010   24.012
 dbcsr_multiply_generic            1966 12.4    0.135    0.136   23.605   23.650
 qs_rho_update_rho_low              107  7.7    0.001    0.001   18.825   18.853
 calculate_rho_elec                 107  8.7    0.821    0.831   18.825   18.852
 qs_scf_new_mos                      96  7.5    0.001    0.001   18.598   18.622
 qs_scf_loop_do_ot                   96  8.5    0.001    0.001   18.597   18.621
 ot_scf_mini                         96  9.5    0.003    0.003   16.830   16.831
 fft_wrap_pw1pw2                   1081 11.6    0.022    0.023   14.698   14.735
 sum_up_and_integrate               107 10.3    0.003    0.003   14.623   14.698
 integrate_v_rspace                 107 11.3    0.337    0.342   14.521   14.596
 fft_wrap_pw1pw2_140                439 12.2    0.003    0.003   12.635   12.699
 multiply_cannon                   1966 13.4    0.311    0.317   11.950   11.991
 multiply_cannon_loop              1966 14.4    0.234    0.236   10.997   11.054
 init_scf_run                        11  5.9    0.000    0.000   10.900   10.900
 scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.899   10.899
 init_scf_loop                       11  6.9    0.000    0.000   10.837   10.837
 make_m2s                          3932 13.4    0.040    0.041   10.100   10.101
 make_images                       3932 14.4    1.114    1.119    9.938    9.939
 ot_mini                             96 10.5    0.001    0.001    9.914    9.914
 density_rs2pw                      107  9.7    0.007    0.007    9.599    9.716
 qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.833    8.833
 grid_integrate_task_list           107 12.3    8.554    8.634    8.554    8.634
 grid_collocate_task_list           107  9.7    8.379    8.463    8.379    8.463
 build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    8.142    8.270
 wfi_extrapolate                     11  7.9    0.002    0.002    7.739    7.739
 pw_gpu_r3dc1d_3d_ps                546 13.1    2.204    2.219    7.522    7.534
 prepare_preconditioner              11  7.9    0.000    0.000    7.401    7.411
 make_preconditioner                 11  8.9    0.000    0.000    7.401    7.411
 pw_gpu_c1dr3d_3d_ps                535 14.2    2.113    2.129    7.148    7.173
 make_full_inverse_cholesky          11  9.9    0.000    0.000    6.226    6.473
 multiply_cannon_multrec           3932 15.4    1.947    1.974    6.234    6.285
 qs_ot_get_derivative                96 11.5    0.002    0.002    6.010    6.010
 hybrid_alltoall_any               4079 16.3    4.510    4.521    5.784    5.818
 parallel_gemm_fm_cosma              81  9.0    5.795    5.796    5.795    5.796
 make_images_data                  3932 15.4    0.048    0.050    5.646    5.650
 potential_pw2rs                    107 12.3    0.035    0.035    5.630    5.630
 qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.286    4.445
 build_core_ppl_forces               11  5.9    4.130    4.231    4.130    4.231
 build_core_hamiltonian_matrix       11  6.9    0.001    0.001    4.117    4.187
 dbcsr_mm_accdrv_process           8450 16.1    0.949    0.962    3.916    3.930
 ot_diis_step                        96 11.5    0.005    0.005    3.881    3.881
 dbcsr_complete_redistribute        317 12.2    1.450    1.467    3.527    3.781
 qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.591    3.592
 qs_create_task_list                 11  7.9    0.000    0.000    3.406    3.494
 generate_qs_task_list               11  8.9    1.488    1.502    3.406    3.494
 apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.430    3.432
 apply_single                       107 13.6    0.001    0.001    3.430    3.431
 calculate_dm_sparse                107  9.5    0.001    0.001    3.217    3.244
 mp_alltoall_z22v                  1081 15.6    2.887    2.936    2.887    2.936
 cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.935    2.935
 multiply_cannon_sync_h2d          3932 15.4    2.872    2.891    2.872    2.891
 qs_ot_get_p                        107 10.4    0.001    0.001    2.782    2.783
 copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.757    2.777
 calculate_first_density_matrix       1  7.0    0.000    0.000    2.686    2.686
 pw_poisson_solve                   107 10.3    0.003    0.003    2.677    2.682
 mp_waitall_1                     55487 16.8    2.525    2.562    2.525    2.562
 transfer_rs2pw                     439 10.6    0.008    0.008    2.374    2.553
 jit_kernel_multiply                 10 15.6    2.406    2.411    2.406    2.411
 cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.402    2.403
 transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.185    2.204
 cp_fm_cholesky_invert               11 10.9    2.200    2.200    2.200    2.200
 transfer_rs2pw_140                 118 11.5    1.479    1.491    1.996    2.181
 build_core_ppl                      11  7.9    2.108    2.168    2.108    2.168
 pw_gpu_fg                          546 14.1    2.044    2.056    2.044    2.056
 dbcsr_special_finalize            5898 15.4    0.039    0.039    2.036    2.038
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=101.538, yerr=0.0

Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=70.17, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.554, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.379, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.795, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.51, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.13, yerr=0.0

Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.021    0.022  181.680  181.682
 qs_energies                          1  2.0    0.000    0.000  181.335  181.335
 mp2_main                             1  3.0    0.000    0.000  174.149  174.150
 mp2_gpw_main                         1  4.0    0.000    0.000  172.276  172.276
 rpa_ri_compute_en                    1  5.0    0.000    0.000  162.847  162.847
 rpa_num_int                          1  6.0    0.001    0.001  162.838  162.838
 parallel_gemm_fm_cosma             105  8.4   76.885   76.892   76.885   76.892
 compute_mat_P_omega                  1  7.0    0.001    0.002   72.075   72.075
 compute_mat_P_omega_contract        10  8.0    5.720    5.762   71.334   71.341
 dbt_total                         2336  9.6    0.022    0.022   70.879   70.879
 compute_W_cubic_GW                  10  7.0    0.004    0.004   49.909   49.909
 dbt_contract                       787 11.0    0.051    0.051   47.947   47.950
 dbt_tas_total                     1149 12.2    0.144    0.145   37.461   37.461
 dbt_tas_multiply                   807 12.1    0.003    0.003   36.741   36.742
 dbt_tas_dbm                        807 14.1    0.006    0.006   28.977   28.978
 dbm_multiply                       807 16.1   27.649   27.856   27.649   27.856
 rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   25.930   25.931
 contract_P_omega_with_mat_L         10  8.0    0.000    0.000   25.568   25.569
 compute_mat_P_omega_calc_M_occ     250  9.0    5.712    5.740   25.492   25.492
 dbt_copy                          1107 10.7    0.073    0.074   23.196   23.243
 dbt_tas_mm_1N                      524 15.1    0.003    0.003   18.692   18.974
 compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   15.969   15.969
 dbt_reshape                        594 11.8    6.839    7.008   14.914   15.003
 compute_QP_energies                  1  7.0    0.000    0.000   12.457   12.457
 compute_self_energy_cubic_gw         1  8.0    0.137    0.138   12.457   12.457
 dbt_tas_reserve_blocks_index      3266 14.3    0.693    0.701   10.877   10.983
 dbm_reserve_blocks                3634 15.3   10.525   10.624   10.525   10.624
 mp2_ri_gpw_compute_in                1  5.0    0.001    0.001    9.418    9.418
 dbt_crop                          1042 12.0    6.975    7.070    9.304    9.413
 compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    9.372    9.372
 dbt_reserve_blocks_index          2347 13.0    0.348    0.351    9.154    9.179
 dbt_reserve_blocks_index_array    2289 12.1    0.012    0.012    8.947    8.957
 dbt_tas_mm_2                       251 15.0    0.003    0.003    8.009    8.009
 scf_env_do_scf                       1  3.0    0.000    0.000    6.574    6.574
 scf_env_do_scf_inner_loop           17  4.0    0.001    0.002    6.574    6.574
 mp_waitall_2                      2656 15.9    5.967    5.995    5.967    5.995
 contract_cubic_gw                   21  9.0    0.000    0.000    5.664    5.664
 dbt_communicate_buffer             594 12.8    0.013    0.014    5.448    5.476
 dbcsr_multiply_generic              30  8.1    0.003    0.003    5.331    5.368
 multiply_cannon                     30  9.1    0.006    0.007    5.139    5.173
 multiply_cannon_loop                30 10.1    0.005    0.005    5.083    5.117
 compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    5.087    5.089
 get_2c_integrals                     1  6.0    0.000    0.000    5.055    5.055
 compute_mat_P_omega_copy_M_occ     250  9.0    0.002    0.002    4.947    4.960
 dbt_tas_copy                       511 11.5    2.666    2.682    4.539    4.609
 multiply_cannon_multrec             60 11.1    0.163    0.175    4.477    4.479
 dbcsr_mm_accdrv_process            328 12.3    0.044    0.046    4.138    4.139
 jit_kernel_multiply                 18 11.7    4.087    4.087    4.087    4.087
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=181.68, yerr=0.0

Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=52.80699999999999, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=76.885, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=27.649, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.525, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.975, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.839, yerr=0.0

Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.024    0.025  204.419  204.419
 qs_forces                            1  2.0    0.000    0.000  203.951  203.951
 rebuild_ks_matrix                    7  6.6    0.000    0.000  199.316  199.317
 qs_ks_build_kohn_sham_matrix         7  7.6    0.002    0.002  199.316  199.317
 hfx_ks_matrix                        7  8.6    0.000    0.000  195.235  195.244
 dbt_total                          849 11.0    0.010    0.010  145.113  145.113
 hfx_ri_update_ks                     7  9.6    0.000    0.000  111.347  111.348
 hfx_ri_update_ks_Pmat                7 10.6   23.686   23.730  111.343  111.343
 qs_energies                          1  3.0    0.000    0.000  106.325  106.325
 scf_env_do_scf                       1  4.0    0.000    0.000  104.044  104.044
 qs_ks_update_qs_env                  8  6.0    0.000    0.000  101.744  101.744
 qs_ks_update_qs_env_forces           1  3.0    0.000    0.000   97.580   97.580
 dbt_contract                       207 12.4    0.051    0.052   85.464   85.465
 hfx_ri_update_forces                 1  7.0    1.239    1.247   83.885   83.894
 dbt_tas_total                      369 13.4    0.081    0.082   71.018   71.019
 dbt_tas_multiply                   216 13.5    0.001    0.001   68.145   68.145
 scf_env_do_scf_inner_loop            6  5.0    0.000    0.001   56.154   56.154
 dbt_copy                           423 11.8    0.048    0.049   54.833   55.743
 dbt_tas_dbm                        216 15.5    0.002    0.002   54.150   54.150
 dbm_multiply                       216 17.5   51.051   51.107   51.051   51.107
 hfx_ri_forces_Pmat_3c                1  8.0    3.590    3.624   49.332   49.353
 init_scf_loop                        2  5.0    0.000    0.000   47.888   47.888
 dbt_reshape                        175 13.2   19.161   19.179   41.301   41.697
 hfx_ri_update_ks_Pmat_KS            63 11.6    0.001    0.001   32.244   32.244
 precalc_derivatives                  1  8.0    2.010    2.015   27.945   27.946
 dbt_tas_mm_2                        91 16.5    0.001    0.001   22.493   22.493
 mp_waitall_2                      1022 16.5   19.450   19.592   19.450   19.592
 dbt_tas_reserve_blocks_index      1323 15.4    1.817    1.821   18.478   18.950
 dbm_reserve_blocks                1491 16.3   17.417   17.893   17.417   17.893
 dbt_tas_mm_3T                       77 17.1    0.001    0.001   17.187   17.488
 dbt_crop                           372 13.7   13.459   13.466   17.351   17.395
 hfx_ri_pre_scf_Pmat                  1 12.0    0.000    0.000   16.701   16.701
 dbt_communicate_buffer             175 14.2    0.005    0.005   16.059   16.180
 dbt_reserve_blocks_index           889 14.5    0.663    0.666   15.220   15.392
 hfx_ri_update_ks_Pmat_Px3C          63 11.6    0.000    0.000   15.196   15.196
 dbt_reserve_blocks_index_array     859 13.5    0.008    0.008   14.934   15.102
 build_3c_derivatives                 3  9.0    2.317    2.350   14.686   14.686
 hfx_ri_update_ks_Pmat_copy_2        63 11.6    0.000    0.000   14.581   14.581
 dbt_tas_mm_3N                       37 15.4    0.000    0.000   11.957   12.084
 dbt_tas_copy                       248 12.5    4.322    4.452    7.937    8.372
 mp_sync                           2901 12.8    6.536    7.566    6.536    7.566
 hfx_ri_pre_scf_Pmat_int              1 13.0    0.000    0.000    5.402    5.402
 dbt_tas_replicate                  168 15.1    2.336    2.386    4.945    4.990
 hfx_ri_pre_scf_calc_tensors          1 14.0    0.004    0.004    4.663    4.663
 hfx_ri_pre_scf_Pmat_copy_2           9 13.0    1.817    1.820    4.606    4.609
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=204.419, yerr=0.0

Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=73.654, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=51.051, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=23.686, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=19.45, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=19.161, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=17.417, yerr=0.0

Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.011    0.013  111.970  111.971
 qs_energies                          1  2.0    0.000    0.000  111.761  111.762
 mp2_main                             1  3.0    0.000    0.001  103.709  103.710
 mp2_gpw_main                         1  4.0    0.002    0.002  103.316  103.317
 mp2_ri_gpw_compute_in                1  5.0    0.574    0.580   58.332   58.362
 mp2_ri_gpw_compute_in_loop           1  6.0    0.015    0.016   49.821   49.845
 mp2_ri_gpw_compute_en                1  5.0    0.093    0.093   44.917   44.944
 mp2_ri_gpw_compute_en_RI_loop        1  6.0   12.991   12.995   42.165   42.165
 dbcsr_multiply_generic            2666  8.0    0.182    0.183   25.604   25.901
 ao_to_mo_and_store_B_mult_1       1328  7.0    0.016    0.016   24.181   24.478
 mp2_eri_3c_integrate_gpw          1328  7.0    0.020    0.020   19.423   19.732
 mp2_ri_gpw_compute_en_expansio    1040  7.0    0.802    0.805   17.389   17.432
 local_gemm                        1040  8.0   16.587   16.627   16.587   16.627
 make_m2s                          5332  9.0    0.060    0.061   13.906   13.923
 make_images                       5332 10.0    2.470    2.470   13.707   13.726
 multiply_cannon                   2666  9.0    0.446    0.465   10.963   11.270
 integrate_v_rspace                1338  8.0    1.120    1.131   10.848   10.913
 multiply_cannon_loop              2666 10.0    0.204    0.204    9.756   10.042
 hybrid_alltoall_any               6683 11.6    9.216    9.258    9.487    9.528
 make_images_data                  5332 11.0    0.069    0.070    9.401    9.443
 grid_integrate_task_list          1338  9.0    8.381    8.454    8.381    8.454
 fft_wrap_pw1pw2                  26668 10.4    0.150    0.153    7.963    8.208
 get_2c_integrals                     1  6.0    0.004    0.004    7.932    7.937
 collocate_function                1328  8.0    5.180    5.200    7.346    7.597
 compute_2c_integrals                 1  7.0    0.008    0.008    7.303    7.303
 compute_2c_integrals_loop_lm         1  8.0    0.022    0.022    7.190    7.203
 mp2_eri_2c_integrate_gpw             1  9.0    2.111    2.124    7.168    7.181
 scf_env_do_scf                       1  3.0    0.000    0.000    7.083    7.085
 scf_env_do_scf_inner_loop           10  4.0    0.001    0.001    7.083    7.084
 ao_to_mo_and_store_B_E_Ex_1       1328  7.0    3.897    3.912    5.911    5.942
 mp2_ri_gpw_compute_en_ener        1040  7.0    5.573    5.642    5.573    5.642
 multiply_cannon_multrec           2676 11.0    2.586    2.699    5.457    5.575
 qs_scf_new_mos                      10  5.0    0.000    0.000    5.510    5.512
 mp2_ri_gpw_compute_en_comm         221  7.0    1.110    1.120    4.971    5.088
 fft_wrap_pw1pw2_20               10647 11.4    0.023    0.023    4.614    4.853
 pw_gpu_r3dc1d_3d                 13282 12.2    3.963    4.219    3.963    4.219
 eigensolver                         11  5.8    0.002    0.002    3.096    3.097
 mp_sendrecv_dm3                    442  8.0    2.717    2.818    2.717    2.818
 potential_pw2rs                   2666 10.0    0.104    0.107    2.763    2.767
 pw_gpu_c1dr3d_3d                 13280 12.7    2.723    2.735    2.723    2.735
 dbcsr_mm_accdrv_process           5392 12.0    0.884    1.521    2.588    2.589
 copy_dbcsr_to_fm                  1351  8.0    0.035    0.036    2.453    2.464
 cp_fm_diag_elpa                     11  6.8    0.000    0.000    2.428    2.429
 cp_fm_diag_elpa_base                11  7.8    2.341    2.359    2.427    2.427
 fft_wrap_pw1pw2_10               15957 11.5    0.021    0.021    2.385    2.390
 fill_local_i_aL                    884  7.5    2.385    2.389    2.385    2.389
 collocate_single_gaussian         1328 10.0    0.098    0.099    2.364    2.383
 replicate_iaK_2intgroup              1  6.0    2.194    2.197    2.338    2.340
 mp2_eri_2c_integrate_gpw_pot_l    1328 10.0    0.004    0.004    2.297    2.299
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=111.97, yerr=0.0

Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=59.222, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=16.587, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.991, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=9.216, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.381, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_ener", label="mp2_ri_gpw_compute_en_ener", y=5.573, yerr=0.0

Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.

From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.090    0.091  209.317  209.317
 qs_energies                          1  2.0    0.000    0.000  208.200  208.201
 scf_env_do_scf                       1  3.0    0.000    0.000  193.780  193.780
 scf_env_do_scf_inner_loop           15  4.0    0.001    0.002  193.779  193.780
 qs_ks_update_qs_env                 15  5.0    0.000    0.000   96.521   96.532
 rebuild_ks_matrix                   15  6.0    0.000    0.000   96.307   96.319
 qs_ks_build_kohn_sham_matrix        15  7.0    0.003    0.003   96.307   96.319
 qs_vxc_create                       15  8.0    0.036    0.071   59.769   59.783
 qs_scf_new_mos                      15  5.0    0.000    0.001   55.072   55.100
 fft_wrap_pw1pw2                   1086 10.0    0.030    0.030   51.831   51.875
 calculate_dispersion_nonloc         15  9.0   11.182   11.202   51.435   51.489
 eigensolver                         15  6.0    0.002    0.003   45.285   45.426
 qs_rho_update_rho_low               16  5.0    0.000    0.000   40.332   40.334
 calculate_rho_elec                  16  6.0    0.187    0.187   40.332   40.333
 sum_up_and_integrate                15  8.0    0.000    0.000   35.001   35.010
 integrate_v_rspace                  15  9.0    0.049    0.049   34.974   34.982
 grid_collocate_task_list            16  7.0   28.620   28.637   28.620   28.637
 cp_fm_diag_elpa                     15  7.0    0.000    0.000   28.013   28.019
 cp_fm_diag_elpa_base                15  8.0   26.235   26.758   28.008   28.008
 grid_integrate_task_list            15 10.0   27.799   27.810   27.799   27.810
 pw_gpu_c1dr3d_3d_ps                585 12.1    5.787    5.870   27.032   27.098
 fft_wrap_pw1pw2_150                765 11.0    0.005    0.005   26.691   26.730
 pw_gpu_r3dc1d_3d_ps                501 11.9    4.947    5.215   24.763   24.785
 cp_fm_cholesky_restore              45  7.0   15.359   16.045   15.359   16.045
 fft_wrap_pw1pw2_200                197 11.3    0.001    0.001   12.506   12.625
 density_rs2pw                       16  7.0    0.002    0.002   11.511   11.517
 qs_energies_init_hamiltonians        1  3.0    0.000    0.000   10.397   10.397
 vdW_energy                          15 10.0    9.582    9.590    9.582    9.590
 pw_gpu_ffc                         585 13.1    9.207    9.230    9.207    9.230
 build_core_hamiltonian_matrix        1  4.0    0.000    0.000    8.923    8.961
 pw_gpu_cff                         501 12.9    8.736    8.745    8.736    8.745
 xc_vxc_pw_create                    15  9.0    0.187    0.189    8.299    8.304
 mp_alltoall_z22v                  1086 14.0    6.960    7.314    6.960    7.314
 pw_gpu_sf                          585 13.1    7.201    7.204    7.201    7.204
 potential_pw2rs                     15 10.0    0.007    0.007    7.126    7.146
 pw_gpu_fg                          501 12.9    6.837    6.861    6.837    6.861
 copy_dbcsr_to_fm                    16  5.9    0.001    0.001    6.755    6.818
 dbcsr_complete_redistribute         46  8.3    1.914    2.025    5.804    5.929
 fft_wrap_pw1pw2_10                  62 10.5    0.000    0.000    5.734    5.735
 build_core_ppnl                      1  5.0    5.076    5.120    5.076    5.120
 xc_rho_set_and_dset_create          15 10.0    0.137    0.139    4.919    4.941
 x_to_yz                            585 13.1    1.106    1.119    4.800    4.836
 xc_pw_derive                        90 11.0    0.001    0.001    4.765    4.787
 cp_fm_uplo_to_full                  30  8.0    3.682    4.744    3.682    4.744
 yz_to_x                            501 12.9    0.917    0.918    4.183    4.488
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=209.317, yerr=0.0

Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=100.12200000000001, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=28.62, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.799, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=26.235, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=15.359, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=11.182, yerr=0.0

Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.050    0.051  279.945  279.945
 qs_energies                          1  2.0    0.000    0.000  279.817  279.817
 ls_scf                               1  3.0    0.000    0.000  278.963  278.963
 ls_scf_main                          1  4.0    0.001    0.002  268.915  268.916
 density_matrix_trs4                 11  5.0    0.008    0.008  223.212  223.231
 dbcsr_multiply_generic             185  6.1    0.345    0.350  182.332  182.390
 multiply_cannon                    185  7.1    2.199    2.404  127.240  128.070
 multiply_cannon_loop               185  8.1    0.354    0.354  112.369  112.902
 multiply_cannon_multrec            370  9.1   86.033   86.081   95.965   96.035
 make_m2s                           370  7.1    0.031    0.032   46.248   46.301
 make_images                        370  8.1   11.320   11.792   45.145   45.195
 ls_scf_dm_to_ks                     11  5.0    0.000    0.000   41.002   41.020
 matrix_ls_to_qs                     11  6.0    0.000    0.000   37.686   37.928
 dbcsr_complete_redistribute         23  7.5   23.187   23.555   32.098   32.476
 matrix_decluster                    11  7.0    0.000    0.000   29.055   29.431
 arnoldi_extremal                    12  6.1    0.000    0.000   24.103   24.105
 arnoldi_normal_ev                   12  7.1    0.010    0.010   24.102   24.104
 build_subspace                      23  8.1    0.065    0.067   23.590   23.591
 dbcsr_matrix_vector_mult           652  9.0    0.157    0.161   22.198   22.535
 dbcsr_matrix_vector_mult_local     652 10.0   21.147   21.488   21.156   21.497
 make_images_data                   370  9.1    0.014    0.014   17.351   17.645
 hybrid_alltoall_any                393  9.9   12.116   12.152   16.807   17.101
 calculate_norms                    740  9.1   15.394   15.875   15.394   15.875
 dbcsr_finalize                     559  7.6    0.223    0.226   14.750   14.949
 dbcsr_merge_all                    510  8.6    2.480    2.599   13.499   13.665
 dbcsr_copy                         761  7.5    1.756    1.771   10.408   10.558
 dbcsr_special_finalize             555  9.1    0.011    0.011    9.931    9.946
 setup_rec_index_2d                 370  8.1    9.744    9.807    9.744    9.807
 dbcsr_dot                          144  6.3    8.185    8.238    9.112    9.340
 dbcsr_sort_indices                1283 10.0    9.267    9.297    9.267    9.297
 dbcsr_add_d                        280  6.0    0.001    0.001    8.674    8.847
 dbcsr_add_anytype                  280  7.0    3.939    3.955    8.672    8.846
 dbcsr_copy_into_existing            11  8.0    8.629    8.764    8.630    8.764
 ls_scf_init_scf                      1  4.0    0.000    0.000    8.412    8.412
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000    7.967    7.972
 dbcsr_mm_accdrv_process          14501 10.0    0.876    0.897    7.843    7.888
 tree_to_linear_d                    23 10.5    7.379    7.422    7.379    7.422
 matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.000    7.177    7.180
 dbcsr_mm_accdrv_process_sort     14501 11.0    6.967    6.991    6.967    6.991
 dbcsr_merge_single_wm              370 10.1    0.567    0.569    6.397    6.397
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=279.945, yerr=0.0

Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=122.06799999999998, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=86.033, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=23.187, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=21.147, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=15.394, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=12.116, yerr=0.0

Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.004    0.005   50.981   50.981
 lib_test                             1  2.0    0.000    0.000   50.973   50.975
 dbcsr_run_tests                      3  3.0    0.000    0.001   50.972   50.974
 test_multiplies_multiproc            3  4.0    0.001    0.001   39.350   39.413
 dbcsr_multiply_generic               9  5.0    0.002    0.002   30.753   30.760
 multiply_cannon                      9  6.0    0.200    0.201   20.505   21.060
 multiply_cannon_loop                 9  7.0    0.003    0.003   19.031   19.469
 multiply_cannon_multrec             18  8.0   10.210   10.657   17.778   18.184
 dbcsr_make_random_matrix             9  4.0    7.741    7.795   11.467   11.528
 dbcsr_finalize                      27  5.7    0.001    0.001    7.947    7.962
 dbcsr_merge_all                     18  6.5    3.720    3.728    7.823    7.843
 dbcsr_mm_accdrv_process           8199  9.0    1.509    1.654    7.337    7.367
 dbcsr_redistribute                   9  5.0    3.637    3.639    5.888    5.899
 make_m2s                            18  6.0    0.001    0.001    5.180    5.189
 make_images                         18  7.0    0.381    0.386    5.145    5.155
 dbcsr_mm_accdrv_process_sort      8199 10.0    4.997    5.027    4.997    5.027
 make_images_data                    18  8.0    0.001    0.001    2.933    2.943
 hybrid_alltoall_any                 18  9.0    2.534    2.540    2.899    2.909
 dbcsr_data_copy_aa2                 18  7.5    2.000    2.008    2.000    2.008
 mp_alltoall_d11v                    27  6.0    1.983    1.996    1.983    1.996
 tree_to_linear_d                     9  7.0    1.960    1.962    1.960    1.962
 dbcsr_data_release                 507  7.7    1.439    1.440    1.439    1.440
 dbcsr_data_new                     354  7.4    1.217    1.341    1.217    1.341
 mp_sum_l                            61  4.9    0.598    1.154    0.598    1.154
 dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.598    1.153
 dbcsr_checksum                       6  5.0    1.090    1.112    1.112    1.112
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=50.981, yerr=0.0

Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=20.676000000000002, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=10.21, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.741, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.997, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.72, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.637, yerr=0.0

Running MQAE_single_node.inp with 3 threads and 2 ranks... done.

From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.046    0.047  215.123  215.124
 qs_mol_dyn_low                       1  2.0    0.004    0.004  213.504  213.548
 qs_forces                            6  3.8    0.001    0.001  132.957  132.957
 qs_energies                          6  4.8    0.001    0.001  125.515  125.515
 scf_env_do_scf                       6  5.8    0.000    0.000  117.385  117.385
 scf_env_do_scf_inner_loop          113  6.2    0.006    0.009  108.852  108.853
 velocity_verlet                      5  3.0    0.003    0.003  102.018  102.075
 rebuild_ks_matrix                  119  8.1    0.001    0.001   89.849   89.851
 qs_ks_build_kohn_sham_matrix       119  9.1    0.022    0.022   89.849   89.850
 qs_ks_update_qs_env                119  7.3    0.001    0.001   84.816   84.817
 fft_wrap_pw1pw2                   2059 12.4    0.049    0.052   69.829   69.849
 fft_wrap_pw1pw2_150               1321 13.9    0.010    0.010   66.900   66.950
 qs_vxc_create                      119 10.1    0.002    0.002   57.707   57.707
 xc_vxc_pw_create                   119 11.1    1.646    1.648   57.705   57.706
 qmmm_el_coupling                     6  3.8    0.000    0.000   42.974   42.979
 qmmm_elec_with_gaussian              6  4.8    0.024    0.024   42.967   42.972
 qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   41.450   41.517
 xc_pw_derive                       714 13.1    0.013    0.013   39.868   39.937
 pw_gpu_c1dr3d_3d_ps               1095 14.8   11.245   11.287   37.669   37.736
 qmmm_elec_gaussian_low_G             6  6.8   36.441   36.508   36.441   36.508
 qmmm_forces                          6  3.8    0.002    0.002   34.797   34.797
 qmmm_forces_with_gaussian            6  4.8    0.027    0.028   33.946   34.215
 qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   32.538   32.803
 pw_gpu_r3dc1d_3d_ps                964 14.0   10.009   10.025   32.096   32.146
 xc_rho_set_and_dset_create         119 12.1    2.635    2.640   29.048   29.051
 qmmm_forces_gaussian_low_G           6  6.8   27.214   27.509   27.214   27.509
 xc_pw_divergence                   119 12.1    0.007    0.007   26.535   26.540
 qs_rho_update_rho_low              119  7.3    0.001    0.001   23.325   23.398
 calculate_rho_elec                 119  8.3    1.152    1.153   23.324   23.397
 density_rs2pw                      119  9.3    0.009    0.009   17.065   17.220
 sum_up_and_integrate               119 10.1    0.003    0.003   14.571   14.579
 dbcsr_multiply_generic            2598 12.3    0.103    0.104   14.346   14.440
 integrate_v_rspace                 119 11.1    0.023    0.023   14.343   14.352
 mp_alltoall_z22v                  2059 16.4   13.379   13.438   13.379   13.438
 multiply_cannon                   2598 13.3    0.232    0.232   12.664   12.699
 multiply_cannon_loop              2598 14.3    0.252    0.253   12.167   12.197
 potential_pw2rs                    119 12.1    0.036    0.037   10.329   10.330
 x_to_yz                           1095 15.8    2.783    2.799   10.101   10.155
 multiply_cannon_multrec           5196 15.3    4.152    4.188    9.987   10.025
 qs_ks_ddapc                        119 10.1    0.003    0.003    9.460    9.470
 pw_gpu_sf                         1095 15.8    9.241    9.270    9.241    9.270
 pw_gpu_fg                          964 15.0    8.639    8.752    8.639    8.752
 init_scf_loop                        6  6.8    0.000    0.000    8.530    8.530
 yz_to_x                            964 15.0    2.096    2.101    8.156    8.183
 qs_scf_new_mos                     113  7.2    0.001    0.001    7.432    7.433
 qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.431    7.431
 ot_scf_mini                        113  9.2    0.002    0.002    7.136    7.138
 pw_gpu_ffc                        1095 15.8    7.062    7.063    7.062    7.063
 init_scf_run                         6  5.8    0.000    0.000    5.822    5.822
 scf_env_initial_rho_setup            6  6.8    0.000    0.000    5.822    5.822
 dbcsr_mm_accdrv_process          13992 16.0    0.545    0.550    5.767    5.767
 xc_functional_eval                 238 13.1    0.004    0.004    5.584    5.586
 pw_poisson_solve                   125  9.9    0.004    0.004    5.429    5.439
 qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    5.324    5.354
 qmmm_forces_with_gaussian_LG         6  7.8    5.324    5.353    5.324    5.353
 pw_gpu_cff                         964 15.0    5.220    5.270    5.220    5.270
 jit_kernel_multiply                 24 14.7    5.172    5.176    5.172    5.176
 grid_collocate_task_list           119  9.3    5.069    5.157    5.069    5.157
 ot_mini                            113 10.2    0.001    0.001    5.103    5.104
 qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    5.066    5.066
 qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    5.009    5.010
 qmmm_elec_with_gaussian_LG           6  7.8    5.009    5.010    5.009    5.010
 pw_derive                         1089 13.4    4.924    4.982    4.924    4.982
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=215.123, yerr=0.0

Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=116.835, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=36.441, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=27.214, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=13.379, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=11.245, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=10.009, yerr=0.0

Summary: Performance test took 24 minutes.
Status: OK
 ---> Removed intermediate container 1215446aebf3
 ---> 50b72528b22f
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in a3499afb3b9d
 ---> Removed intermediate container a3499afb3b9d
 ---> f57ff8ff05f1
Step 46/46 : ENTRYPOINT []
 ---> Running in d33507dd86d4
 ---> Removed intermediate container d33507dd86d4
 ---> 3b581acfd314
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 3b581acfd314
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master

Pushing new image... done.

#################### Running Image cp2k-perf-cuda-volta ####################

Uploading artifacts... done
EndDate: 2026-03-15 06:51:23+00:00