StartDate: 2026-03-22 06:06:30+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 34a1bdb69b24de0dec1ae29b17dd97ac205308d2
CommitTime: 2026-03-21 22:41:24 +0100
CommitAuthor: HE Zilong
CommitSubject: Add explanation and error message about CELL/SYMMETRY (#4974)

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=34a1bdb69b24de0dec1ae29b17dd97ac205308d2 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.

Sending build context to Docker daemon 413.3MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
6e8af4fd0a07: Waiting
15a17189b2df: Waiting
02cb0e091e33: Waiting
6cb9b761b877: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
2cd52cbb1ebe: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
32f112e3802c: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
9c3d619183d2: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
32f112e3802c: Pull complete
7f7602a82106: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 1ff2ec46e723
Step 8/46 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> bf4865207130
Step 9/46 : RUN mkdir scripts
 ---> Using cache
 ---> 95733bd3ea48
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./tools/toolchain/scripts/generate_cmake_options.sh ./scripts/
 ---> Using cache
 ---> 8e4844ef0a17
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> f8a8e707850e
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run --list-cmake-options=no
 ---> Using cache
 ---> ba35d39457b4
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> 635921847003
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 70bf58f04432
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> af4bedfc388b
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> c0705190c385
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 8a9485d84586
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> 32bf2d1e84a0
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> d1eecf83246f
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> c0079883b66e
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> 6474a5131cb4
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 575c2a399a42
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> 1b8fddcb5f01
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> 9233c2db4103
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 4358c686ac5a
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> d9ade9e5c8d0
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> dd268d59bfa4
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> 7ff7a409fc89
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> af5a129774a6
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> 0743c14c8501
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 1adc868be0a1
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> ccf55d1228f4
Step 33/46 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> e585f4c2246a
Step 34/46 : COPY ./src ./src
 ---> 6ce112bd6b6e
Step 35/46 : COPY ./data ./data
 ---> 0c756b171639
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
 ---> 8d5cb606c9c0
Step 37/46 : COPY ./cmake ./cmake
 ---> d4ac077600ce
Step 38/46 : COPY ./CMakeLists.txt .
 ---> a6dea9052808
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> 05f2c6cc6c2b
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in f5368320251e

==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so (found version "4.1")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so (found version "4.1")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpifort.so (found version "4.1")
-- Found MPI: TRUE (found version "4.1") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "4.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
--   Found openblas, version 0.3.31
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.31/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
--   Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
--   Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
--   Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
--   Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
--   Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")

-----------------------------------------------------------
-                           CUDA                          -
-----------------------------------------------------------

-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found

------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------

-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX

------------------------------------------------------------
-                          DBCSR                           -
------------------------------------------------------------

-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for module 'libxsmmf'
--   Found libxsmmf, version 1.17.0
-- Checking for module 'libxsmmext'
--   Found libxsmmext, version 1.17.0

------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------

-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.0") found components: C Fortran
-- Found MPI: TRUE (found version "4.1") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Checking for module 'libint2'
--   Package 'libint2', required by 'virtual:world', not found
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "2.1.0") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'libvdwxc>=0.3.0'
-- Looking for vdwxc_init_mpi
-- Looking for vdwxc_init_mpi - not found
-- Found LibVDWXC: /opt/cp2k-toolchain/install/libvdwxc-0.4.0/lib/libvdwxc.a (Required is at least version "0.3.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.

--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------

  - BLAS
    - vendor: OpenBLAS
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.31/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
  - LAPACK
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.31/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.31/lib/libopenblas.so
  - MPI
    - include directories: /opt/cp2k-toolchain/install/mpich-4.3.2/include
    - libraries: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so
    - MPI_F08: ON
  - ScaLAPACK
    - vendor: auto
    - include directories:
    - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
  - Hardware Acceleration:
    - CUDA:
      - GPU architecture number: 70
      - GPU profiling enabled:
    - GPU accelerated modules
      - ELPA module: ON
      - GRID module: ON
      - DBM module: ON
      - PW module: ON
  - LibXC
    - version: 7.0.0
    - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
    - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
  - HDF5
    - version: 2.1.0
    - include directories: /opt/cp2k-toolchain/install/hdf5-2.1.0/include
    - libraries: hdf5-shared
  - FFTW3
    - include directories: /opt/cp2k-toolchain/install/fftw-3.3.10/include
    - libraries: /opt/cp2k-toolchain/install/fftw-3.3.10/lib/libfftw3.a
  - LIBXSMM
    - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
    - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so
  - SpLA
    - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
    - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
    - SpLA GEMM offloading
  - SIRIUS
    - include directories:
    - libraries:
  - COSMA
    - include directories: /opt/cp2k-toolchain/install/COSMA-2.8.1/include
    - libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack>
  - Libint2
    - include directories: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
    - libraries: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/lib/libint2.a
  - ELPA
    - include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001
    - libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a

--------------------------------------------------------------------
-                                                                  -
-         List of dependencies not included in this build          -
-                                                                  -
--------------------------------------------------------------------

  - DFTD4
  - DeePMD
  - PEXSI
  - ACE (libpace)
  - TBLITE
  - Spglib
  - LibSMEAGOL
  - MiMiC
  - openPMD
  - DLA-Future
  - PLUMED
  - Libvori
  - LibTorch
  - TREXIO
  - GreenX

After building and installing CP2K the regtests can be run with the following command:
/opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp

-- Configuring done (14.7s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container f5368320251e
 ---> a69d3403f616
Step 41/46 : COPY ./benchmarks ./benchmarks
 ---> 0b860efb08e6
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
 ---> 4b0d2bfffa83
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> 4bca20756d84
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in 9986076c1530

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 82 input files and 0 errors.

========== Running Performance Test ==========

Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"

Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1  1.0    0.027    0.028  100.683  100.684
 qs_mol_dyn_low                      1  2.0    0.004    0.004  100.276  100.277
 qs_forces                          11  3.9    0.002    0.002  100.229  100.230
 qs_energies                        11  4.9    0.001    0.001   89.131   89.135
 scf_env_do_scf                     11  5.9    0.001    0.001   68.891   68.892
 velocity_verlet                    10  3.0    0.001    0.002   64.254   64.272
 scf_env_do_scf_inner_loop         108  6.5    0.006    0.008   58.643   58.644
 rebuild_ks_matrix                 119  8.3    0.001    0.001   25.827   25.829
 qs_ks_build_kohn_sham_matrix      119  9.3    0.019    0.019   25.827   25.828
 dbcsr_multiply_generic           2286 12.5    0.140    0.143   24.413   24.441
 qs_ks_update_qs_env               119  7.6    0.001    0.001   23.686   23.686
 qs_scf_new_mos                    108  7.5    0.001    0.001   19.891   19.904
 qs_scf_loop_do_ot                 108  8.5    0.001    0.001   19.890   19.904
 qs_rho_update_rho_low             119  7.7    0.001    0.001   19.718   19.740
 calculate_rho_elec                119  8.7    0.864    0.876   19.718   19.739
 ot_scf_mini                       108  9.5    0.003    0.003   18.013   18.014
 fft_wrap_pw1pw2                  1201 11.6    0.023    0.024   15.266   15.293
 sum_up_and_integrate              119 10.3    0.002    0.002   13.665   13.710
 integrate_v_rspace                119 11.3    0.349    0.353   13.577   13.622
 fft_wrap_pw1pw2_140               487 12.2    0.003    0.003   13.131   13.188
 multiply_cannon                  2286 13.5    0.332    0.335   12.388   12.402
 multiply_cannon_loop             2286 14.5    0.252    0.253   11.347   11.374
 ot_mini                           108 10.5    0.001    0.001   10.484   10.484
 make_m2s                         4572 13.5    0.042    0.042   10.448   10.454
 make_images                      4572 14.5    1.163    1.178   10.278   10.284
 init_scf_run                       11  5.9    0.000    0.000   10.176   10.176
 scf_env_initial_rho_setup          11  6.9    0.000    0.001   10.176   10.176
 init_scf_loop                      11  6.9    0.000    0.000   10.170   10.170
 density_rs2pw                     119  9.7    0.007    0.007    9.873    9.943
 grid_collocate_task_list          119  9.7    8.957    8.998    8.957    8.998
 qs_energies_init_hamiltonians      11  5.9    0.000    0.000    8.053    8.053
 build_core_hamiltonian_matrix_     11  4.9    0.001    0.001    7.686    7.841
 pw_gpu_r3dc1d_3d_ps               606 13.1    2.249    2.277    7.813    7.832
 grid_integrate_task_list          119 12.3    7.387    7.434    7.387    7.434
 pw_gpu_c1dr3d_3d_ps               595 14.2    2.196    2.217    7.424    7.433
 wfi_extrapolate                    11  7.9    0.001    0.001    7.345    7.346
 prepare_preconditioner             11  7.9    0.000    0.000    7.027    7.029
 make_preconditioner                11  8.9    0.000    0.000    7.027    7.029
 qs_ot_get_derivative              108 11.5    0.002    0.002    6.281    6.281
 multiply_cannon_multrec          4572 15.5    2.172    2.209    6.242    6.278
 make_full_inverse_cholesky         11  9.9    0.000    0.000    5.878    6.129
 hybrid_alltoall_any              4725 16.4    4.763    4.768    6.072    6.077
 make_images_data                 4572 15.5    0.052    0.052    5.957    5.959
 potential_pw2rs                   119 12.3    0.035    0.036    5.840    5.841
 parallel_gemm_fm_cosma             81  9.0    5.390    5.390    5.390    5.390
 ot_diis_step                      108 11.5    0.005    0.005    4.178    4.178
 build_core_ppl_forces              11  5.9    3.933    4.054    3.933    4.054
 build_core_hamiltonian_matrix      11  6.9    0.001    0.001    3.841    3.885
 qs_env_update_s_mstruct            11  6.9    0.000    0.000    3.847    3.864
 dbcsr_mm_accdrv_process          9594 16.2    0.740    0.743    3.675    3.690
 apply_preconditioner_dbcsr        119 12.6    0.000    0.000    3.634    3.636
 apply_single                      119 13.6    0.001    0.001    3.634    3.636
 dbcsr_complete_redistribute       329 12.2    1.398    1.403    3.294    3.547
 qs_ks_update_qs_env_forces         11  4.9    0.000    0.000    3.250    3.250
 calculate_dm_sparse               119  9.5    0.001    0.001    3.197    3.214
 multiply_cannon_sync_h2d         4572 15.5    3.117    3.169    3.117    3.169
 mp_alltoall_z22v                 1201 15.6    3.060    3.160    3.060    3.160
 qs_ot_get_p                       119 10.4    0.001    0.001    3.132    3.134
 qs_create_task_list                11  7.9    0.000    0.000    3.032    3.058
 generate_qs_task_list              11  8.9    1.157    1.167    3.032    3.058
 cp_dbcsr_sm_fm_multiply            37  9.5    0.001    0.001    2.760    2.761
 mp_waitall_1                    64495 16.9    2.662    2.715    2.662    2.715
 copy_dbcsr_to_fm                  153 11.3    0.004    0.004    2.549    2.561
 pw_poisson_solve                  119 10.3    0.003    0.003    2.552    2.557
 transfer_rs2pw                    487 10.6    0.007    0.008    2.364    2.466
 calculate_first_density_matrix      1  7.0    0.000    0.000    2.395    2.396
 jit_kernel_multiply                11 15.7    2.336    2.345    2.336    2.345
 cp_dbcsr_sm_fm_multiply_core       37 10.5    0.000    0.000    2.261    2.261
 pw_gpu_fg                         606 14.1    2.167    2.200    2.167    2.200
 qs_ot_get_derivative_taylor        59 13.0    0.003    0.003    2.102    2.103
 transfer_rs2pw_140                130 11.5    1.500    1.506    1.973    2.078
 cp_fm_cholesky_invert              11 10.9    2.058    2.058    2.058    2.058
 build_core_ppl                     11  7.9    2.003    2.050    2.003    2.050
 qs_ot_p2m_diag                     50 11.0    0.087    0.089    2.034    2.036
 yz_to_x                           606 14.1    0.432    0.436    1.988    2.036
 dbcsr_special_finalize           6858 15.5    0.041    0.042    2.014    2.019
 transfer_dbcsr_to_fm               11 10.9    0.001    0.001    2.008    2.016
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=100.683, yerr=0.0

Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=70.25300000000001, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.957, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.387, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.39, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.763, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.933, yerr=0.0

Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1  1.0    0.027    0.027   96.673   96.673
 qs_mol_dyn_low                      1  2.0    0.004    0.004   96.249   96.252
 qs_forces                          11  3.9    0.002    0.002   96.204   96.204
 qs_energies                        11  4.9    0.001    0.001   84.963   84.964
 scf_env_do_scf                     11  5.9    0.001    0.001   64.256   64.256
 velocity_verlet                    10  3.0    0.001    0.002   62.848   62.864
 scf_env_do_scf_inner_loop          96  6.5    0.005    0.008   53.864   53.864
 rebuild_ks_matrix                 107  8.3    0.001    0.001   25.229   25.229
 qs_ks_build_kohn_sham_matrix      107  9.3    0.017    0.017   25.228   25.229
 qs_ks_update_qs_env               107  7.6    0.001    0.001   22.757   22.757
 dbcsr_multiply_generic           1966 12.4    0.123    0.125   22.544   22.614
 qs_rho_update_rho_low             107  7.7    0.001    0.001   18.078   18.098
 calculate_rho_elec                107  8.7    0.776    0.783   18.077   18.097
 qs_scf_new_mos                     96  7.5    0.001    0.001   17.869   17.878
 qs_scf_loop_do_ot                  96  8.5    0.001    0.001   17.869   17.877
 ot_scf_mini                        96  9.5    0.002    0.003   16.206   16.208
 sum_up_and_integrate              107 10.3    0.002    0.002   14.156   14.229
 integrate_v_rspace                107 11.3    0.319    0.320   14.075   14.148
 fft_wrap_pw1pw2                  1081 11.6    0.020    0.020   13.962   13.998
 fft_wrap_pw1pw2_140               439 12.2    0.003    0.003   11.998   12.061
 multiply_cannon                  1966 13.4    0.282    0.286   11.396   11.521
 multiply_cannon_loop             1966 14.4    0.217    0.219   10.414   10.433
 init_scf_run                       11  5.9    0.000    0.000   10.319   10.319
 scf_env_initial_rho_setup          11  6.9    0.000    0.001   10.318   10.318
 init_scf_loop                      11  6.9    0.000    0.000   10.314   10.314
 make_m2s                         3932 13.4    0.036    0.037    9.705    9.828
 make_images                      3932 14.4    1.073    1.115    9.553    9.675
 ot_mini                            96 10.5    0.001    0.001    9.450    9.450
 density_rs2pw                     107  9.7    0.007    0.007    9.041    9.188
 grid_integrate_task_list          107 12.3    8.448    8.523    8.448    8.523
 qs_energies_init_hamiltonians      11  5.9    0.000    0.000    8.360    8.360
 grid_collocate_task_list          107  9.7    8.229    8.336    8.229    8.336
 build_core_hamiltonian_matrix_     11  4.9    0.001    0.001    7.693    7.789
 wfi_extrapolate                    11  7.9    0.001    0.001    7.416    7.416
 pw_gpu_r3dc1d_3d_ps               546 13.1    2.026    2.052    7.160    7.176
 prepare_preconditioner             11  7.9    0.000    0.000    7.021    7.024
 make_preconditioner                11  8.9    0.000    0.000    7.021    7.024
 pw_gpu_c1dr3d_3d_ps               535 14.2    1.973    1.994    6.776    6.796
 make_full_inverse_cholesky         11  9.9    0.000    0.000    5.889    6.119
 multiply_cannon_multrec          3932 15.4    1.911    1.935    5.858    5.894
 hybrid_alltoall_any              4079 16.3    4.356    4.473    5.704    5.705
 qs_ot_get_derivative               96 11.5    0.001    0.001    5.622    5.627
 make_images_data                 3932 15.4    0.046    0.046    5.573    5.575
 parallel_gemm_fm_cosma             81  9.0    5.489    5.490    5.489    5.490
 potential_pw2rs                   107 12.3    0.032    0.033    5.308    5.308
 qs_env_update_s_mstruct            11  6.9    0.000    0.000    4.088    4.262
 build_core_ppl_forces              11  5.9    3.920    3.991    3.920    3.991
 build_core_hamiltonian_matrix      11  6.9    0.001    0.001    3.819    3.848
 ot_diis_step                       96 11.5    0.005    0.005    3.805    3.805
 dbcsr_complete_redistribute       317 12.2    1.390    1.400    3.394    3.629
 dbcsr_mm_accdrv_process          8450 16.1    0.780    0.858    3.594    3.597
 qs_ks_update_qs_env_forces         11  4.9    0.000    0.000    3.445    3.446
 qs_create_task_list                11  7.9    0.000    0.000    3.292    3.441
 generate_qs_task_list              11  8.9    1.425    1.425    3.292    3.441
 apply_preconditioner_dbcsr        107 12.6    0.000    0.000    3.425    3.426
 apply_single                      107 13.6    0.001    0.001    3.424    3.426
 calculate_dm_sparse               107  9.5    0.001    0.001    2.958    2.966
 mp_alltoall_z22v                 1081 15.6    2.787    2.896    2.787    2.896
 multiply_cannon_sync_h2d         3932 15.4    2.793    2.825    2.793    2.825
 cp_dbcsr_sm_fm_multiply            37  9.5    0.001    0.002    2.802    2.803
 qs_ot_get_p                       107 10.4    0.001    0.001    2.738    2.740
 copy_dbcsr_to_fm                  147 11.2    0.004    0.004    2.700    2.725
 mp_waitall_1                    55487 16.8    2.511    2.663    2.511    2.663
 calculate_first_density_matrix      1  7.0    0.000    0.000    2.448    2.448
 jit_kernel_multiply                11 15.7    2.275    2.352    2.275    2.352
 transfer_rs2pw                    439 10.6    0.007    0.007    2.181    2.335
 cp_dbcsr_sm_fm_multiply_core       37 10.5    0.000    0.000    2.297    2.298
 pw_poisson_solve                  107 10.3    0.002    0.002    2.290    2.294
 transfer_dbcsr_to_fm               11 10.9    0.001    0.001    2.160    2.184
 pw_gpu_fg                         546 14.1    2.008    2.040    2.008    2.040
 build_core_ppl                     11  7.9    1.989    2.020    1.989    2.020
 cp_fm_cholesky_invert              11 10.9    1.995    1.995    1.995    1.995
 transfer_rs2pw_140                118 11.5    1.365    1.385    1.826    1.986
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=96.673, yerr=0.0

Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=66.231, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.448, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.229, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.489, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.356, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.92, yerr=0.0

Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1  1.0    0.020    0.021  164.161  164.163
 qs_energies                         1  2.0    0.000    0.000  163.848  163.848
 mp2_main                            1  3.0    0.000    0.000  157.087  157.087
 mp2_gpw_main                        1  4.0    0.000    0.000  155.293  155.293
 rpa_ri_compute_en                   1  5.0    0.000    0.000  146.400  146.400
 rpa_num_int                         1  6.0    0.001    0.001  146.392  146.392
 compute_mat_P_omega                 1  7.0    0.001    0.002   66.837   66.838
 parallel_gemm_fm_cosma            105  8.4   66.641   66.725   66.641   66.725
 dbt_total                        2336  9.6    0.021    0.021   66.696   66.696
 compute_mat_P_omega_contract       10  8.0    5.143    5.169   66.159   66.177
 dbt_contract                      787 11.0    0.047    0.048   45.011   45.012
 compute_W_cubic_GW                 10  7.0    0.004    0.004   43.330   43.333
 dbt_tas_total                    1149 12.2    0.132    0.132   35.212   35.212
 dbt_tas_multiply                  807 12.1    0.003    0.003   34.548   34.548
 dbt_tas_dbm                       807 14.1    0.005    0.005   27.350   27.350
 dbm_multiply                      807 16.1   26.066   26.308   26.066   26.308
 compute_mat_P_omega_calc_M_occ    250  9.0    5.135    5.146   23.444   23.444
 rpa_num_int_RPA_matrix_operati     10  7.0    0.000    0.000   22.213   22.214
 dbt_copy                         1107 10.7    0.069    0.070   21.816   22.105
 contract_P_omega_with_mat_L        10  8.0    0.000    0.000   21.902   21.904
 dbt_tas_mm_1N                     524 15.1    0.003    0.003   17.586   17.881
 compute_mat_P_omega_calc_M_vir    250  9.0    0.001    0.001   14.924   14.924
 dbt_reshape                       594 11.8    6.358    6.514   14.100   14.192
 compute_QP_energies                 1  7.0    0.000    0.000   11.738   11.738
 compute_self_energy_cubic_gw        1  8.0    0.118    0.119   11.738   11.738
 dbt_tas_reserve_blocks_index     3266 14.3    0.637    0.645   10.108   10.296
 dbm_reserve_blocks               3634 15.3    9.799    9.997    9.799    9.997
 mp2_ri_gpw_compute_in               1  5.0    0.001    0.001    8.883    8.883
 dbt_crop                         1042 12.0    6.468    6.535    8.715    8.843
 compute_mat_P_omega_calc_P_t      250  9.0    0.001    0.001    8.816    8.816
 dbt_reserve_blocks_index         2347 13.0    0.306    0.310    8.375    8.428
 dbt_reserve_blocks_index_array   2289 12.1    0.010    0.010    8.171    8.244
 dbt_tas_mm_2                      251 15.0    0.002    0.002    7.567    7.567
 scf_env_do_scf                      1  3.0    0.000    0.000    6.224    6.224
 scf_env_do_scf_inner_loop          17  4.0    0.001    0.001    6.224    6.224
 mp_waitall_2                     2656 15.9    5.685    5.701    5.685    5.701
 contract_cubic_gw                  21  9.0    0.000    0.000    5.400    5.400
 dbt_communicate_buffer            594 12.8    0.012    0.012    5.192    5.214
 dbcsr_multiply_generic             30  8.1    0.002    0.002    5.033    5.076
 multiply_cannon                    30  9.1    0.005    0.006    4.854    4.895
 multiply_cannon_loop               30 10.1    0.004    0.004    4.801    4.842
 compute_mat_P_omega_copy_M_vir    250  9.0    0.002    0.002    4.770    4.778
 compute_mat_P_omega_copy_M_occ    250  9.0    0.002    0.002    4.704    4.713
 get_2c_integrals                    1  6.0    0.000    0.000    4.705    4.706
 dbt_tas_copy                      511 11.5    2.415    2.479    4.259    4.451
 multiply_cannon_multrec            60 11.1    0.150    0.154    4.223    4.232
 dbcsr_mm_accdrv_process           328 12.3    0.237    0.433    3.914    3.929
 jit_kernel_multiply                18 11.6    3.671    3.852    3.671    3.852
 qs_scf_new_mos                     17  5.0    0.000    0.000    3.447    3.485
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=164.161, yerr=0.0

Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=48.82899999999999, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=66.641, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=26.066, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=9.799, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.468, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.358, yerr=0.0

Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.020    0.022  186.789  186.789
 qs_forces                            1  2.0    0.000    0.000  186.328  186.329
 rebuild_ks_matrix                    7  6.6    0.000    0.000  181.946  181.946
 qs_ks_build_kohn_sham_matrix         7  7.6    0.002    0.002  181.946  181.946
 hfx_ks_matrix                        7  8.6    0.000    0.000  178.262  178.272
 dbt_total                          849 11.0    0.009    0.009  132.446  132.446
 hfx_ri_update_ks                     7  9.6    0.000    0.000  102.597  102.597
 hfx_ri_update_ks_Pmat                7 10.6   21.118   21.178  102.592  102.592
 qs_energies                          1  3.0    0.000    0.000   98.265   98.265
 scf_env_do_scf                       1  4.0    0.000    0.000   96.094   96.095
 qs_ks_update_qs_env                  8  6.0    0.000    0.000   93.928   93.928
 qs_ks_update_qs_env_forces           1  3.0    0.000    0.000   88.025   88.025
 dbt_contract                       207 12.4    0.047    0.047   77.998   77.998
 hfx_ri_update_forces                 1  7.0    1.044    1.045   75.663   75.673
 dbt_tas_total                      369 13.4    0.071    0.072   64.800   64.800
 dbt_tas_multiply                   216 13.5    0.001    0.001   62.193   62.193
 scf_env_do_scf_inner_loop            6  5.0    0.000    0.001   51.324   51.324
 dbt_copy                           423 11.8    0.044    0.045   50.099   50.716
 dbt_tas_dbm                        216 15.5    0.002    0.002   49.722   49.723
 dbm_multiply                       216 17.5   46.720   46.854   46.720   46.854
 init_scf_loop                        2  5.0    0.000    0.000   44.769   44.769
 hfx_ri_forces_Pmat_3c                1  8.0    3.294    3.309   44.199   44.203
 dbt_reshape                        175 13.2   17.328   17.380   37.865   38.031
 hfx_ri_update_ks_Pmat_KS            63 11.6    0.001    0.001   29.877   29.877
 precalc_derivatives                  1  8.0    1.794    1.797   25.782   25.782
 dbt_tas_mm_2                        91 16.5    0.001    0.001   20.897   20.897
 mp_waitall_2                      1022 16.5   17.922   17.970   17.922   17.970
 dbt_tas_reserve_blocks_index      1323 15.4    1.621    1.631   17.248   17.524
 dbm_reserve_blocks                1491 16.3   16.305   16.578   16.305   16.578
 dbt_tas_mm_3T                       77 17.1    0.000    0.001   15.976   16.219
 hfx_ri_pre_scf_Pmat                  1 12.0    0.000    0.000   15.923   15.923
 dbt_crop                           372 13.7   12.094   12.132   15.684   15.708
 dbt_communicate_buffer             175 14.2    0.004    0.004   14.941   14.971
 dbt_reserve_blocks_index           889 14.5    0.581    0.584   14.219   14.256
 hfx_ri_update_ks_Pmat_Px3C          63 11.6    0.000    0.000   14.026   14.026
 dbt_reserve_blocks_index_array     859 13.5    0.007    0.007   13.941   13.973
 hfx_ri_update_ks_Pmat_copy_2        63 11.6    0.000    0.000   13.611   13.611
 build_3c_derivatives                 3  9.0    2.139    2.166   13.577   13.578
 dbt_tas_mm_3N                       37 15.4    0.000    0.000   10.501   10.587
 dbt_tas_copy                       248 12.5    3.870    4.044    7.202    7.613
 mp_sync                           2901 12.8    6.202    6.602    6.202    6.602
 hfx_ri_pre_scf_Pmat_int              1 13.0    0.000    0.000    5.214    5.214
 hfx_ri_pre_scf_calc_tensors          1 14.0    0.003    0.003    4.439    4.439
 dbt_tas_replicate                  168 15.1    2.060    2.078    4.308    4.330
 hfx_ri_pre_scf_Pmat_copy_2           9 13.0    1.685    1.702    4.266    4.283
 hfx_ri_pre_scf_Pmat_RIx3C            9 13.0    0.000    0.000    3.863    3.904
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=186.789, yerr=0.0
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=67.39599999999999, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=46.72, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=21.118, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=17.922, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=17.328, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=16.305, yerr=0.0

Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.010    0.011  103.489  103.489
 qs_energies                          1  2.0    0.000    0.000  103.307  103.307
 mp2_main                             1  3.0    0.000    0.000   95.639   95.639
 mp2_gpw_main                         1  4.0    0.001    0.001   95.269   95.269
 mp2_ri_gpw_compute_in                1  5.0    0.546    0.550   53.361   53.365
 mp2_ri_gpw_compute_in_loop           1  6.0    0.012    0.013   45.298   45.305
 mp2_ri_gpw_compute_en                1  5.0    0.092    0.092   41.846   41.851
 mp2_ri_gpw_compute_en_RI_loop        1  6.0   12.788   12.797   39.255   39.256
 dbcsr_multiply_generic            2666  8.0    0.155    0.156   22.433   23.305
 ao_to_mo_and_store_B_mult_1       1328  7.0    0.013    0.013   21.096   21.968
 mp2_eri_3c_integrate_gpw          1328  7.0    0.016    0.016   18.469   19.372
 mp2_ri_gpw_compute_en_expansio    1040  7.0    0.711    0.713   16.154   16.181
 local_gemm                        1040  8.0   15.443   15.473   15.443   15.473
 make_m2s                          5332  9.0    0.050    0.052   12.181   12.391
 make_images                       5332 10.0    2.278    2.290   12.004   12.209
 integrate_v_rspace                1338  8.0    1.023    1.043   10.416   10.872
 multiply_cannon                   2666  9.0    0.386    0.413    9.612   10.700
 multiply_cannon_loop              2666 10.0    0.184    0.184    8.451    9.424
 grid_integrate_task_list          1338  9.0    8.132    8.569    8.132    8.569
 hybrid_alltoall_any               6683 11.6    7.969    8.163    8.218    8.409
 make_images_data                  5332 11.0    0.061    0.062    8.130    8.325
 fft_wrap_pw1pw2                  26668 10.4    0.137    0.137    7.677    8.088
 get_2c_integrals                     1  6.0    0.004    0.004    7.508    7.516
 collocate_function                1328  8.0    4.897    4.906    6.973    7.409
 compute_2c_integrals                 1  7.0    0.006    0.007    6.964    6.965
 compute_2c_integrals_loop_lm         1  8.0    0.014    0.022    6.856    6.873
 mp2_eri_2c_integrate_gpw             1  9.0    2.021    2.048    6.842    6.851
 scf_env_do_scf                       1  3.0    0.000    0.000    6.787    6.788
 scf_env_do_scf_inner_loop           10  4.0    0.001    0.001    6.787    6.788
 ao_to_mo_and_store_B_E_Ex_1       1328  7.0    3.584    3.617    5.446    5.475
 qs_scf_new_mos                      10  5.0    0.000    0.000    5.237    5.238
 fft_wrap_pw1pw2_20               10647 11.4    0.020    0.021    4.466    4.890
 mp2_ri_gpw_compute_en_ener        1040  7.0    4.739    4.769    4.739    4.769
 multiply_cannon_multrec           2676 11.0    1.839    2.064    4.503    4.711
 mp2_ri_gpw_compute_en_comm         221  7.0    1.008    1.008    4.437    4.439
 pw_gpu_r3dc1d_3d                 13282 12.2    3.849    4.295    3.849    4.295
 eigensolver                         11  5.8    0.001    0.001    2.986    2.987
 pw_gpu_c1dr3d_3d                 13280 12.7    2.643    2.679    2.643    2.679
 potential_pw2rs                   2666 10.0    0.096    0.096    2.638    2.677
 multiply_cannon_sync_h2d          2676 11.0    1.813    2.559    1.813    2.559
 dbcsr_mm_accdrv_process           5392 12.0    0.838    1.450    2.434    2.461
 mp_sendrecv_dm3                    442  8.0    2.429    2.436    2.429    2.436
 cp_fm_diag_elpa                     11  6.8    0.000    0.000    2.387    2.388
 cp_fm_diag_elpa_base                11  7.8    2.309    2.324    2.386    2.386
 fft_wrap_pw1pw2_10               15957 11.5    0.019    0.019    2.301    2.316
 collocate_single_gaussian         1328 10.0    0.092    0.093    2.285    2.314
 copy_dbcsr_to_fm                  1351  8.0    0.032    0.032    2.256    2.262
 replicate_iaK_2intgroup              1  6.0    2.080    2.080    2.219    2.220
 mp2_eri_2c_integrate_gpw_pot_l    1328 10.0    0.004    0.004    2.163    2.203
 fill_local_i_aL                    884  7.5    2.138    2.151    2.138    2.151
 jit_kernel_multiply                  8 13.0    1.486    2.083    1.486    2.083
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=103.489, yerr=0.0
Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=54.260000000000005, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.443, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.788, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.132, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=7.969, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="collocate_function", label="collocate_function", y=4.897, yerr=0.0

Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.077    0.077  201.843  201.844
 qs_energies                          1  2.0    0.000    0.000  200.776  200.778
 scf_env_do_scf                       1  3.0    0.000    0.000  187.039  187.040
 scf_env_do_scf_inner_loop           15  4.0    0.001    0.002  187.039  187.040
 qs_ks_update_qs_env                 15  5.0    0.000    0.000   93.729   93.782
 rebuild_ks_matrix                   15  6.0    0.000    0.000   93.528   93.581
 qs_ks_build_kohn_sham_matrix        15  7.0    0.002    0.003   93.528   93.581
 qs_vxc_create                       15  8.0    0.000    0.000   57.577   57.596
 qs_scf_new_mos                      15  5.0    0.000    0.000   51.710   51.787
 fft_wrap_pw1pw2                   1086 10.0    0.027    0.029   49.758   49.864
 calculate_dispersion_nonloc         15  9.0   10.773   10.795   49.549   49.572
 eigensolver                         15  6.0    0.002    0.002   42.452   42.465
 qs_rho_update_rho_low               16  5.0    0.000    0.000   39.817   39.818
 calculate_rho_elec                  16  6.0    0.177    0.177   39.817   39.818
 sum_up_and_integrate                15  8.0    0.000    0.000   34.507   34.584
 integrate_v_rspace                  15  9.0    0.046    0.046   34.482   34.560
 grid_collocate_task_list            16  7.0   28.504   28.555   28.504   28.555
 grid_integrate_task_list            15 10.0   27.636   27.690   27.636   27.690
 pw_gpu_c1dr3d_3d_ps                585 12.1    5.561    5.622   26.062   26.100
 cp_fm_diag_elpa                     15  7.0    0.000    0.000   25.759   25.764
 cp_fm_diag_elpa_base                15  8.0   23.959   24.524   25.752   25.752
 fft_wrap_pw1pw2_150                765 11.0    0.004    0.005   25.486   25.566
 pw_gpu_r3dc1d_3d_ps                501 11.9    4.634    4.802   23.663   23.805
 cp_fm_cholesky_restore              45  7.0   14.782   15.494   14.782   15.494
 fft_wrap_pw1pw2_200                197 11.3    0.001    0.001   12.166   12.170
 density_rs2pw                       16  7.0    0.001    0.001   11.121   11.174
 qs_energies_init_hamiltonians        1  3.0    0.000    0.000    9.896    9.896
 vdW_energy                          15 10.0    9.385    9.445    9.385    9.445
 pw_gpu_ffc                         585 13.1    8.962    9.059    8.962    9.059
 build_core_hamiltonian_matrix        1  4.0    0.000    0.000    8.520    8.570
 pw_gpu_cff                         501 12.9    8.468    8.475    8.468    8.475
 xc_vxc_pw_create                    15  9.0    0.178    0.180    8.027    8.030
 mp_alltoall_z22v                  1086 14.0    6.623    7.112    6.623    7.112
 pw_gpu_sf                          585 13.1    6.925    6.940    6.925    6.940
 potential_pw2rs                     15 10.0    0.006    0.007    6.800    6.824
 pw_gpu_fg                          501 12.9    6.653    6.676    6.653    6.676
 copy_dbcsr_to_fm                    16  5.9    0.001    0.001    6.494    6.618
 dbcsr_complete_redistribute         46  8.3    1.768    1.771    5.573    5.642
 fft_wrap_pw1pw2_10                  62 10.5    0.000    0.000    5.446    5.447
 cp_fm_uplo_to_full                  30  8.0    3.702    4.960    3.702    4.960
 build_core_ppnl                      1  5.0    4.800    4.843    4.800    4.843
 xc_rho_set_and_dset_create          15 10.0    0.129    0.130    4.719    4.744
 x_to_yz                            585 13.1    0.966    0.984    4.580    4.715
 xc_pw_derive                        90 11.0    0.001    0.001    4.622    4.653
 yz_to_x                            501 12.9    0.843    0.854    3.852    4.177
 gspace_mixing                       14  5.0    0.129    0.129    4.062    4.062
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=201.843, yerr=0.0
Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=96.189, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=28.504, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.636, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=23.959, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=14.782, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.773, yerr=0.0

Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.042    0.043  269.026  269.026
 qs_energies                          1  2.0    0.000    0.000  268.884  268.885
 ls_scf                               1  3.0    0.000    0.000  268.004  268.004
 ls_scf_main                          1  4.0    0.001    0.001  258.231  258.232
 density_matrix_trs4                 11  5.0    0.008    0.008  214.637  214.714
 dbcsr_multiply_generic             185  6.1    0.323    0.325  174.577  174.634
 multiply_cannon                    185  7.1    1.960    2.209  120.803  120.962
 multiply_cannon_loop               185  8.1    0.330    0.333  106.647  106.907
 multiply_cannon_multrec            370  9.1   81.941   82.211   91.279   91.494
 make_m2s                           370  7.1    0.030    0.030   45.445   45.623
 make_images                        370  8.1   11.168   11.660   44.404   44.579
 ls_scf_dm_to_ks                     11  5.0    0.000    0.000   39.124   39.218
 matrix_ls_to_qs                     11  6.0    0.000    0.000   36.099   36.257
 dbcsr_complete_redistribute         23  7.5   22.231   22.393   30.567   30.751
 matrix_decluster                    11  7.0    0.000    0.000   27.712   27.890
 arnoldi_extremal                    12  6.1    0.000    0.000   24.060   24.060
 arnoldi_normal_ev                   12  7.1    0.009    0.009   24.059   24.060
 build_subspace                      23  8.1    0.062    0.062   23.539   23.540
 dbcsr_matrix_vector_mult           652  9.0    0.159    0.161   22.078   22.352
 dbcsr_matrix_vector_mult_local     652 10.0   21.079   21.355   21.086   21.361
 make_images_data                   370  9.1    0.012    0.012   17.211   17.613
 hybrid_alltoall_any                393  9.9   11.784   12.130   16.725   17.134
 calculate_norms                    740  9.1   14.413   14.458   14.413   14.458
 dbcsr_finalize                     559  7.6    0.214    0.223   14.064   14.382
 dbcsr_merge_all                    510  8.6    2.531    2.848   12.855   13.178
 dbcsr_copy                         761  7.5    1.608    1.618   10.015   10.025
 dbcsr_special_finalize             555  9.1    0.010    0.010    9.405    9.421
 setup_rec_index_2d                 370  8.1    9.224    9.234    9.224    9.234
 dbcsr_sort_indices                1283 10.0    8.816    8.818    8.816    8.818
 dbcsr_add_d                        280  6.0    0.001    0.001    8.433    8.777
 dbcsr_add_anytype                  280  7.0    3.748    3.759    8.432    8.776
 dbcsr_dot                          144  6.3    7.911    8.149    8.599    8.624
 dbcsr_copy_into_existing            11  8.0    8.386    8.406    8.386    8.406
 ls_scf_init_scf                      1  4.0    0.000    0.000    8.306    8.310
 ls_scf_init_matrix_S                 1  5.0    0.000    0.000    7.851    7.852
 dbcsr_mm_accdrv_process          14501 10.0    0.769    0.783    7.333    7.387
 matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.000    7.075    7.075
 tree_to_linear_d                    23 10.5    6.869    6.878    6.869    6.878
 dbcsr_mm_accdrv_process_sort     14501 11.0    6.564    6.604    6.564    6.604
 dbcsr_merge_single_wm              370 10.1    0.554    0.560    6.064    6.082
 mp_waitall_1                      5192 10.5    5.238    5.906    5.238    5.906
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=269.026, yerr=0.0
Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=117.578, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=81.941, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=22.231, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=21.079, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=14.413, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=11.784, yerr=0.0

Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.004    0.005   47.320   47.321
 lib_test                             1  2.0    0.000    0.000   47.313   47.315
 dbcsr_run_tests                      3  3.0    0.000    0.000   47.313   47.313
 test_multiplies_multiproc            3  4.0    0.001    0.001   36.562   36.596
 dbcsr_multiply_generic               9  5.0    0.002    0.002   28.488   28.489
 multiply_cannon                      9  6.0    0.357    0.357   18.758   19.311
 multiply_cannon_loop                 9  7.0    0.003    0.003   17.369   17.800
 multiply_cannon_multrec             18  8.0    9.243    9.649   16.160   16.588
 dbcsr_make_random_matrix             9  4.0    7.455    7.484   10.623   10.658
 dbcsr_finalize                      27  5.7    0.001    0.001    7.199    7.200
 dbcsr_merge_all                     18  6.5    3.557    3.571    7.082    7.091
 dbcsr_mm_accdrv_process           8199  9.0    1.401    1.519    6.684    6.697
 dbcsr_redistribute                   9  5.0    3.460    3.463    5.597    5.613
 make_m2s                            18  6.0    0.001    0.001    4.881    4.888
 make_images                         18  7.0    0.362    0.366    4.847    4.852
 dbcsr_mm_accdrv_process_sort      8199 10.0    4.518    4.528    4.518    4.528
 make_images_data                    18  8.0    0.001    0.001    2.806    2.816
 hybrid_alltoall_any                 18  9.0    2.421    2.429    2.775    2.785
 mp_alltoall_d11v                    27  6.0    1.880    1.890    1.880    1.890
 tree_to_linear_d                     9  7.0    1.810    1.817    1.810    1.817
 dbcsr_data_copy_aa2                 18  7.5    1.582    1.584    1.582    1.584
 dbcsr_data_release                 507  7.7    1.338    1.338    1.338    1.338
 mp_sum_l                            61  4.9    0.575    1.130    0.575    1.130
 dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.574    1.129
 dbcsr_data_new                     354  7.4    0.972    1.092    0.972    1.092
 dbcsr_checksum                       6  5.0    1.022    1.022    1.026    1.026
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=47.32, yerr=0.0
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=19.087, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.243, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.455, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.518, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.557, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.46, yerr=0.0

Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:

 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                 1  1.0    0.047    0.047  200.503  200.504
 qs_mol_dyn_low                       1  2.0    0.004    0.004  198.974  199.013
 qs_forces                            6  3.8    0.001    0.001  122.079  122.080
 qs_energies                          6  4.8    0.000    0.000  115.261  115.262
 scf_env_do_scf                       6  5.8    0.000    0.000  107.645  107.645
 scf_env_do_scf_inner_loop          113  6.2    0.005    0.008   99.800   99.801
 velocity_verlet                      5  3.0    0.003    0.003   96.528   96.578
 rebuild_ks_matrix                  119  8.1    0.000    0.000   81.506   81.508
 qs_ks_build_kohn_sham_matrix       119  9.1    0.018    0.019   81.505   81.508
 qs_ks_update_qs_env                119  7.3    0.001    0.001   76.912   76.914
 fft_wrap_pw1pw2                   2059 12.4    0.043    0.044   64.265   64.323
 fft_wrap_pw1pw2_150               1321 13.9    0.009    0.009   61.483   61.574
 qs_vxc_create                      119 10.1    0.002    0.002   52.212   52.214
 xc_vxc_pw_create                   119 11.1    1.507    1.511   52.211   52.212
 qmmm_el_coupling                     6  3.8    0.000    0.000   40.343   40.345
 qmmm_elec_with_gaussian              6  4.8    0.019    0.020   40.337   40.339
 qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   39.053   39.110
 xc_pw_derive                       714 13.1    0.009    0.009   36.034   36.074
 pw_gpu_c1dr3d_3d_ps               1095 14.8   10.366   10.507   34.703   34.721
 qmmm_elec_gaussian_low_G             6  6.8   34.334   34.366   34.334   34.366
 qmmm_forces                          6  3.8    0.001    0.001   33.625   33.625
 qmmm_forces_with_gaussian            6  4.8    0.022    0.022   32.940   33.284
 qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   31.642   31.986
 pw_gpu_r3dc1d_3d_ps                964 14.0    9.120    9.263   29.506   29.549
 qmmm_forces_gaussian_low_G           6  6.8   26.644   26.990   26.644   26.990
 xc_rho_set_and_dset_create         119 12.1    2.415    2.425   26.360   26.374
 xc_pw_divergence                   119 12.1    0.005    0.005   23.973   23.995
 qs_rho_update_rho_low              119  7.3    0.001    0.001   21.784   21.836
 calculate_rho_elec                 119  8.3    1.091    1.093   21.783   21.836
 density_rs2pw                      119  9.3    0.007    0.007   15.652   15.759
 dbcsr_multiply_generic            2598 12.3    0.094    0.095   13.742   13.750
 sum_up_and_integrate               119 10.1    0.002    0.002   13.606   13.667
 integrate_v_rspace                 119 11.1    0.023    0.024   13.435   13.496
 mp_alltoall_z22v                  2059 16.4   12.711   12.973   12.711   12.973
 multiply_cannon                   2598 13.3    0.219    0.224   12.140   12.190
 multiply_cannon_loop              2598 14.3    0.244    0.247   11.676   11.725
 multiply_cannon_multrec           5196 15.3    4.052    4.141    9.528    9.692
 potential_pw2rs                    119 12.1    0.033    0.033    9.529    9.530
 x_to_yz                           1095 15.8    2.126    2.132    9.091    9.221
 pw_gpu_sf                         1095 15.8    8.774    8.775    8.774    8.775
 qs_ks_ddapc                        119 10.1    0.002    0.002    8.735    8.757
 pw_gpu_fg                          964 15.0    7.860    7.947    7.860    7.947
 init_scf_loop                        6  6.8    0.000    0.000    7.841    7.841
 yz_to_x                            964 15.0    1.705    1.725    7.451    7.559
 qs_scf_new_mos                     113  7.2    0.001    0.001    7.231    7.232
 qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.231    7.231
 ot_scf_mini                        113  9.2    0.002    0.002    6.947    6.952
 pw_gpu_ffc                        1095 15.8    6.454    6.484    6.454    6.484
 dbcsr_mm_accdrv_process          13992 16.0    0.527    0.533    5.412    5.484
 init_scf_run                         6  5.8    0.000    0.000    5.465    5.465
 scf_env_initial_rho_setup            6  6.8    0.000    0.000    5.464    5.465
 xc_functional_eval                 238 13.1    0.003    0.003    5.178    5.186
 grid_collocate_task_list           119  9.3    5.014    5.082    5.014    5.082
 pw_gpu_cff                         964 15.0    5.008    5.016    5.008    5.016
 qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    4.998    4.999
 qmmm_forces_with_gaussian_LG         6  7.8    4.998    4.998    4.998    4.998
 ot_mini                            113 10.2    0.001    0.001    4.954    4.957
 jit_kernel_multiply                 24 14.7    4.838    4.914    4.838    4.914
 qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    4.719    4.744
 qmmm_elec_with_gaussian_LG           6  7.8    4.719    4.744    4.719    4.744
 qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    4.624    4.625
 pw_poisson_solve                   125  9.9    0.003    0.003    4.564    4.566
 qs_ot_get_derivative               113 11.2    0.001    0.001    4.057    4.060
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=200.503, yerr=0.0
Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=107.32799999999999, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=34.334, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=26.644, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=12.711, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.366, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.12, yerr=0.0

Summary: Performance test took 23 minutes.
Status: OK

 ---> Removed intermediate container 9986076c1530
 ---> eb39b64efa51
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in d598596dd218
 ---> Removed intermediate container d598596dd218
 ---> d15129d0849b
Step 46/46 : ENTRYPOINT []
 ---> Running in cc9ab69b5f2e
 ---> Removed intermediate container cc9ab69b5f2e
 ---> 3dddb823b6a5
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 3dddb823b6a5
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master

Pushing new image... done.

#################### Running Image cp2k-perf-cuda-volta ####################

Uploading artifacts... done
EndDate: 2026-03-22 06:48:36+00:00