StartDate: 2025-12-31 06:06:08+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: b13274d47d598e90adb4491b9276edbba0c455fa
CommitTime: 2025-12-30 22:31:56 +0100
CommitAuthor: Matthias Krack
CommitSubject: Add patch for Spack/CMake build of greenX library

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=b13274d47d598e90adb4491b9276edbba0c455fa SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes

Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.

Sending build context to Docker daemon 408.7MB
Step 1/47 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
2cd52cbb1ebe: Waiting
02cb0e091e33: Waiting
6cb9b761b877: Waiting
9c3d619183d2: Waiting
5a2aba542b08: Waiting
6e8af4fd0a07: Waiting
15a17189b2df: Waiting
7f7602a82106: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Verifying Checksum
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/47 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/47 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/47 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/47 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/47 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/47 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> f843eeab6072
Step 8/47 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> 896c2903221b
Step 9/47 : RUN mkdir scripts
 ---> Using cache
 ---> ced8e2638937
Step 10/47 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> 907a94a49441
Step 11/47 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> c0c7e82d5124
Step 12/47 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --with-tblite=no --dry-run
 ---> Using cache
 ---> bb2eb5db55cb
Step 13/47 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> a5914e3cde19
Step 14/47 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> e77118c94fff
Step 15/47 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 5aae1c541792
Step 16/47 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> d3a5bdbf1166
Step 17/47 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 155f71b8d97b
Step 18/47 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> 57c476bfc1fc
Step 19/47 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 7c65b017159b
Step 20/47 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> f1733109c7fb
Step 21/47 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> 447b1c5c9c85
Step 22/47 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 2a58cdb44ab6
Step 23/47 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> bc3cf5dd4afd
Step 24/47 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> b72397fbd2d2
Step 25/47 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 9bb4789dcafa
Step 26/47 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> 96b975dbd327
Step 27/47 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> 60f99aeea0c7
Step 28/47 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> a4d611b6abcc
Step 29/47 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> 6fb18b87860a
Step 30/47 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> 27459c31b170
Step 31/47 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 663206bda380
Step 32/47 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> c2da0e825835
Step 33/47 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> d9790be57eb3
Step 34/47 : COPY ./src ./src
 ---> Using cache
 ---> 3b9567879456
Step 35/47 : COPY ./data ./data
 ---> Using cache
 ---> ebf7801f5106
Step 36/47 : COPY ./tests ./tests
 ---> Using cache
 ---> a5f0adf7cc80
Step 37/47 : COPY ./tools/build_utils ./tools/build_utils
 ---> Using cache
 ---> 142beaf28404
Step 38/47 : COPY ./cmake ./cmake
 ---> a789b20b4a67
Step 39/47 : COPY ./CMakeLists.txt .
 ---> 4af643c30639
Step 40/47 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> 8053ad767920
Step 41/47 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in 8973633329d3

==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so (found version "4.1")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so (found version "4.1")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpifort.so (found version "4.1")
-- Found MPI: TRUE (found version "4.1") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "4.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
--   Found openblas, version 0.3.30
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
--   Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
--   Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
--   Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
--   Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
--   Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")

-----------------------------------------------------------
-                           CUDA                          -
-----------------------------------------------------------

-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found

------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------

-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX

------------------------------------------------------------
-                          DBCSR                           -
------------------------------------------------------------

-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")

------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------

-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "1.14.6") found components: C Fortran
-- Found MPI: TRUE (found version "4.1") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Checking for module 'libint2'
--   Found libint2, version 2.6.0
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include;/opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include/libint2
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.5.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "1.14.6") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'libvdwxc>=0.3.0'
-- Looking for vdwxc_init_mpi
-- Looking for vdwxc_init_mpi - not found
-- Found LibVDWXC: /opt/cp2k-toolchain/install/libvdwxc-0.4.0/lib/libvdwxc.a (Required is at least version "0.3.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.

--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------

- BLAS
  - vendor: OpenBLAS
  - include directories: /opt/cp2k-toolchain/install/openblas-0.3.30/include
  - libraries: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
- LAPACK
  - include directories: /opt/cp2k-toolchain/install/openblas-0.3.30/include
  - libraries: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
- MPI
  - include directories: /opt/cp2k-toolchain/install/mpich-4.3.2/include
  - libraries: /opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-4.3.2/lib/libmpi.so
  - MPI_F08: ON
- ScaLAPACK
  - vendor: auto
  - include directories:
  - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
- Hardware Acceleration:
  - CUDA:
    - GPU architecture number: 70
    - GPU profiling enabled:
    - GPU accelerated modules
      - PW module: ON
      - GRID module: ON
      - DBM module: ON
- LibXC
  - version: 7.0.0
  - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
  - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
- HDF5
  - version: 1.14.6
  - include directories: /opt/cp2k-toolchain/install/hdf5-1.14.6/include
  - libraries: hdf5-shared
- FFTW3
  - include directories: /opt/cp2k-toolchain/install/fftw-3.3.10/include
  - libraries: /opt/cp2k-toolchain/install/fftw-3.3.10/lib/libfftw3.a
- LIBXSMM
  - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
  - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so
- SpLA
  - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
  - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
  - SpLA GEMM offloading
- SIRIUS
  - include directories:
  - libraries:
- COSMA
  - include directories: /opt/cp2k-toolchain/install/COSMA-2.7.0/include
  - libraries: MPI::MPI_CXX;costa::costa;$;$;cosma::BLAS::blas;cosma::scalapack::scalapack
- Libint2
  - include directories: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include;/opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include/libint2
  - libraries: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/lib/libint2.a
- ELPA
  - include directories: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/include/elpa_openmp-2024.05.001
  - libraries: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a

--------------------------------------------------------------------
-                                                                  -
-          List of dependencies not included in this build         -
-                                                                  -
--------------------------------------------------------------------

- DFTD4
- DeePMD
- PEXSI
- ACE (libpace)
- TBLITE
- Spglib
- LibSMEAGOL
- MiMiC
- DLA-Future
- PLUMED
- Libvori
- LibTorch
- TREXIO
- GreenX

To run the regtests you need to run the following commands

    cd ..
    export CP2K_DATA_DIR=/opt/cp2k/data/
    ./tests/do_regtest.py /opt/cp2k/build/bin psmp

-- Configuring done (11.9s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container 8973633329d3
 ---> f9e89ab9a530
Step 42/47 : COPY ./benchmarks ./benchmarks
 ---> 60ba38ef0ab6
Step 43/47 : COPY ./tools/regtesting ./tools/regtesting
 ---> 87b3a022ca2a
Step 44/47 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> b818637baa40
Step 45/47 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in b657ccc05d48

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 77 input files and 0 errors.

========== Running Performance Test ==========

Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.027    0.028  102.481  102.482
qs_mol_dyn_low                       1  2.0    0.004    0.005  102.065  102.069
qs_forces                           11  3.9    0.002    0.002  102.018  102.019
qs_energies                         11  4.9    0.001    0.001   90.902   90.903
scf_env_do_scf                      11  5.9    0.001    0.001   70.291   70.292
velocity_verlet                     10  3.0    0.001    0.001   65.395   65.411
scf_env_do_scf_inner_loop          108  6.5    0.006    0.008   60.149   60.150
rebuild_ks_matrix                  119  8.3    0.001    0.001   26.548   26.552
qs_ks_build_kohn_sham_matrix       119  9.3    0.020    0.020   26.547   26.551
dbcsr_multiply_generic            2286 12.5    0.140    0.141   25.316   25.351
qs_ks_update_qs_env                119  7.6    0.001    0.001   24.345   24.348
qs_scf_new_mos                     108  7.5    0.001    0.001   20.361   20.367
qs_scf_loop_do_ot                  108  8.5    0.001    0.001   20.360   20.366
qs_rho_update_rho_low              119  7.7    0.001    0.001   20.168   20.185
calculate_rho_elec                 119  8.7    0.863    0.868   20.167   20.184
ot_scf_mini                        108  9.5    0.003    0.003   18.431   18.432
fft_wrap_pw1pw2                   1201 11.6    0.023    0.023   15.829   15.876
sum_up_and_integrate               119 10.3    0.003    0.003   13.923   13.960
integrate_v_rspace                 119 11.3    0.352    0.355   13.828   13.865
fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   13.627   13.709
multiply_cannon                   2286 13.5    0.322    0.329   12.501   12.516
multiply_cannon_loop              2286 14.5    0.262    0.266   11.455   11.455
make_m2s                          4572 13.5    0.042    0.043   11.206   11.227
make_images                       4572 14.5    1.501    1.513   11.032   11.051
ot_mini                            108 10.5    0.001    0.001   10.718   10.720
density_rs2pw                      119  9.7    0.008    0.008   10.273   10.383
init_scf_run                        11  5.9    0.000    0.000   10.275   10.275
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.274   10.274
init_scf_loop                       11  6.9    0.000    0.000   10.070   10.070
grid_collocate_task_list           119  9.7    8.995    9.062    8.995    9.062
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.222    8.222
pw_gpu_r3dc1d_3d_ps                606 13.1    2.336    2.371    8.124    8.146
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.687    7.815
pw_gpu_c1dr3d_3d_ps                595 14.2    2.274    2.297    7.676    7.701
grid_integrate_task_list           119 12.3    7.425    7.464    7.425    7.464
wfi_extrapolate                     11  7.9    0.001    0.001    7.445    7.445
prepare_preconditioner              11  7.9    0.000    0.000    6.836    6.841
make_preconditioner                 11  8.9    0.000    0.000    6.836    6.841
qs_ot_get_derivative               108 11.5    0.002    0.002    6.465    6.465
multiply_cannon_multrec           4572 15.5    2.121    2.166    6.191    6.228
hybrid_alltoall_any               4725 16.4    4.788    4.804    6.097    6.116
potential_pw2rs                    119 12.3    0.037    0.037    6.051    6.052
make_images_data                  4572 15.5    0.053    0.054    6.012    6.018
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.771    6.009
parallel_gemm_fm_cosma              81  9.0    5.543    5.544    5.543    5.544
ot_diis_step                       108 11.5    0.006    0.006    4.228    4.228
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.905    4.034
build_core_ppl_forces               11  5.9    3.890    3.979    3.890    3.979
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.882    3.901
apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.682    3.683
apply_single                       119 13.6    0.001    0.001    3.682    3.683
dbcsr_mm_accdrv_process           9594 16.2    0.823    0.908    3.650    3.659
dbcsr_complete_redistribute        329 12.2    1.213    1.240    3.073    3.323
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.296    3.296
multiply_cannon_sync_h2d          4572 15.5    3.225    3.269    3.225    3.269
mp_alltoall_z22v                  1201 15.6    3.110    3.234    3.110    3.234
calculate_dm_sparse                119  9.5    0.001    0.001    3.228    3.233
qs_create_task_list                 11  7.9    0.000    0.000    3.036    3.182
generate_qs_task_list               11  8.9    1.139    1.151    3.036    3.182
qs_ot_get_p                        119 10.4    0.001    0.001    3.087    3.090
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.799    2.800
mp_waitall_1                     64495 16.9    2.716    2.738    2.716    2.738
pw_poisson_solve                   119 10.3    0.003    0.003    2.650    2.653
transfer_rs2pw                     487 10.6    0.008    0.008    2.510    2.607
calculate_first_density_matrix       1  7.0    0.000    0.000    2.396    2.397
copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.359    2.375
pw_gpu_fg                          606 14.1    2.324    2.365    2.324    2.365
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.326    2.327
jit_kernel_multiply                 11 15.7    2.230    2.311    2.230    2.311
transfer_rs2pw_140                 130 11.5    1.571    1.579    2.107    2.212
qs_ot_get_derivative_taylor         59 13.0    0.003    0.003    2.161    2.162
cp_fm_cholesky_invert               11 10.9    2.114    2.114    2.114    2.114
yz_to_x                            606 14.1    0.454    0.458    2.033    2.102
dbcsr_special_finalize            6858 15.5    0.042    0.043    2.073    2.078
-------------------------------------------------------------------------------

Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=71.84, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.995, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.425, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.543, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.788, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.89, yerr=0.0

Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.028    0.028   98.363   98.364
qs_mol_dyn_low                       1  2.0    0.004    0.004   97.931   97.933
qs_forces                           11  3.9    0.002    0.002   97.887   97.888
qs_energies                         11  4.9    0.001    0.001   86.614   86.616
scf_env_do_scf                      11  5.9    0.001    0.001   65.950   65.950
velocity_verlet                     10  3.0    0.001    0.002   64.026   64.042
scf_env_do_scf_inner_loop           96  6.5    0.005    0.007   55.448   55.448
rebuild_ks_matrix                  107  8.3    0.001    0.001   25.914   25.917
qs_ks_build_kohn_sham_matrix       107  9.3    0.018    0.018   25.913   25.916
dbcsr_multiply_generic            1966 12.4    0.123    0.125   23.381   23.477
qs_ks_update_qs_env                107  7.6    0.001    0.001   23.378   23.381
qs_rho_update_rho_low              107  7.7    0.001    0.001   18.603   18.620
calculate_rho_elec                 107  8.7    0.777    0.782   18.602   18.619
qs_scf_new_mos                      96  7.5    0.001    0.001   18.378   18.383
qs_scf_loop_do_ot                   96  8.5    0.001    0.001   18.377   18.382
ot_scf_mini                         96  9.5    0.003    0.003   16.672   16.673
sum_up_and_integrate               107 10.3    0.002    0.002   14.422   14.512
fft_wrap_pw1pw2                   1081 11.6    0.020    0.021   14.408   14.444
integrate_v_rspace                 107 11.3    0.317    0.319   14.335   14.426
fft_wrap_pw1pw2_140                439 12.2    0.003    0.003   12.379   12.454
multiply_cannon                   1966 13.4    0.285    0.287   11.473   11.572
make_m2s                          3932 13.4    0.037    0.038   10.425   10.545
multiply_cannon_loop              1966 14.4    0.229    0.233   10.477   10.486
init_scf_loop                       11  6.9    0.000    0.000   10.430   10.430
make_images                       3932 14.4    1.378    1.440   10.272   10.390
init_scf_run                        11  5.9    0.000    0.000   10.306   10.306
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.306   10.306
ot_mini                             96 10.5    0.001    0.001    9.694    9.696
density_rs2pw                      107  9.7    0.007    0.007    9.357    9.484
grid_integrate_task_list           107 12.3    8.547    8.639    8.547    8.639
grid_collocate_task_list           107  9.7    8.434    8.522    8.434    8.522
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.287    8.287
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.648    7.753
wfi_extrapolate                     11  7.9    0.002    0.002    7.548    7.549
pw_gpu_r3dc1d_3d_ps                546 13.1    2.141    2.190    7.430    7.447
prepare_preconditioner              11  7.9    0.000    0.000    7.054    7.055
make_preconditioner                 11  8.9    0.000    0.000    7.054    7.055
pw_gpu_c1dr3d_3d_ps                535 14.2    2.043    2.066    6.952    6.971
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.937    6.174
hybrid_alltoall_any               4079 16.3    4.424    4.536    5.790    5.803
qs_ot_get_derivative                96 11.5    0.001    0.001    5.782    5.783
multiply_cannon_multrec           3932 15.4    1.850    1.881    5.710    5.736
make_images_data                  3932 15.4    0.046    0.047    5.652    5.657
parallel_gemm_fm_cosma              81  9.0    5.560    5.561    5.560    5.561
potential_pw2rs                    107 12.3    0.033    0.034    5.470    5.471
qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.061    4.199
build_core_ppl_forces               11  5.9    3.871    3.955    3.871    3.955
ot_diis_step                        96 11.5    0.005    0.005    3.889    3.889
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.807    3.835
apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.517    3.519
apply_single                       107 13.6    0.001    0.001    3.517    3.519
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.514    3.515
dbcsr_complete_redistribute        317 12.2    1.244    1.248    3.257    3.503
dbcsr_mm_accdrv_process           8450 16.1    0.659    0.862    3.490    3.496
qs_create_task_list                 11  7.9    0.000    0.000    3.267    3.377
generate_qs_task_list               11  8.9    1.423    1.427    3.267    3.377
multiply_cannon_sync_h2d          3932 15.4    2.951    3.034    2.951    3.034
calculate_dm_sparse                107  9.5    0.001    0.001    2.970    2.973
mp_alltoall_z22v                  1081 15.6    2.831    2.964    2.831    2.964
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.780    2.781
mp_waitall_1                     55487 16.8    2.571    2.750    2.571    2.750
qs_ot_get_p                        107 10.4    0.001    0.001    2.700    2.701
copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.563    2.593
jit_kernel_multiply                 12 15.7    2.293    2.493    2.293    2.493
pw_poisson_solve                   107 10.3    0.003    0.003    2.377    2.383
transfer_rs2pw                     439 10.6    0.007    0.007    2.226    2.356
calculate_first_density_matrix       1  7.0    0.000    0.000    2.327    2.327
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.311    2.312
pw_gpu_fg                          546 14.1    2.115    2.149    2.115    2.149
cp_fm_cholesky_invert               11 10.9    2.110    2.110    2.110    2.110
transfer_dbcsr_to_fm                11 10.9    0.002    0.002    2.061    2.088
transfer_rs2pw_140                 118 11.5    1.421    1.441    1.864    2.004
build_core_ppl                      11  7.9    1.946    1.974    1.946    1.974
-------------------------------------------------------------------------------

Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=67.527, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.547, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.434, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.56, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.424, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.871, yerr=0.0

Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.018    0.020  168.550  168.552
qs_energies                          1  2.0    0.000    0.000  168.224  168.225
mp2_main                             1  3.0    0.000    0.000  161.491  161.491
mp2_gpw_main                         1  4.0    0.000    0.000  159.726  159.726
rpa_ri_compute_en                    1  5.0    0.000    0.000  150.063  150.064
rpa_num_int                          1  6.0    0.001    0.001  150.055  150.055
parallel_gemm_fm_cosma             105  8.4   69.900   69.905   69.900   69.905
compute_mat_P_omega                  1  7.0    0.001    0.002   66.827   66.828
dbt_total                         2336  9.6    0.020    0.021   66.753   66.754
compute_mat_P_omega_contract        10  8.0    5.088    5.105   66.168   66.173
compute_W_cubic_GW                  10  7.0    0.004    0.004   45.522   45.523
dbt_contract                       787 11.0    0.048    0.049   45.009   45.011
dbt_tas_total                     1149 12.2    0.130    0.131   35.367   35.368
dbt_tas_multiply                   807 12.1    0.003    0.003   34.695   34.695
dbt_tas_dbm                        807 14.1    0.005    0.005   27.391   27.391
dbm_multiply                       807 16.1   26.313   26.743   26.313   26.743
rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   23.515   23.516
compute_mat_P_omega_calc_M_occ     250  9.0    5.071    5.093   23.379   23.379
contract_P_omega_with_mat_L         10  8.0    0.000    0.000   23.202   23.203
dbt_copy                          1107 10.7    0.071    0.072   21.938   21.956
dbt_tas_mm_1N                      524 15.1    0.003    0.003   17.802   18.268
compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   14.871   14.872
dbt_reshape                        594 11.8    6.357    6.483   14.144   14.187
compute_QP_energies                  1  7.0    0.000    0.000   11.915   11.915
compute_self_energy_cubic_gw         1  8.0    0.128    0.131   11.915   11.915
dbt_tas_reserve_blocks_index      3266 14.3    0.623    0.628   10.193   10.283
dbm_reserve_blocks                3634 15.3    9.898    9.992    9.898    9.992
mp2_ri_gpw_compute_in                1  5.0    0.001    0.001    9.653    9.653
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    8.874    8.874
dbt_reserve_blocks_index          2347 13.0    0.305    0.307    8.508    8.647
dbt_crop                          1042 12.0    6.348    6.375    8.549    8.583
dbt_reserve_blocks_index_array    2289 12.1    0.011    0.011    8.309    8.466
dbt_tas_mm_2                       251 15.0    0.002    0.003    7.618    7.618
scf_env_do_scf                       1  3.0    0.000    0.000    6.203    6.203
scf_env_do_scf_inner_loop           17  4.0    0.001    0.001    6.203    6.203
mp_waitall_2                      2656 15.9    5.746    5.768    5.746    5.768
get_2c_integrals                     1  6.0    0.000    0.000    5.474    5.474
contract_cubic_gw                   21  9.0    0.000    0.000    5.359    5.359
dbt_communicate_buffer             594 12.8    0.011    0.012    5.246    5.264
dbcsr_multiply_generic              30  8.1    0.003    0.003    5.069    5.095
multiply_cannon                     30  9.1    0.012    0.013    4.875    4.898
multiply_cannon_loop                30 10.1    0.004    0.004    4.822    4.846
compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    4.793    4.794
compute_mat_P_omega_copy_M_occ     250  9.0    0.001    0.002    4.676    4.687
dbt_tas_copy                       511 11.5    2.438    2.443    4.231    4.290
multiply_cannon_multrec             60 11.1    0.170    0.179    4.231    4.272
dbcsr_mm_accdrv_process            328 12.3    0.040    0.041    3.869    3.907
jit_kernel_multiply                 18 11.7    3.823    3.860    3.823    3.860
qs_scf_new_mos                      17  5.0    0.000    0.000    3.379    3.405
-------------------------------------------------------------------------------

Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=49.73400000000001, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=69.9, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=26.313, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=9.898, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.357, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.348, yerr=0.0

Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:

-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.025 0.028 187.978 187.978
qs_forces 1 2.0 0.000 0.000 187.510 187.510
rebuild_ks_matrix 7 6.6 0.000 0.000 183.090 183.091
qs_ks_build_kohn_sham_matrix 7 7.6 0.002 0.002 183.090 183.091
hfx_ks_matrix 7 8.6 0.000 0.000 179.351 179.351
dbt_total 849 11.0 0.009 0.009 133.556 133.556
hfx_ri_update_ks 7 9.6 0.000 0.000 102.307 102.307
hfx_ri_update_ks_Pmat 7 10.6 21.236 21.242 102.301 102.302
qs_energies 1 3.0 0.000 0.000 97.844 97.844
scf_env_do_scf 1 4.0 0.000 0.000 95.701 95.702
qs_ks_update_qs_env 8 6.0 0.000 0.000 93.471 93.471
qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 89.626 89.627
dbt_contract 207 12.4 0.047 0.047 78.515 78.515
hfx_ri_update_forces 1 7.0 1.043 1.051 77.042 77.042
dbt_tas_total 369 13.4 0.072 0.073 65.222 65.222
dbt_tas_multiply 216 13.5 0.001 0.001 62.567 62.567
scf_env_do_scf_inner_loop 6 5.0 0.000 0.001 51.792 51.792
dbt_copy 423 11.8 0.045 0.045 50.536 51.267
dbt_tas_dbm 216 15.5 0.002 0.002 49.727 49.727
dbm_multiply 216 17.5 47.068 47.383 47.068 47.383
hfx_ri_forces_Pmat_3c 1 8.0 3.154 3.186 44.898 44.900
init_scf_loop 2 5.0 0.000 0.000 43.908 43.908
dbt_reshape 175 13.2 17.301 17.460 37.895 38.166
hfx_ri_update_ks_Pmat_KS 63 11.6 0.001 0.001 29.804 29.804
precalc_derivatives 1 8.0 1.852 1.884 26.452 26.452
dbt_tas_mm_2 91 16.5 0.001 0.001 20.783 20.783
dbt_tas_reserve_blocks_index 1323 15.4 1.587 1.591 17.593 18.057
mp_waitall_2 1022 16.5 17.863 17.866 17.863 17.866
dbm_reserve_blocks 1491 16.3 16.705 17.174 16.705 17.174
dbt_crop 372 13.7 12.166 12.218 15.908 15.937
dbt_tas_mm_3T 77 17.1 0.000 0.001 15.634 15.702
hfx_ri_pre_scf_Pmat 1 12.0 0.000 0.000 15.627 15.627
dbt_communicate_buffer 175 14.2 0.004 0.004 14.828 14.868
dbt_reserve_blocks_index 889 14.5 0.588 0.589 14.486 14.744
dbt_reserve_blocks_index_array 859 13.5 0.007 0.007 14.217 14.469
hfx_ri_update_ks_Pmat_Px3C 63 11.6 0.000 0.000 13.869 13.869
build_3c_derivatives 3 9.0 2.293 2.326 13.856 13.860
hfx_ri_update_ks_Pmat_copy_2 63 11.6 0.000 0.000 13.524 13.524
dbt_tas_mm_3N 37 15.4 0.000 0.000 11.152 11.212
dbt_tas_copy 248 12.5 3.951 4.129 7.374 7.751
mp_sync 2901 12.8 5.923 7.202 5.923 7.202
hfx_ri_pre_scf_Pmat_int 1 13.0 0.000 0.000 5.135 5.135
dbt_tas_replicate 168 15.1 2.150 2.182 4.498 4.501
hfx_ri_pre_scf_calc_tensors 1 14.0 0.004 0.004 4.424 4.428
hfx_ri_pre_scf_Pmat_copy_2 9 13.0 1.630 1.639 4.203 4.213
dbcsr_multiply_generic 155 10.8 0.007 0.007 3.778 3.786
-------------------------------------------------------------------------------

Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=67.805, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=47.068, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=21.236, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=17.863, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=17.301, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=16.705, yerr=0.0

Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:

-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.009 0.010 106.489 106.490
qs_energies 1 2.0 0.000 0.000 106.295 106.295
mp2_main 1 3.0 0.000 0.000 98.769 98.769
mp2_gpw_main 1 4.0 0.001 0.001 98.389 98.389
mp2_ri_gpw_compute_in 1 5.0 0.536 0.538 55.058 55.070
mp2_ri_gpw_compute_in_loop 1 6.0 0.014 0.014 46.997 47.009
mp2_ri_gpw_compute_en 1 5.0 0.086 0.086 43.266 43.277
mp2_ri_gpw_compute_en_RI_loop 1 6.0 12.769 12.774 40.703 40.705
dbcsr_multiply_generic 2666 8.0 0.152 0.154 23.987 24.216
ao_to_mo_and_store_B_mult_1 1328 7.0 0.014 0.014 22.662 22.892
mp2_eri_3c_integrate_gpw 1328 7.0 0.017 0.018 18.773 18.968
mp2_ri_gpw_compute_en_expansio 1040 7.0 0.735 0.739 17.172 17.182
local_gemm 1040 8.0 16.437 16.444 16.437 16.444
make_m2s 5332 9.0 0.053 0.054 13.832 14.146
make_images 5332 10.0 3.453 3.460 13.649 13.964
integrate_v_rspace 1338 8.0 1.063 1.065 10.396 10.568
multiply_cannon 2666 9.0 0.378 0.386 9.512 9.592
hybrid_alltoall_any 6683 11.6 8.253 8.545 8.502 8.792
make_images_data 5332 11.0 0.062 0.062 8.422 8.714
multiply_cannon_loop 2666 10.0 0.194 0.197 8.429 8.501
fft_wrap_pw1pw2 26668 10.4 0.138 0.139 8.236 8.496
grid_integrate_task_list 1338 9.0 8.022 8.195 8.022 8.195
collocate_function 1328 8.0 4.856 4.943 7.241 7.598
get_2c_integrals 1 6.0 0.004 0.004 7.511 7.524
compute_2c_integrals 1 7.0 0.007 0.008 6.962 6.963
compute_2c_integrals_loop_lm 1 8.0 0.021 0.021 6.847 6.866
mp2_eri_2c_integrate_gpw 1 9.0 1.918 1.935 6.825 6.845
scf_env_do_scf 1 3.0 0.000 0.000 6.620 6.621
scf_env_do_scf_inner_loop 10 4.0 0.001 0.001 6.620 6.621
ao_to_mo_and_store_B_E_Ex_1 1328 7.0 3.479 3.501 5.289 5.312
qs_scf_new_mos 10 5.0 0.000 0.000 5.089 5.090
mp2_ri_gpw_compute_en_ener 1040 7.0 4.942 4.966 4.942 4.966
fft_wrap_pw1pw2_20 10647 11.4 0.020 0.020 4.788 4.964
mp2_ri_gpw_compute_en_comm 221 7.0 1.050 1.055 4.655 4.715
multiply_cannon_multrec 2676 11.0 1.916 1.932 4.580 4.598
pw_gpu_r3dc1d_3d 13282 12.2 4.243 4.475 4.243 4.475
eigensolver 11 5.8 0.001 0.001 2.920 2.923
pw_gpu_c1dr3d_3d 13280 12.7 2.788 2.816 2.788 2.816
potential_pw2rs 2666 10.0 0.095 0.097 2.733 2.787
mp_sendrecv_dm3 442 8.0 2.548 2.619 2.548 2.619
fft_wrap_pw1pw2_10 15957 11.5 0.018 0.018 2.527 2.611
dbcsr_mm_accdrv_process 5392 12.0 0.243 0.246 2.415 2.422
collocate_single_gaussian 1328 10.0 0.087 0.090 2.347 2.393
cp_fm_diag_elpa 11 6.8 0.000 0.000 2.349 2.350
cp_fm_diag_elpa_base 11 7.8 2.269 2.285 2.348 2.348
fill_local_i_aL 884 7.5 2.222 2.257 2.222 2.257
mp2_eri_2c_integrate_gpw_pot_l 1328 10.0 0.004 0.004 2.205 2.254
copy_dbcsr_to_fm 1351 8.0 0.032 0.033 2.193 2.198
replicate_iaK_2intgroup 1 6.0 2.038 2.054 2.177 2.191
-------------------------------------------------------------------------------

Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=56.066, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=16.437, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.769, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=8.253, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.022, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_ener", label="mp2_ri_gpw_compute_en_ener", y=4.942, yerr=0.0

Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:

-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.079 0.080 207.788 207.788
qs_energies 1 2.0 0.000 0.000 206.711 206.712
scf_env_do_scf 1 3.0 0.000 0.000 193.285 193.285
scf_env_do_scf_inner_loop 15 4.0 0.001 0.002 193.284 193.285
qs_ks_update_qs_env 15 5.0 0.000 0.000 95.859 95.899
rebuild_ks_matrix 15 6.0 0.000 0.000 95.656 95.697
qs_ks_build_kohn_sham_matrix 15 7.0 0.003 0.003 95.656 95.697
qs_vxc_create 15 8.0 0.119 0.121 58.828 58.831
qs_scf_new_mos 15 5.0 0.000 0.000 54.965 55.009
fft_wrap_pw1pw2 1086 10.0 0.028 0.030 51.643 51.651
calculate_dispersion_nonloc 15 9.0 10.944 10.945 50.554 50.560
eigensolver 15 6.0 0.002 0.002 46.042 46.174
qs_rho_update_rho_low 16 5.0 0.000 0.000 40.644 40.645
calculate_rho_elec 16 6.0 0.180 0.180 40.644 40.645
sum_up_and_integrate 15 8.0 0.000 0.000 35.319 35.364
integrate_v_rspace 15 9.0 0.047 0.047 35.294 35.338
cp_fm_diag_elpa 15 7.0 0.000 0.000 29.486 29.490
cp_fm_diag_elpa_base 15 8.0 27.786 28.292 29.481 29.482
grid_collocate_task_list 16 7.0 28.912 28.934 28.912 28.934
grid_integrate_task_list 15 10.0 27.970 28.010 27.970 28.010
pw_gpu_c1dr3d_3d_ps 585 12.1 5.731 5.770 26.663 26.707
fft_wrap_pw1pw2_150 765 11.0 0.005 0.005 26.443 26.446
pw_gpu_r3dc1d_3d_ps 501 11.9 5.559 5.703 24.945 24.982
cp_fm_cholesky_restore 45 7.0 14.757 15.411 14.757 15.411
fft_wrap_pw1pw2_200 197 11.3 0.001 0.001 13.080 13.104
density_rs2pw 16 7.0 0.002 0.002 11.545 11.568
qs_energies_init_hamiltonians 1 3.0 0.000 0.000 9.554 9.554
vdW_energy 15 10.0 9.317 9.367 9.317 9.367
pw_gpu_ffc 585 13.1 9.129 9.139 9.129 9.139
pw_gpu_cff 501 12.9 8.455 8.461 8.455 8.461
build_core_hamiltonian_matrix 1 4.0 0.000 0.000 8.223 8.252
xc_vxc_pw_create 15 9.0 0.178 0.179 8.155 8.162
potential_pw2rs 15 10.0 0.007 0.007 7.277 7.281
pw_gpu_sf 585 13.1 7.086 7.093 7.086 7.093
mp_alltoall_z22v 1086 14.0 6.753 7.044 6.753 7.044
pw_gpu_fg 501 12.9 6.915 6.997 6.915 6.997
copy_dbcsr_to_fm 16 5.9 0.001 0.001 5.947 5.953
dbcsr_complete_redistribute 46 8.3 1.657 1.691 5.433 5.551
fft_wrap_pw1pw2_10 62 10.5 0.000 0.000 5.483 5.483
xc_pw_derive 90 11.0 0.001 0.001 4.772 4.782
x_to_yz 585 13.1 1.029 1.041 4.683 4.782
xc_rho_set_and_dset_create 15 10.0 0.129 0.130 4.753 4.758
build_core_ppnl 1 5.0 4.638 4.659 4.638 4.659
cp_fm_uplo_to_full 30 8.0 3.491 4.514 3.491 4.514
-------------------------------------------------------------------------------

Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=97.41900000000001, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=28.912, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.97, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=27.786, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=14.757, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.944, yerr=0.0

Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:

-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.048 0.053 269.906 269.911
qs_energies 1 2.0 0.000 0.000 269.781 269.782
ls_scf 1 3.0 0.000 0.000 268.913 268.915
ls_scf_main 1 4.0 0.001 0.001 258.741 258.743
density_matrix_trs4 11 5.0 0.008 0.008 218.246 218.251
dbcsr_multiply_generic 185 6.1 0.332 0.335 178.860 178.888
multiply_cannon 185 7.1 1.802 1.977 124.501 125.127
multiply_cannon_loop 185 8.1 0.346 0.349 110.188 110.458
multiply_cannon_multrec 370 9.1 83.955 84.108 93.624 93.808
make_m2s 370 7.1 0.028 0.028 46.086 46.178
make_images 370 8.1 12.072 12.340 45.008 45.098
ls_scf_dm_to_ks 11 5.0 0.000 0.000 36.860 36.861
matrix_ls_to_qs 11 6.0 0.000 0.000 33.862 34.130
dbcsr_complete_redistribute 23 7.5 19.727 20.029 28.074 28.397
matrix_decluster 11 7.0 0.000 0.000 25.776 26.084
arnoldi_extremal 12 6.1 0.000 0.000 23.854 23.857
arnoldi_normal_ev 12 7.1 0.009 0.009 23.853 23.857
build_subspace 23 8.1 0.063 0.063 23.372 23.372
dbcsr_matrix_vector_mult 652 9.0 0.158 0.159 21.856 21.993
dbcsr_matrix_vector_mult_local 652 10.0 20.846 20.982 20.854 20.991
make_images_data 370 9.1 0.013 0.013 16.631 16.707
hybrid_alltoall_any 393 9.9 11.614 11.803 16.110 16.184
calculate_norms 740 9.1 15.634 15.788 15.634 15.788
dbcsr_finalize 559 7.6 0.214 0.218 14.223 14.238
dbcsr_merge_all 510 8.6 2.631 2.655 13.042 13.042
dbcsr_copy 761 7.5 1.701 1.732 9.808 9.818
setup_rec_index_2d 370 8.1 9.300 9.363 9.300 9.363
dbcsr_special_finalize 555 9.1 0.011 0.011 9.259 9.266
dbcsr_sort_indices 1283 10.0 8.720 8.726 8.720 8.726
dbcsr_add_d 280 6.0 0.001 0.001 8.716 8.719
dbcsr_add_anytype 280 7.0 3.819 3.830 8.715 8.718
ls_scf_init_scf 1 4.0 0.000 0.000 8.640 8.644
dbcsr_dot 144 6.3 7.695 7.710 8.224 8.556
ls_scf_init_matrix_S 1 5.0 0.000 0.000 8.164 8.167
dbcsr_copy_into_existing 11 8.0 8.084 8.125 8.085 8.125
dbcsr_mm_accdrv_process 14501 10.0 0.757 0.862 7.649 7.700
matrix_sqrt_Newton_Schulz 1 6.0 0.000 0.000 7.435 7.437
tree_to_linear_d 23 10.5 6.953 6.981 6.953 6.981
dbcsr_mm_accdrv_process_sort 14501 11.0 6.811 6.838 6.811 6.838
dbcsr_merge_single_wm 370 10.1 0.562 0.566 6.008 6.018
make_images_pack 370 9.1 5.555 5.714 5.570 5.729
-------------------------------------------------------------------------------

Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=117.672, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=83.955, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=20.846, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=19.727, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=15.634, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="make_images", label="make_images", y=12.072, yerr=0.0

Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:

-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.003 0.004 48.073 48.074
lib_test 1 2.0 0.000 0.000 48.061 48.068
dbcsr_run_tests 3 3.0 0.000 0.000 48.060 48.068
test_multiplies_multiproc 3 4.0 0.001 0.001 37.282 37.303
dbcsr_multiply_generic 9 5.0 0.002 0.002 29.282 29.291
multiply_cannon 9 6.0 0.019 0.019 19.055 19.912
multiply_cannon_loop 9 7.0 0.003 0.003 17.671 18.413
multiply_cannon_multrec 18 8.0 9.444 10.166 16.425 17.154
dbcsr_make_random_matrix 9 4.0 7.272 7.388 10.636 10.658
dbcsr_finalize 27 5.7 0.001 0.001 7.383 7.524
dbcsr_merge_all 18 6.5 3.548 3.549 7.271 7.409
dbcsr_mm_accdrv_process 8199 9.0 1.203 1.314 6.728 6.732
dbcsr_redistribute 9 5.0 3.415 3.419 5.538 5.543
make_m2s 18 6.0 0.001 0.001 5.103 5.111
make_images 18 7.0 0.396 0.397 5.067 5.076
dbcsr_mm_accdrv_process_sort 8199 10.0 4.541 4.549 4.541 4.549
make_images_data 18 8.0 0.001 0.001 2.819 2.820
hybrid_alltoall_any 18 9.0 2.423 2.424 2.778 2.779
dbcsr_data_copy_aa2 18 7.5 1.748 1.895 1.748 1.895
mp_alltoall_d11v 27 6.0 1.864 1.867 1.864 1.867
tree_to_linear_d 9 7.0 1.846 1.854 1.846 1.854
mp_sum_l 61 4.9 0.876 1.745 0.876 1.745
dbcsr_multiply_generic_mpsum_f 9 6.0 0.000 0.000 0.875 1.745
dbcsr_data_release 507 7.7 1.344 1.345 1.344 1.345
jit_kernel_multiply 6 10.0 0.984 1.084 0.984 1.084
dbcsr_data_new 354 7.4 0.956 1.077 0.956 1.077
dbcsr_checksum 6 5.0 0.995 1.007 1.007 1.007
-------------------------------------------------------------------------------

Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=19.852999999999998, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.444, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.272, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.541, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.548, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.415, yerr=0.0

Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:

-------------------------------------------------------------------------------
- - - T I M I N G - - -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.049 0.052 198.959 198.961
qs_mol_dyn_low 1 2.0 0.004 0.004 197.410 197.442
qs_forces 6 3.8 0.001 0.001 124.633 124.634
qs_energies 6 4.8 0.000 0.000 117.739 117.740
scf_env_do_scf 6 5.8 0.000 0.000 110.140 110.141
scf_env_do_scf_inner_loop 113 6.2 0.005 0.007 102.269 102.270
velocity_verlet 5 3.0 0.003 0.003 93.549 93.595
rebuild_ks_matrix 119 8.1 0.000 0.000 83.174 83.179
qs_ks_build_kohn_sham_matrix 119 9.1 0.019 0.019 83.173 83.179
qs_ks_update_qs_env 119 7.3 0.001 0.001 78.474 78.478
fft_wrap_pw1pw2 2059 12.4 0.044 0.046 65.735 65.739
fft_wrap_pw1pw2_150 1321 13.9 0.009 0.009 62.981 63.056
qs_vxc_create 119 10.1 0.002 0.002 53.373 53.374
xc_vxc_pw_create 119 11.1 1.501 1.505 53.371 53.372
qmmm_el_coupling 6 3.8 0.000 0.000 37.106 37.108
qmmm_elec_with_gaussian 6 4.8 0.020 0.020 37.100 37.102
xc_pw_derive 714 13.1 0.010 0.011 37.012 37.031
qmmm_elec_with_gaussian_low 6 5.8 0.000 0.000 35.663 36.016
pw_gpu_c1dr3d_3d_ps 1095 14.8 10.545 10.554 35.440 35.469
qmmm_forces 6 3.8 0.001 0.001 32.998 32.998
qmmm_forces_with_gaussian 6 4.8 0.023 0.023 32.237 32.588
qmmm_elec_gaussian_low_G 6 6.8 31.016 31.385 31.016 31.385
qmmm_force_with_gaussian_low 6 5.8 0.000 0.000 30.944 31.274
pw_gpu_r3dc1d_3d_ps 964 14.0 9.342 9.387 30.239 30.274
xc_rho_set_and_dset_create 119 12.1 2.412 2.415 26.862 26.895
qmmm_forces_gaussian_low_G 6 6.8 25.938 26.221 25.938 26.221
xc_pw_divergence 119 12.1 0.006 0.006 24.615 24.646
qs_rho_update_rho_low 119 7.3 0.001 0.001 22.188 22.254
calculate_rho_elec 119 8.3 1.073 1.075 22.187 22.253
density_rs2pw 119 9.3 0.008 0.008 16.008 16.139
sum_up_and_integrate 119 10.1 0.002 0.002 13.885 13.926
dbcsr_multiply_generic 2598 12.3 0.093 0.095 13.718 13.762
integrate_v_rspace 119 11.1 0.020 0.021 13.701 13.742
mp_alltoall_z22v 2059 16.4 12.752 12.940 12.752 12.940
multiply_cannon 2598 13.3 0.208 0.209 12.122 12.131
multiply_cannon_loop 2598 14.3 0.248 0.251 11.661 11.667
potential_pw2rs 119 12.1 0.034 0.034 9.738 9.738
multiply_cannon_multrec 5196 15.3 4.020 4.070 9.519 9.572
x_to_yz 1095 15.8 2.246 2.254 9.222 9.308
pw_gpu_sf 1095 15.8 8.901 8.913 8.901 8.913
qs_ks_ddapc 119 10.1 0.002 0.002 8.841 8.865
pw_gpu_fg 964 15.0 8.164 8.286 8.164 8.286
init_scf_loop 6 6.8 0.000 0.000 7.869 7.869
qs_scf_new_mos 113 7.2 0.001 0.001 7.698 7.704
qs_scf_loop_do_ot 113 8.2 0.001 0.001 7.697 7.703
yz_to_x 964 15.0 1.782 1.793 7.558 7.642
ot_scf_mini 113 9.2 0.002 0.002 7.412 7.414
pw_gpu_ffc 1095 15.8 6.754 6.791 6.754 6.791
init_scf_run 6 5.8 0.000 0.000 5.458 5.459
scf_env_initial_rho_setup 6 6.8 0.000 0.000 5.458 5.458
dbcsr_mm_accdrv_process 13992 16.0 2.957 3.016 5.433 5.434
xc_functional_eval 238 13.1 0.003 0.003 5.173 5.191
pw_gpu_cff 964 15.0 5.108 5.154 5.108 5.154
grid_collocate_task_list 119 9.3 5.079 5.139 5.079 5.139
qmmm_forces_gaussian_low_R 6 6.8 0.000 0.000 5.006 5.053
qmmm_forces_with_gaussian_LG 6 7.8 5.006 5.053 5.006 5.053
ot_mini 113 10.2 0.001 0.001 4.969 4.973
qs_ks_update_qs_env_forces 6 4.8 0.000 0.000 4.729 4.730
pw_poisson_solve 125 9.9 0.003 0.003 4.672 4.685
qmmm_elec_gaussian_low_R 6 6.8 0.000 0.000 4.648 4.664
qmmm_elec_with_gaussian_LG 6 7.8 4.647 4.664 4.647 4.664
qs_ot_get_derivative 113 11.2 0.001 0.001 4.059 4.065
pw_derive 1089 13.4 4.051 4.060 4.051 4.060
grid_integrate_task_list 119 12.1 3.942 3.984 3.942 3.984
-------------------------------------------------------------------------------

Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=109.366, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=31.016, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=25.938, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=12.752, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.545, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.342, yerr=0.0

Summary: Performance test took 23 minutes.
Status: OK

 ---> Removed intermediate container b657ccc05d48
 ---> 849b3307b9c8
Step 46/47 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in 6efb2cdf0c87
 ---> Removed intermediate container 6efb2cdf0c87
 ---> 8998707715bb
Step 47/47 : ENTRYPOINT []
 ---> Running in 704bbf75feed
 ---> Removed intermediate container 704bbf75feed
 ---> fbfbed589d9f
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built fbfbed589d9f
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.

#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done

EndDate: 2025-12-31 06:48:48+00:00
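
Note on the PlotPoint values: in the timing reports of this log, each `rest` value appears to equal the CP2K TOTAL TIME (AVERAGE) minus the five largest SELF TIME (AVERAGE) entries, and those five entries are the remaining PlotPoints. The sketch below reproduces that arithmetic for a few rows copied from the GW_PBE_4benzene report. The helper names `parse_timing_rows` and `rest_and_top` are hypothetical and are not taken from the CP2K tooling; the row layout (name, calls, ASD, self avg, self max, total avg, total max) is assumed from the report header.

```python
# Rows copied from the GW_PBE_4benzene timing report (subset only).
SAMPLE = """\
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
CP2K 1 1.0 0.018 0.020 168.550 168.552
parallel_gemm_fm_cosma 105 8.4 69.900 69.905 69.900 69.905
dbt_total 2336 9.6 0.020 0.021 66.753 66.754
dbm_multiply 807 16.1 26.313 26.743 26.313 26.743
dbt_reshape 594 11.8 6.357 6.483 14.144 14.187
dbm_reserve_blocks 3634 15.3 9.898 9.992 9.898 9.992
dbt_crop 1042 12.0 6.348 6.375 8.549 8.583
mp_waitall_2 2656 15.9 5.746 5.768 5.746 5.768
"""


def parse_timing_rows(text):
    """Return (name, self_avg, total_avg) for every well-formed timing row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 7:
            continue  # separators, banners, and the second header line
        name, _calls, _asd, self_avg, _self_max, total_avg, _total_max = parts
        try:
            rows.append((name, float(self_avg), float(total_avg)))
        except ValueError:
            continue  # the "SUBROUTINE CALLS ASD ..." header also has 7 tokens
    return rows


def rest_and_top(rows, top_n=5):
    """Split the CP2K total into the top_n self times and a remainder."""
    total = next(total_avg for name, _, total_avg in rows if name == "CP2K")
    top = sorted(rows, key=lambda row: row[1], reverse=True)[:top_n]
    rest = total - sum(self_avg for _, self_avg, _ in top)
    return rest, [(name, self_avg) for name, self_avg, _ in top]


rest, top = rest_and_top(parse_timing_rows(SAMPLE))
print("rest", rest)
for name, self_avg in top:
    print(name, self_avg)
```

For this sample the remainder agrees with the logged `rest` PlotPoint for GW_PBE_4benzene up to floating-point rounding; the same subtraction also matches the RI-HFX_H2O-32 and dbcsr reports.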