StartDate: 2025-10-25 19:36:09+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 4a2d255c13bc8ad9b648b1139595965adae2170a
CommitTime: 2025-10-25 21:10:23 +0200
CommitAuthor: Hans Pabst
CommitSubject: GRID: leverage offload mempool

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=4a2d255c13bc8ad9b648b1139595965adae2170a SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.

Sending build context to Docker daemon 407.7MB
Step 1/49 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
02cb0e091e33: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
6cb9b761b877: Waiting
6e8af4fd0a07: Waiting
2cd52cbb1ebe: Waiting
15a17189b2df: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
---> eecafe98c3e1
Step 2/49 : ENV CUDA_PATH /usr/local/cuda
---> Using cache
---> 780681fb1fee
Step 3/49 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
---> Using cache
---> ba98a15dc225
Step 4/49 : ENV CUDA_CACHE_DISABLE 1
---> Using cache
---> 3932740340f7
Step 5/49 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
---> Using cache
---> a06eb14abc29
Step 6/49 : WORKDIR /opt/cp2k-toolchain
---> Using cache
---> 082681bac850
Step 7/49 : COPY ./tools/toolchain/install_requirements*.sh ./
---> Using cache
---> f843eeab6072
Step 8/49 : RUN ./install_requirements.sh ubuntu
---> Using cache
---> 896c2903221b
Step 9/49 : RUN mkdir scripts
---> Using cache
---> ced8e2638937
Step 10/49 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
---> Using cache
---> 907a94a49441
Step 11/49 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
---> Using cache
---> 51152901f729
Step 12/49 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run
---> Using cache
---> 58d5eaad8ee8
Step 13/49 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
---> Using cache
---> 131809d3ee1e
Step 14/49 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
---> Using cache
---> 26dc43f953a0
Step 15/49 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
---> Using cache
---> 6239454d6213
Step 16/49 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
---> Using cache
---> 98e2abdf3158
Step 17/49 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
---> Using cache
---> 260ff438dfbe
Step 18/49 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
---> Using cache
---> 0d3f9709058c
Step 19/49 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
---> Using cache
---> b1698a1ec0b0
Step 20/49 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
---> Using cache
---> 5495f82582a5
Step 21/49 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
---> Using cache
---> 9c4dff942110
Step 22/49 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
---> Using cache
---> b5389b800873
Step 23/49 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
---> Using cache
---> d283ac2788ca
Step 24/49 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
---> Using cache
---> bdbf780f7388
Step 25/49 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
---> Using cache
---> a3b0d54eaf90
Step 26/49 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
---> Using cache
---> 458ad60cf357
Step 27/49 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
---> Using cache
---> 60f1e7dba5bd
Step 28/49 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
---> Using cache
---> 6621530ba4ac
Step 29/49 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
---> Using cache
---> 44fa31018581
Step 30/49 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
---> Using cache
---> 4f14d1496f08
Step 31/49 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
---> Using cache
---> 8e11f2516bbd
Step 32/49 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
---> Using cache
---> 5ce7ec38a9df
Step 33/49 : COPY ./tools/toolchain/scripts/arch_base.tmpl ./tools/toolchain/scripts/generate_arch_files.sh ./scripts/
---> Using cache
---> 0618eb7051ef
Step 34/49 : RUN ./scripts/generate_arch_files.sh && rm -rf ./build
---> Using cache
---> b71bf1e20391
Step 35/49 : WORKDIR /opt/cp2k
---> Using cache
---> c134e8ff807c
Step 36/49 : COPY ./src ./src
---> 9db82ed05370
Step 37/49 : COPY ./data ./data
---> 976340a87c53
Step 38/49 : COPY ./tests ./tests
---> 1b0c10d29ad7
Step 39/49 : COPY ./tools/build_utils ./tools/build_utils
---> 3383bdc2ce11
Step 40/49 : COPY ./cmake ./cmake
---> 44bf4b633528
Step 41/49 : COPY ./CMakeLists.txt .
---> c260c1621777
Step 42/49 : COPY ./tools/docker/scripts/build_cp2k_cmake.sh .
---> 98a90964acbc
Step 43/49 : RUN ./build_cp2k_cmake.sh toolchain_cuda_V100 psmp
---> Running in 121601465ed5
==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-4.3.1/lib/libmpi.so (found version "4.1")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-4.3.1/lib/libmpicxx.so (found version "4.1")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-4.3.1/lib/libmpifort.so (found version "4.1")
-- Found MPI: TRUE (found version "4.1") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "4.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
-- Found openblas, version 0.3.30
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
-- Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
-- Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
-- Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
-- Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
-- Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-----------------------------------------------------------
-                          CUDA                           -
-----------------------------------------------------------
-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found
------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX
------------------------------------------------------------
-                          DBCSR                           -
------------------------------------------------------------
-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------
-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "1.14.6") found components: C Fortran
-- Found MPI: TRUE (found version "4.1") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Checking for module 'libint2'
-- Found libint2, version 2.6.0
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include;/opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include/libint2
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.5.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "1.14.6") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'libvdwxc>=0.3.0'
-- Looking for vdwxc_init_mpi
-- Looking for vdwxc_init_mpi - not found
-- Found LibVDWXC: /opt/cp2k-toolchain/install/libvdwxc-0.4.0/lib/libvdwxc.a (Required is at least version "0.3.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------
  - BLAS
    - vendor: OpenBLAS
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.30/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
  - LAPACK
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.30/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
  - MPI
    - include directories: /opt/cp2k-toolchain/install/mpich-4.3.1/include
    - libraries: /opt/cp2k-toolchain/install/mpich-4.3.1/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-4.3.1/lib/libmpi.so
    - MPI_F08: ON
  - ScaLAPACK
    - vendor: auto
    - include directories:
    - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
  - Hardware Acceleration:
    - CUDA:
      - GPU architecture number: 70
      - GPU profiling enabled: OFF
      - GPU accelerated modules
        - PW module: ON
        - GRID module: ON
        - DBM module: ON
  - LibXC
    - version: 7.0.0
    - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
    - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
  - HDF5
    - version: 1.14.6
    - include directories: /opt/cp2k-toolchain/install/hdf5-1.14.6/include
    - libraries: hdf5-shared
  - FFTW3
    - include directories: /opt/cp2k-toolchain/install/fftw-3.3.10/include
    - libraries: /opt/cp2k-toolchain/install/fftw-3.3.10/lib/libfftw3.a
  - LIBXSMM
    - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
    - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so
  - SpLA
    - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
    - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
    - SpLA GEMM offloading
  - SIRIUS
    - include directories:
    - libraries:
  - COSMA
    - include directories: /opt/cp2k-toolchain/install/COSMA-2.7.0/include
    - libraries: MPI::MPI_CXX;costa::costa;$;$;cosma::BLAS::blas;cosma::scalapack::scalapack
  - Libint2
    - include directories: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include;/opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include/libint2
    - libraries: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/lib/libint2.a
  - ELPA
    - include directories: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/include/elpa_openmp-2024.05.001
    - libraries: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a
  - GRPP
--------------------------------------------------------------------
-                                                                  -
-         List of dependencies not included in this build          -
-                                                                  -
--------------------------------------------------------------------
  - DFTD4
  - DeePMD
  - PEXSI
  - ACE (libpace)
  - TBLITE
  - Spglib
  - LibSMEAGOL
  - DLA-Future
  - PLUMED
  - Libvori
  - LibTorch
  - TREXIO
  - GreenX

To run the regtests you need to run the following commands
cd ..
export CP2K_DATA_DIR=/opt/cp2k/data/
./tests/do_regtest.py /opt/cp2k/build/bin psmp
-- Configuring done (11.4s)
-- Generating done (0.4s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
---> Removed intermediate container 121601465ed5
---> 2c13ef9a52f9
Step 44/49 : COPY ./benchmarks ./benchmarks
---> e70e2fdcdb48
Step 45/49 : COPY ./tools/regtesting ./tools/regtesting
---> 70dcabfac068
Step 46/49 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
---> 2d2452eab79f
Step 47/49 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
---> Running in 6fa5bdbe37da
============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc libgrpp elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5
========== Checking Benchmark Inputs =========
Found 77 input files and 0 errors.
========== Running Performance Test ==========
Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.026    0.027   98.990   98.990
qs_mol_dyn_low                       1  2.0    0.004    0.004   98.598   98.600
qs_forces                           11  3.9    0.002    0.002   98.551   98.551
qs_energies                         11  4.9    0.001    0.001   87.690   87.691
scf_env_do_scf                      11  5.9    0.001    0.001   68.159   68.159
velocity_verlet                     10  3.0    0.001    0.002   63.212   63.228
scf_env_do_scf_inner_loop          108  6.5    0.006    0.008   58.342   58.342
rebuild_ks_matrix                  119  8.3    0.001    0.001   26.150   26.154
qs_ks_build_kohn_sham_matrix       119  9.3    0.019    0.019   26.149   26.153
dbcsr_multiply_generic            2286 12.5    0.134    0.134   24.779   24.813
qs_ks_update_qs_env                119  7.6    0.001    0.001   23.956   23.957
qs_rho_update_rho_low              119  7.7    0.001    0.001   19.716   19.733
calculate_rho_elec                 119  8.7    0.831    0.836   19.715   19.732
qs_scf_new_mos                     108  7.5    0.001    0.001   19.316   19.320
qs_scf_loop_do_ot                  108  8.5    0.001    0.001   19.315   19.319
ot_scf_mini                        108  9.5    0.003    0.003   17.440   17.443
fft_wrap_pw1pw2                   1201 11.6    0.023    0.024   15.373   15.414
sum_up_and_integrate               119 10.3    0.002    0.002   13.694   13.730
integrate_v_rspace                 119 11.3    0.348    0.351   13.605   13.641
fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   13.220   13.238
multiply_cannon                   2286 13.5    0.319    0.328   12.185   12.204
multiply_cannon_loop              2286 14.5    0.243    0.246   11.152   11.153
make_m2s                          4572 13.5    0.040    0.041   11.048   11.062
make_images                       4572 14.5    1.473    1.492   10.879   10.893
ot_mini                            108 10.5    0.001    0.001   10.483   10.486
density_rs2pw                      119  9.7    0.007    0.007    9.935   10.020
init_scf_run                        11  5.9    0.000    0.000   10.001   10.001
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.000   10.000
init_scf_loop                       11  6.9    0.000    0.000    9.746    9.747
grid_collocate_task_list           119  9.7    8.919    8.970    8.919    8.970
pw_gpu_r3dc1d_3d_ps                606 13.1    2.254    2.263    7.838    7.850
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    7.609    7.609
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.483    7.594
pw_gpu_c1dr3d_3d_ps                595 14.2    2.212    2.219    7.505    7.558
grid_integrate_task_list           119 12.3    7.361    7.400    7.361    7.400
wfi_extrapolate                     11  7.9    0.001    0.001    7.223    7.223
prepare_preconditioner              11  7.9    0.000    0.000    6.590    6.596
make_preconditioner                 11  8.9    0.000    0.000    6.590    6.596
qs_ot_get_derivative               108 11.5    0.002    0.002    6.304    6.306
multiply_cannon_multrec           4572 15.5    2.124    2.170    6.077    6.125
hybrid_alltoall_any               4725 16.4    4.760    4.776    6.085    6.103
make_images_data                  4572 15.5    0.050    0.051    5.949    5.957
potential_pw2rs                    119 12.3    0.035    0.036    5.896    5.896
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.610    5.845
parallel_gemm_fm_cosma              81  9.0    5.207    5.207    5.207    5.207
ot_diis_step                       108 11.5    0.005    0.005    4.155    4.155
build_core_ppl_forces               11  5.9    3.816    3.910    3.816    3.910
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.657    3.673
apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.634    3.634
apply_single                       119 13.6    0.001    0.001    3.634    3.634
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.609    3.633
dbcsr_mm_accdrv_process           9594 16.2    1.004    1.134    3.568    3.575
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.263    3.263
multiply_cannon_sync_h2d          4572 15.5    3.118    3.185    3.118    3.185
dbcsr_complete_redistribute        329 12.2    1.104    1.135    2.933    3.182
calculate_dm_sparse                119  9.5    0.001    0.001    3.151    3.152
mp_alltoall_z22v                  1201 15.6    3.022    3.034    3.022    3.034
qs_create_task_list                 11  7.9    0.000    0.000    2.830    2.870
generate_qs_task_list               11  8.9    1.101    1.110    2.830    2.870
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.751    2.751
mp_waitall_1                     64495 16.9    2.650    2.665    2.650    2.665
pw_poisson_solve                   119 10.3    0.003    0.003    2.657    2.661
transfer_rs2pw                     487 10.6    0.008    0.008    2.335    2.417
qs_ot_get_p                        119 10.4    0.001    0.001    2.401    2.402
calculate_first_density_matrix       1  7.0    0.000    0.000    2.351    2.352
copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.299    2.312
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.292    2.294
pw_gpu_fg                          606 14.1    2.181    2.203    2.181    2.203
jit_kernel_multiply                 10 15.4    1.991    2.111    1.991    2.111
qs_ot_get_derivative_taylor         59 13.0    0.002    0.003    2.101    2.102
dbcsr_special_finalize            6858 15.5    0.038    0.038    2.036    2.050
cp_fm_cholesky_invert               11 10.9    2.033    2.033    2.033    2.033
transfer_rs2pw_140                 130 11.5    1.504    1.506    1.941    2.025
yz_to_x                            606 14.1    0.460    0.461    1.990    1.993
x_to_yz                            595 15.2    0.485    0.490    1.977    1.982
-------------------------------------------------------------------------------
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=68.92699999999999, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.919, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.361, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.207, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.76, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.816, yerr=0.0
Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.026    0.028   94.909   94.911
qs_mol_dyn_low                       1  2.0    0.004    0.004   94.502   94.503
qs_forces                           11  3.9    0.002    0.002   94.456   94.457
qs_energies                         11  4.9    0.001    0.001   83.443   83.444
scf_env_do_scf                      11  5.9    0.001    0.001   63.459   63.460
velocity_verlet                     10  3.0    0.001    0.002   61.733   61.750
scf_env_do_scf_inner_loop           96  6.5    0.005    0.007   53.344   53.345
rebuild_ks_matrix                  107  8.3    0.001    0.001   25.437   25.440
qs_ks_build_kohn_sham_matrix       107  9.3    0.017    0.017   25.436   25.440
qs_ks_update_qs_env                107  7.6    0.001    0.001   22.923   22.927
dbcsr_multiply_generic            1966 12.4    0.117    0.119   22.506   22.624
qs_rho_update_rho_low              107  7.7    0.001    0.001   18.139   18.139
calculate_rho_elec                 107  8.7    0.746    0.748   18.138   18.139
qs_scf_new_mos                      96  7.5    0.001    0.001   17.135   17.158
qs_scf_loop_do_ot                   96  8.5    0.001    0.001   17.134   17.157
ot_scf_mini                         96  9.5    0.002    0.003   15.455   15.458
sum_up_and_integrate               107 10.3    0.002    0.002   14.209   14.300
integrate_v_rspace                 107 11.3    0.309    0.310   14.130   14.220
fft_wrap_pw1pw2                   1081 11.6    0.021    0.022   13.907   13.945
fft_wrap_pw1pw2_140                439 12.2    0.003    0.003   11.937   11.997
multiply_cannon                   1966 13.4    0.279    0.282   11.144   11.160
multiply_cannon_loop              1966 14.4    0.208    0.213   10.256   10.288
init_scf_loop                       11  6.9    0.000    0.000   10.044   10.044
init_scf_run                        11  5.9    0.000    0.000   10.033   10.033
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.032   10.032
make_m2s                          3932 13.4    0.036    0.036    9.922    9.945
make_images                       3932 14.4    1.338    1.362    9.773    9.793
ot_mini                             96 10.5    0.001    0.001    9.346    9.348
density_rs2pw                      107  9.7    0.006    0.006    9.001    9.120
grid_integrate_task_list           107 12.3    8.491    8.582    8.491    8.582
grid_collocate_task_list           107  9.7    8.367    8.468    8.367    8.468
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.031    8.031
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.424    7.538
wfi_extrapolate                     11  7.9    0.001    0.002    7.301    7.301
pw_gpu_r3dc1d_3d_ps                546 13.1    2.030    2.055    7.108    7.123
pw_gpu_c1dr3d_3d_ps                535 14.2    1.975    2.001    6.772    6.796
prepare_preconditioner              11  7.9    0.000    0.000    6.707    6.717
make_preconditioner                 11  8.9    0.000    0.000    6.707    6.717
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.686    5.918
multiply_cannon_multrec           3932 15.4    1.902    1.911    5.683    5.695
qs_ot_get_derivative                96 11.5    0.001    0.001    5.647    5.652
hybrid_alltoall_any               4079 16.3    4.260    4.277    5.467    5.501
make_images_data                  3932 15.4    0.043    0.044    5.342    5.351
potential_pw2rs                    107 12.3    0.032    0.033    5.329    5.330
parallel_gemm_fm_cosma              81  9.0    5.270    5.271    5.270    5.271
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.931    4.076
build_core_ppl_forces               11  5.9    3.782    3.879    3.782    3.879
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.693    3.741
ot_diis_step                        96 11.5    0.005    0.005    3.677    3.677
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.472    3.472
dbcsr_mm_accdrv_process           8450 16.1    0.958    1.235    3.434    3.439
dbcsr_complete_redistribute        317 12.2    1.111    1.207    3.108    3.341
apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.287    3.291
apply_single                       107 13.6    0.001    0.001    3.287    3.291
qs_create_task_list                 11  7.9    0.000    0.000    3.127    3.221
generate_qs_task_list               11  8.9    1.371    1.381    3.126    3.221
calculate_dm_sparse                107  9.5    0.001    0.001    2.937    2.960
multiply_cannon_sync_h2d          3932 15.4    2.843    2.880    2.843    2.880
mp_alltoall_z22v                  1081 15.6    2.760    2.858    2.760    2.858
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.708    2.709
copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.417    2.441
mp_waitall_1                     55487 16.8    2.379    2.435    2.379    2.435
pw_poisson_solve                   107 10.3    0.002    0.002    2.377    2.380
transfer_rs2pw                     439 10.6    0.007    0.007    2.189    2.339
calculate_first_density_matrix       1  7.0    0.000    0.000    2.309    2.309
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.246    2.248
jit_kernel_multiply                 11 15.4    1.960    2.234    1.960    2.234
qs_ot_get_p                        107 10.4    0.001    0.001    2.068    2.071
transfer_dbcsr_to_fm                11 10.9    0.001    0.001    1.988    2.010
pw_gpu_fg                          546 14.1    1.977    2.003    1.977    2.003
transfer_rs2pw_140                 118 11.5    1.360    1.372    1.830    1.987
cp_fm_cholesky_invert               11 10.9    1.983    1.983    1.983    1.983
build_core_ppl                      11  7.9    1.894    1.930    1.894    1.930
-------------------------------------------------------------------------------
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=64.739, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.491, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.367, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.27, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.26, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.782, yerr=0.0
Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.017    0.019  161.798  161.800
qs_energies                          1  2.0    0.000    0.000  161.493  161.493
mp2_main                             1  3.0    0.000    0.000  155.259  155.259
mp2_gpw_main                         1  4.0    0.000    0.000  153.523  153.523
rpa_ri_compute_en                    1  5.0    0.000    0.000  144.088  144.089
rpa_num_int                          1  6.0    0.001    0.001  144.080  144.080
compute_mat_P_omega                  1  7.0    0.001    0.002   66.015   66.015
dbt_total                         2336  9.6    0.021    0.021   65.663   65.664
compute_mat_P_omega_contract        10  8.0    5.013    5.068   65.363   65.374
parallel_gemm_fm_cosma             105  8.4   65.110   65.118   65.110   65.118
dbt_contract                       787 11.0    0.047    0.048   44.296   44.297
compute_W_cubic_GW                  10  7.0    0.003    0.004   42.182   42.184
dbt_tas_total                     1149 12.2    0.133    0.133   34.757   34.757
dbt_tas_multiply                   807 12.1    0.002    0.003   34.083   34.083
dbt_tas_dbm                        807 14.1    0.006    0.006   26.767   26.768
dbm_multiply                       807 16.1   25.638   25.935   25.638   25.935
compute_mat_P_omega_calc_M_occ     250  9.0    5.025    5.089   23.217   23.217
rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   22.076   22.076
contract_P_omega_with_mat_L         10  8.0    0.000    0.000   21.780   21.780
dbt_copy                          1107 10.7    0.070    0.071   21.603   21.695
dbt_tas_mm_1N                      524 15.1    0.003    0.003   17.252   17.621
compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   14.539   14.539
dbt_reshape                        594 11.8    6.207    6.310   13.876   13.934
compute_QP_energies                  1  7.0    0.000    0.000   11.587   11.587
compute_self_energy_cubic_gw         1  8.0    0.112    0.113   11.587   11.587
dbt_tas_reserve_blocks_index      3266 14.3    0.616    0.629   10.348   10.385
dbm_reserve_blocks                3634 15.3   10.054   10.105   10.054   10.105
mp2_ri_gpw_compute_in                1  5.0    0.001    0.001    9.424    9.424
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    8.728    8.728
dbt_reserve_blocks_index          2347 13.0    0.313    0.320    8.558    8.593
dbt_crop                          1042 12.0    6.293    6.308    8.485    8.486
dbt_reserve_blocks_index_array    2289 12.1    0.011    0.011    8.362    8.418
dbt_tas_mm_2                       251 15.0    0.003    0.003    7.484    7.484
scf_env_do_scf                       1  3.0    0.000    0.000    5.708    5.708
scf_env_do_scf_inner_loop           17  4.0    0.001    0.001    5.708    5.708
mp_waitall_2                      2656 15.9    5.633    5.642    5.633    5.642
contract_cubic_gw                   21  9.0    0.000    0.000    5.452    5.452
get_2c_integrals                     1  6.0    0.000    0.000    5.326    5.326
dbt_communicate_buffer             594 12.8    0.011    0.011    5.149    5.153
dbcsr_multiply_generic              30  8.1    0.003    0.003    5.002    5.026
multiply_cannon                     30  9.1    0.009    0.013    4.810    4.832
multiply_cannon_loop                30 10.1    0.004    0.004    4.756    4.778
compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    4.768    4.769
compute_mat_P_omega_copy_M_occ     250  9.0    0.002    0.002    4.562    4.563
dbt_tas_copy                       511 11.5    2.476    2.504    4.386    4.406
multiply_cannon_multrec             60 11.1    0.160    0.171    4.181    4.208
dbcsr_mm_accdrv_process            328 12.3    0.041    0.041    3.841    3.863
jit_kernel_multiply                 18 11.7    3.794    3.817    3.794    3.817
-------------------------------------------------------------------------------
Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=48.495999999999995, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=65.11, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=25.638, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.054, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.293, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.207, yerr=0.0
Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.021    0.022  183.925  183.927
qs_forces                           1  2.0    0.000    0.000  183.495  183.497
rebuild_ks_matrix                   7  6.6    0.000    0.000  179.225  179.227
qs_ks_build_kohn_sham_matrix        7  7.6    0.002    0.002  179.225  179.227
hfx_ks_matrix                       7  8.6    0.000    0.000  175.597  175.599
dbt_total                         849 11.0    0.008    0.008  131.153  131.154
hfx_ri_update_ks                    7  9.6    0.000    0.000   99.720   99.720
hfx_ri_update_ks_Pmat               7 10.6   20.261   20.284   99.715   99.715
qs_energies                         1  3.0    0.000    0.000   95.568   95.569
scf_env_do_scf                      1  4.0    0.000    0.000   93.474   93.475
qs_ks_update_qs_env                 8  6.0    0.000    0.000   91.343   91.343
qs_ks_update_qs_env_forces          1  3.0    0.000    0.000   87.889   87.890
dbt_contract                      207 12.4    0.046    0.046   77.662   77.662
hfx_ri_update_forces                1  7.0    0.960    0.970   75.876   75.876
dbt_tas_total                     369 13.4    0.071    0.071   64.732   64.732
dbt_tas_multiply                  216 13.5    0.001    0.001   62.208   62.208
scf_env_do_scf_inner_loop           6  5.0    0.000    0.001   50.162   50.162
dbt_copy                          423 11.8    0.044    0.044   49.134   49.354
dbt_tas_dbm                       216 15.5    0.002    0.002   49.331   49.332
dbm_multiply                      216 17.5   46.136   46.243   46.136   46.243
hfx_ri_forces_Pmat_3c               1  8.0    3.228    3.254   45.212   45.216
init_scf_loop                       2  5.0    0.000    0.000   43.311   43.311
dbt_reshape                       175 13.2   16.887   16.951   37.071   37.299
hfx_ri_update_ks_Pmat_KS           63 11.6    0.001    0.001   28.814   28.814
precalc_derivatives                 1  8.0    1.728    1.731   25.158   25.158
dbt_tas_mm_2                       91 16.5    0.001    0.001   20.458   20.458
mp_waitall_2                     1022 16.5   17.608   17.648   17.608   17.648
dbt_tas_reserve_blocks_index     1323 15.4    1.540    1.544   16.906   17.184
dbm_reserve_blocks               1491 16.3   16.039   16.320   16.039   16.320
dbt_crop                          372 13.7   11.801   11.821   15.435   15.479
dbt_tas_mm_3T                      77 17.1    0.001    0.001   15.256   15.325
hfx_ri_pre_scf_Pmat                 1 12.0    0.000    0.000   15.304   15.304
dbt_communicate_buffer            175 14.2    0.004    0.004   14.631   14.680
dbt_reserve_blocks_index          889 14.5    0.582    0.584   13.950   14.011
dbt_reserve_blocks_index_array    859 13.5    0.007    0.007   13.688   13.743
hfx_ri_update_ks_Pmat_Px3C         63 11.6    0.000    0.000   13.665   13.665
hfx_ri_update_ks_Pmat_copy_2       63 11.6    0.000    0.000   13.321   13.321
build_3c_derivatives                3  9.0    2.154    2.210   13.278   13.280
dbt_tas_mm_3N                      37 15.4    0.000    0.000   11.018   11.098
dbt_tas_copy                      248 12.5    3.882    4.013    7.152    7.628
mp_sync                          2901 12.8    6.721    7.004    6.721    7.004
hfx_ri_pre_scf_Pmat_int             1 13.0    0.000    0.000    4.879    4.879
dbt_tas_replicate                 168 15.1    2.408    2.411    4.692    4.715
hfx_ri_pre_scf_calc_tensors         1 14.0    0.003    0.003    4.202    4.205
hfx_ri_pre_scf_Pmat_copy_2          9 13.0    1.582    1.592    4.129    4.139
hfx_ri_pre_scf_Pmat_RIx3C           9 13.0    0.000    0.000    3.815    3.854
-------------------------------------------------------------------------------
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=66.994, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=46.136, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=20.261, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=17.608, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=16.887, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=16.039, yerr=0.0
Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.009    0.010  102.014  102.016
qs_energies                         1  2.0    0.000    0.000  101.841  101.842
mp2_main                            1  3.0    0.000    0.000   95.150   95.151
mp2_gpw_main                        1  4.0    0.001    0.001   94.820   94.821
mp2_ri_gpw_compute_in               1  5.0    0.537    0.540   53.048   53.055
mp2_ri_gpw_compute_in_loop          1  6.0    0.013    0.013   45.113   45.122
mp2_ri_gpw_compute_en               1  5.0    0.082    0.082   41.714   41.718
mp2_ri_gpw_compute_en_RI_loop       1  6.0   12.490   12.501   39.212   39.214
dbcsr_multiply_generic           2666  8.0    0.144    0.148   22.467   22.635
ao_to_mo_and_store_B_mult_1      1328  7.0    0.012    0.013   21.208   21.376
mp2_eri_3c_integrate_gpw         1328  7.0    0.017    0.018   18.226   18.440
mp2_ri_gpw_compute_en_expansio   1040  7.0    0.702    0.710   16.553   16.600
local_gemm                       1040  8.0   15.850   15.906   15.850   15.906
make_m2s                         5332  9.0    0.046    0.047   12.893   13.224
make_images                      5332 10.0    3.214    3.260   12.726   13.051
integrate_v_rspace               1338  8.0    1.019    1.027   10.316   10.780
multiply_cannon                  2666  9.0    0.359    0.373    8.960    9.463
grid_integrate_task_list         1338  9.0    8.063    8.516    8.063    8.516
multiply_cannon_loop             2666 10.0    0.176    0.178    7.942    8.438
hybrid_alltoall_any              6683 11.6    7.665    7.918    7.911    8.165
make_images_data                 5332 11.0    0.058    0.060    7.823    8.083
fft_wrap_pw1pw2                 26668 10.4    0.132    0.133    7.616    7.854
get_2c_integrals                    1  6.0    0.003    0.004    7.394    7.398
collocate_function               1328  8.0    4.716    4.722    6.779    7.031
compute_2c_integrals                1  7.0    0.007    0.007    6.768    6.768
compute_2c_integrals_loop_lm        1  8.0    0.021    0.021    6.664    6.671
mp2_eri_2c_integrate_gpw            1  9.0    1.896    1.896    6.643    6.650
scf_env_do_scf                      1  3.0    0.000    0.000    5.827    5.828
scf_env_do_scf_inner_loop          10  4.0    0.001    0.001    5.827    5.828
ao_to_mo_and_store_B_E_Ex_1      1328  7.0    3.620    3.670    5.400    5.441
mp2_ri_gpw_compute_en_ener       1040  7.0    4.669    4.679    4.669    4.679
fft_wrap_pw1pw2_20              10647 11.4    0.022    0.023    4.381    4.564
mp2_ri_gpw_compute_en_comm        221  7.0    0.985    1.000    4.374    4.448
qs_scf_new_mos                     10  5.0    0.000    0.000    4.323    4.324
multiply_cannon_multrec          2676 11.0    1.714    1.725    4.266    4.270
pw_gpu_r3dc1d_3d                13282 12.2    3.813    4.019    3.813    4.019
pw_gpu_c1dr3d_3d                13280 12.7    2.652    2.685    2.652    2.685
potential_pw2rs                  2666 10.0    0.093    0.094    2.598    2.625
mp_sendrecv_dm3                   442  8.0    2.403    2.454    2.403    2.454
fft_wrap_pw1pw2_10              15957 11.5    0.019    0.020    2.355    2.411
dbcsr_mm_accdrv_process          5392 12.0    0.673    1.099    2.329    2.344
collocate_single_gaussian        1328 10.0    0.087    0.089    2.266    2.300
eigensolver                        11  5.8    0.001    0.002    2.200    2.200
mp2_eri_2c_integrate_gpw_pot_l   1328 10.0    0.004    0.004    2.134    2.162
copy_dbcsr_to_fm                 1351  8.0    0.032    0.033    2.146    2.156
replicate_iaK_2intgroup             1  6.0    2.010    2.015    2.149    2.154
multiply_cannon_sync_h2d         2676 11.0    1.642    2.137    1.642    2.137
fill_local_i_aL                   884  7.5    2.113    2.118    2.113    2.118
-------------------------------------------------------------------------------
Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=53.23, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.85, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.49, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.063, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=7.665, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="collocate_function", label="collocate_function", y=4.716, yerr=0.0
Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.077    0.078  201.024  201.024
qs_energies                         1  2.0    0.000    0.000  199.992  199.993
scf_env_do_scf                      1  3.0    0.000    0.000  186.842  186.842
scf_env_do_scf_inner_loop          15  4.0    0.001    0.002  186.842  186.842
qs_ks_update_qs_env                15  5.0    0.000    0.000   91.983   92.049
rebuild_ks_matrix                  15  6.0    0.000    0.000   91.786   91.851
qs_ks_build_kohn_sham_matrix       15  7.0    0.003    0.003   91.786   91.851
qs_vxc_create                      15  8.0    0.000    0.000   56.066   56.115
qs_scf_new_mos                     15  5.0    0.000    0.000   53.786   53.824
fft_wrap_pw1pw2                  1086 10.0    0.029    0.031   48.745   48.757
calculate_dispersion_nonloc        15  9.0   10.528   10.579   48.254   48.305
eigensolver                        15  6.0    0.002    0.002   44.802   44.834
qs_rho_update_rho_low              16  5.0    0.000    0.000   39.335   39.337
calculate_rho_elec                 16  6.0    0.173    0.174   39.335   39.337
sum_up_and_integrate               15  8.0    0.000    0.000   34.271   34.286
integrate_v_rspace                 15  9.0    0.046    0.046   34.247   34.261
cp_fm_diag_elpa                    15  7.0    0.000    0.000   28.512   28.517
cp_fm_diag_elpa_base               15  8.0   26.748   27.282   28.506   28.506
grid_collocate_task_list           16  7.0   28.372   28.448   28.372   28.448
grid_integrate_task_list           15 10.0   27.523   27.589   27.523   27.589
pw_gpu_c1dr3d_3d_ps               585 12.1    5.458    5.523   25.599   25.640
fft_wrap_pw1pw2_150               765 11.0    0.005    0.005   25.097   25.122
pw_gpu_r3dc1d_3d_ps               501 11.9    4.567    4.724   23.110   23.137
cp_fm_cholesky_restore             45  7.0   14.448   15.136   14.448   15.136
fft_wrap_pw1pw2_200               197 11.3    0.001    0.001   11.917   11.947
density_rs2pw                      16  7.0    0.001    0.001   10.781   10.857
qs_energies_init_hamiltonians       1  3.0    0.000    0.000    9.294    9.294
vdW_energy                         15 10.0    8.968    8.993    8.968    8.993
pw_gpu_ffc                        585 13.1    8.695    8.768    8.695    8.768
pw_gpu_cff                        501 12.9    8.214    8.216    8.214    8.216
build_core_hamiltonian_matrix       1  4.0    0.000    0.000    7.955    8.003
xc_vxc_pw_create                   15  9.0    0.172    0.175    7.812    7.814
mp_alltoall_z22v                 1086 14.0    6.561    6.877    6.561    6.877
pw_gpu_sf                         585 13.1    6.819    6.836    6.819    6.836
potential_pw2rs                    15 10.0    0.007    0.007    6.678    6.729
pw_gpu_fg                         501 12.9    6.409    6.444    6.409    6.444
copy_dbcsr_to_fm                   16  5.9    0.001    0.001    6.176    6.184
fft_wrap_pw1pw2_10                 62 10.5    0.000    0.000    5.260    5.262
dbcsr_complete_redistribute        46  8.3    1.629    1.663    5.192    5.229
cp_fm_uplo_to_full                 30  8.0    3.598    4.782    3.598    4.782
x_to_yz                           585 13.1    1.029    1.045    4.594    4.673
xc_pw_derive                       90 11.0    0.001    0.001    4.559    4.576
xc_rho_set_and_dset_create         15 10.0    0.130    0.132    4.563    4.574
build_core_ppnl                     1  5.0    4.502    4.504    4.502    4.504
yz_to_x                           501 12.9    0.868    0.869    3.864    4.084
-------------------------------------------------------------------------------
Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=93.405, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=28.372, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.523, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=26.748, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=14.448, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.528, yerr=0.0
Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.043    0.045  253.773  253.777
qs_energies                         1  2.0    0.000    0.000  253.660  253.663
ls_scf                              1  3.0    0.000    0.000  252.839  252.843
ls_scf_main                         1  4.0    0.001    0.001  243.621  243.624
density_matrix_trs4                11  5.0    0.007    0.008  204.492  204.492
dbcsr_multiply_generic            185  6.1    0.313    0.325  166.694  166.708
multiply_cannon                   185  7.1    1.853    1.860  113.489  113.558
multiply_cannon_loop              185  8.1    0.318    0.320   99.293   99.627
multiply_cannon_multrec           370  9.1   75.212   75.412   83.907   84.070
make_m2s                          370  7.1    0.027    0.027   45.731   45.875
make_images                       370  8.1   11.711   12.245   44.722   44.865
ls_scf_dm_to_ks                    11  5.0    0.000    0.000   35.623   35.627
matrix_ls_to_qs                    11  6.0    0.000    0.000   32.638   32.943
dbcsr_complete_redistribute        23  7.5   18.954   19.125   27.120   27.325
matrix_decluster                   11  7.0    0.000    0.000   24.884   25.089
arnoldi_extremal                   12  6.1    0.000    0.000   22.513   22.517
arnoldi_normal_ev                  12  7.1    0.009    0.009   22.512   22.516
build_subspace                     23  8.1    0.059    0.060   22.039   22.039
dbcsr_matrix_vector_mult          652  9.0    0.147    0.148   20.620   20.696
dbcsr_matrix_vector_mult_local    652 10.0   19.666   19.742   19.674   19.750
make_images_data                  370  9.1    0.012    0.012   17.001   17.634
hybrid_alltoall_any               393  9.9   11.542   11.762   16.513   17.147
calculate_norms                   740  9.1   14.547   15.047   14.547   15.047
dbcsr_finalize                    559  7.6    0.198    0.204   13.679   13.916
dbcsr_merge_all                   510  8.6    2.384    2.560   12.512   12.734
dbcsr_copy                        761  7.5    1.674    1.692    9.446    9.566
setup_rec_index_2d                370  8.1    9.179    9.219    9.179    9.219
dbcsr_special_finalize            555  9.1    0.010    0.010    8.981    9.003
dbcsr_dot                         144  6.3    7.487    7.559    8.154    8.478
dbcsr_sort_indices               1283 10.0    8.375    8.419    8.375    8.419
dbcsr_add_d                       280  6.0    0.001    0.001    8.118    8.340
dbcsr_add_anytype                 280  7.0    3.639    3.655    8.117    8.339
dbcsr_copy_into_existing           11  8.0    7.752    7.854    7.752    7.854
ls_scf_init_scf                     1  4.0    0.000    0.000    7.806    7.807
ls_scf_init_matrix_S                1  5.0    0.000    0.000    7.377    7.397
tree_to_linear_d                   23 10.5    6.814    6.850    6.814    6.850
dbcsr_mm_accdrv_process         14501 10.0    0.683    0.749    6.788    6.798
matrix_sqrt_Newton_Schulz           1  6.0    0.000    0.000    6.707    6.710
mp_waitall_1                     5192 10.5    5.184    6.042    5.184    6.042
dbcsr_mm_accdrv_process_sort    14501 11.0    6.028    6.028    6.028    6.028
dbcsr_merge_single_wm             370 10.1    0.527    0.542    5.804    5.808
make_images_pack                  370  9.1    5.485    5.643    5.499    5.657
-------------------------------------------------------------------------------
Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=113.68299999999999, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=75.212, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=19.666, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=18.954, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=14.547, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="make_images", label="make_images", y=11.711, yerr=0.0
Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.004    0.004   46.241   46.241
lib_test                            1  2.0    0.000    0.000   46.235   46.235
dbcsr_run_tests                     3  3.0    0.000    0.000   46.234   46.235
test_multiplies_multiproc           3  4.0    0.001    0.001   35.565   35.594
dbcsr_multiply_generic              9  5.0    0.002    0.002   27.732   27.734
multiply_cannon                     9  6.0    0.098    0.178   18.159   18.551
multiply_cannon_loop                9  7.0    0.002    0.003   16.786   17.082
multiply_cannon_multrec            18  8.0    8.981    9.278   15.692   15.986
dbcsr_make_random_matrix            9  4.0    7.107    7.122   10.531   10.561
dbcsr_finalize                     27  5.7    0.001    0.001    7.377    7.389
dbcsr_merge_all                    18  6.5    3.482    3.482    7.267    7.277
dbcsr_mm_accdrv_process          8199  9.0    1.411    1.505    6.500    6.510
dbcsr_redistribute                  9  5.0    3.391    3.396    5.499    5.500
make_m2s                           18  6.0    0.001    0.001    4.979    5.000
make_images                        18  7.0    0.392    0.400    4.947    4.969
dbcsr_mm_accdrv_process_sort     8199 10.0    4.376    4.378    4.376    4.378
make_images_data                   18  8.0    0.001    0.001    2.761    2.780
hybrid_alltoall_any                18  9.0    2.376    2.389    2.730    2.749
dbcsr_data_copy_aa2                18  7.5    1.864    1.868    1.864    1.868
mp_alltoall_d11v                   27  6.0    1.860    1.868    1.860    1.868
tree_to_linear_d                    9  7.0    1.795    1.801    1.795    1.801
dbcsr_data_release                507  7.7    1.291    1.297    1.291    1.297
dbcsr_data_new                    354  7.4    0.954    1.074    0.954    1.074
dbcsr_checksum                      6  5.0    0.933    0.933    0.940    0.940
-------------------------------------------------------------------------------
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=18.904, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=8.981, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.107, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.376, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.482, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.391, yerr=0.0
Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                1  1.0    0.052    0.052  191.075  191.077
qs_mol_dyn_low                      1  2.0    0.004    0.004  189.606  189.644
qs_forces                           6  3.8    0.001    0.001  121.377  121.378
qs_energies                         6  4.8    0.000    0.000  114.746  114.747
scf_env_do_scf                      6  5.8    0.000    0.000  107.368  107.369
scf_env_do_scf_inner_loop         113  6.2    0.005    0.007   99.807   99.808
velocity_verlet                     5  3.0    0.003    0.003   88.858   88.903
rebuild_ks_matrix                 119  8.1    0.000    0.001   81.615   81.615
qs_ks_build_kohn_sham_matrix      119  9.1    0.018    0.018   81.615   81.615
qs_ks_update_qs_env               119  7.3    0.001    0.001   77.017   77.017
fft_wrap_pw1pw2                  2059 12.4    0.045    0.045   63.998   64.007
fft_wrap_pw1pw2_150              1321 13.9    0.009    0.009   61.289   61.344
qs_vxc_create                     119 10.1    0.002    0.002   52.220   52.220
xc_vxc_pw_create                  119 11.1    1.463    1.475   52.218   52.218
xc_pw_derive                      714 13.1    0.010    0.010   36.236   36.327
qmmm_el_coupling                    6  3.8    0.000    0.000   35.366   35.372
qmmm_elec_with_gaussian             6  4.8    0.019    0.019   35.360   35.366
pw_gpu_c1dr3d_3d_ps              1095 14.8   10.454   10.543   34.662   34.747
qmmm_elec_with_gaussian_low         6  5.8    0.000    0.000   33.748   34.299
qmmm_forces                         6  3.8    0.001    0.001   30.039   30.039
qmmm_elec_gaussian_low_G            6  6.8   29.310   29.845   29.310   29.845
qmmm_forces_with_gaussian           6  4.8    0.022    0.023   29.304   29.675
pw_gpu_r3dc1d_3d_ps               964 14.0    9.130    9.226   29.279   29.372
qmmm_force_with_gaussian_low        6  5.8    0.000    0.000   28.065   28.430
xc_rho_set_and_dset_create        119 12.1    2.338    2.349   26.313   26.372
xc_pw_divergence                  119 12.1    0.006    0.006   24.047   24.104
qmmm_forces_gaussian_low_G          6  6.8   23.386   23.741   23.386   23.741
qs_rho_update_rho_low             119  7.3    0.001    0.001   21.667   21.742
calculate_rho_elec                119  8.3    1.048    1.049   21.666   21.742
density_rs2pw                     119  9.3    0.007    0.007   15.598   15.723
sum_up_and_integrate              119 10.1    0.002    0.002   13.566   13.606
dbcsr_multiply_generic           2598 12.3    0.091    0.092   13.494   13.551
integrate_v_rspace                119 11.1    0.021    0.021   13.387   13.428
mp_alltoall_z22v                 2059 16.4   12.663   12.898   12.663   12.898
multiply_cannon                  2598 13.3    0.206    0.210   11.919   11.965
multiply_cannon_loop             2598 14.3    0.237    0.241   11.471   11.514
potential_pw2rs                   119 12.1    0.033    0.033    9.466    9.466
x_to_yz                          1095 15.8    2.301    2.301    9.225    9.359
multiply_cannon_multrec          5196 15.3    4.003    4.044    9.339    9.357
qs_ks_ddapc                       119 10.1    0.002    0.002    8.664    8.690
pw_gpu_sf                        1095 15.8    8.611    8.613    8.611    8.613
pw_gpu_fg                         964 15.0    7.579    7.672    7.579    7.672
yz_to_x                           964 15.0    1.817    1.831    7.556    7.644
init_scf_loop                       6  6.8    0.000    0.000    7.558    7.558
qs_scf_new_mos                    113  7.2    0.001    0.001    7.185    7.186
qs_scf_loop_do_ot                 113  8.2    0.001    0.001    7.184    7.185
ot_scf_mini                       113  9.2    0.002    0.002    6.900    6.901
pw_gpu_ffc                       1095 15.8    6.355    6.392    6.355    6.392
init_scf_run                        6  5.8    0.000    0.000    5.339    5.339
scf_env_initial_rho_setup           6  6.8    0.000    0.000    5.339    5.339
dbcsr_mm_accdrv_process         13992 16.0    0.516    0.518    5.272    5.331
grid_collocate_task_list          119  9.3    4.995    5.042    4.995    5.042
xc_functional_eval                238 13.1    0.003    0.003    4.990    5.032
pw_gpu_cff                        964 15.0    4.946    4.954    4.946    4.954
ot_mini                           113 10.2    0.001    0.001    4.914    4.914
jit_kernel_multiply                24 14.7    4.718    4.775    4.718    4.775
pw_poisson_solve                  125  9.9    0.003    0.003    4.768    4.773
qmmm_forces_gaussian_low_R          6  6.8    0.000    0.000    4.679    4.690
qmmm_forces_with_gaussian_LG        6  7.8    4.679    4.690    4.679    4.690
qs_ks_update_qs_env_forces          6  4.8    0.000    0.000    4.628    4.628
qmmm_elec_gaussian_low_R            6  6.8    0.000    0.000    4.438    4.454
qmmm_elec_with_gaussian_LG          6  7.8    4.438    4.454    4.438    4.454
pw_derive                        1089 13.4    4.225    4.250    4.225    4.250
qs_ot_get_derivative              113 11.2    0.001    0.001    4.021    4.026
grid_integrate_task_list          119 12.1    3.899    3.941    3.899    3.941
-------------------------------------------------------------------------------
Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=106.13199999999999, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=29.31, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=23.386, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=12.663, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.454, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.13, yerr=0.0
Summary: Performance test took 22 minutes.
Status: OK
---> Removed intermediate container 6fa5bdbe37da
---> bd873519e540
Step 48/49 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
---> Running in 2d162d3b65a5
---> Removed intermediate container 2d162d3b65a5
---> bac0f472a45b
Step 49/49 : ENTRYPOINT []
---> Running in 02513d440760
---> Removed intermediate container 02513d440760
---> 3267276365c7
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 3267276365c7
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.
#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done
EndDate: 2025-10-25 20:17:04+00:00