StartDate: 2025-12-10 06:06:09+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: fdb532c570154b91bf3d504986c90de1a2991301
CommitTime: 2025-12-09 16:11:40 +0100
CommitAuthor: Juerg Hutter
CommitSubject: Simplified FM matrix generation (for tmp matrices) (#4599)

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=fdb532c570154b91bf3d504986c90de1a2991301 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.

Sending build context to Docker daemon 408.1MB
Step 1/49 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
15a17189b2df: Waiting
02cb0e091e33: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
6cb9b761b877: Waiting
2cd52cbb1ebe: Waiting
6e8af4fd0a07: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
6e8af4fd0a07: Download complete
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Verifying Checksum
7f7602a82106: Download complete
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Download complete
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/49 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/49 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/49 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/49 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/49 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/49 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> f843eeab6072
Step 8/49 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> 896c2903221b
Step 9/49 : RUN mkdir scripts
 ---> Using cache
 ---> ced8e2638937
Step 10/49 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> 907a94a49441
Step 11/49 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> 51152901f729
Step 12/49 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --with-tblite=no --dry-run
 ---> Using cache
 ---> 6d5d0564da7b
Step 13/49 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> aa13859b90ee
Step 14/49 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> f399f0adc829
Step 15/49 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 1f73e0419d33
Step 16/49 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> 01b7e5c58917
Step 17/49 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> b75d15f7c474
Step 18/49 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> 9c4173922d52
Step 19/49 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> d72d1bf0a34b
Step 20/49 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> f469ac4cb109
Step 21/49 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> fa3055330eb1
Step 22/49 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> c2f6a4b2a016
Step 23/49 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> cd0ebda4c95e
Step 24/49 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> 99fe9a0f7be3
Step 25/49 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 055dfd908762
Step 26/49 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> 053e616d01ed
Step 27/49 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> 00882cca8a57
Step 28/49 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> b7a1c3f38898
Step 29/49 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> b8dcf3bddf1c
Step 30/49 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> 1293bbad6b59
Step 31/49 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 0a180dbc0a74
Step 32/49 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> c25a56232363
Step 33/49 : COPY ./tools/toolchain/scripts/arch_base.tmpl ./tools/toolchain/scripts/generate_arch_files.sh ./scripts/
 ---> Using cache
 ---> ff962b06ea27
Step 34/49 : RUN ./scripts/generate_arch_files.sh && rm -rf ./build
 ---> Using cache
 ---> 705385d1b7de
Step 35/49 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> bc234b896bc0
Step 36/49 : COPY ./src ./src
 ---> 5bff4ec52476
Step 37/49 : COPY ./data ./data
 ---> 54ac7f213504
Step 38/49 : COPY ./tests ./tests
 ---> 88e56ca91cc5
Step 39/49 : COPY ./tools/build_utils ./tools/build_utils
 ---> edd30a5413ac
Step 40/49 : COPY ./cmake ./cmake
 ---> 563aafdd9cbc
Step 41/49 : COPY ./CMakeLists.txt .
 ---> 5d341c10f1db
Step 42/49 : COPY ./tools/docker/scripts/build_cp2k_cmake.sh .
 ---> 5d5c28d03fab
Step 43/49 : RUN ./build_cp2k_cmake.sh toolchain_cuda_V100 psmp
 ---> Running in 1f6e36a9ca61

==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-4.3.1/lib/libmpi.so (found version "4.1")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-4.3.1/lib/libmpicxx.so (found version "4.1")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-4.3.1/lib/libmpifort.so (found version "4.1")
-- Found MPI: TRUE (found version "4.1") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "4.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
-- Found openblas, version 0.3.30
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
-- Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
-- Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
-- Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
-- Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
-- Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")

-----------------------------------------------------------
-                          CUDA                           -
-----------------------------------------------------------

-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found

------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------

-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX

------------------------------------------------------------
-                          DBCSR                           -
------------------------------------------------------------

-- Found MPI: TRUE (found version "4.1")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")

------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------

-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "1.14.6") found components: C Fortran
-- Found MPI: TRUE (found version "4.1") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Checking for module 'libint2'
-- Found libint2, version 2.6.0
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include;/opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include/libint2
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.5.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "1.14.6") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.10/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'libvdwxc>=0.3.0'
-- Looking for vdwxc_init_mpi
-- Looking for vdwxc_init_mpi - not found
-- Found LibVDWXC: /opt/cp2k-toolchain/install/libvdwxc-0.4.0/lib/libvdwxc.a (Required is at least version "0.3.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
--------------------------------------------------------------------
-                                                                  -
-               Summary of enabled dependencies                    -
-                                                                  -
--------------------------------------------------------------------

- BLAS
  - vendor: OpenBLAS
  - include directories: /opt/cp2k-toolchain/install/openblas-0.3.30/include
  - libraries: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
- LAPACK
  - include directories: /opt/cp2k-toolchain/install/openblas-0.3.30/include
  - libraries: /opt/cp2k-toolchain/install/openblas-0.3.30/lib/libopenblas.so
- MPI
  - include directories: /opt/cp2k-toolchain/install/mpich-4.3.1/include
  - libraries: /opt/cp2k-toolchain/install/mpich-4.3.1/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-4.3.1/lib/libmpi.so
  - MPI_F08: ON
- ScaLAPACK
  - vendor: auto
  - include directories:
  - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a
- Hardware Acceleration:
  - CUDA:
    - GPU architecture number: 70
    - GPU profiling enabled: OFF
  - GPU accelerated modules
    - PW module: ON
    - GRID module: ON
    - DBM module: ON
- LibXC
  - version: 7.0.0
  - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
  - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
- HDF5
  - version: 1.14.6
  - include directories: /opt/cp2k-toolchain/install/hdf5-1.14.6/include
  - libraries: hdf5-shared
- FFTW3
  - include directories: /opt/cp2k-toolchain/install/fftw-3.3.10/include
  - libraries: /opt/cp2k-toolchain/install/fftw-3.3.10/lib/libfftw3.a
- LIBXSMM
  - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
  - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so
- SpLA
  - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
  - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
  - SpLA GEMM offloading
- SIRIUS
  - include directories:
  - libraries:
- COSMA
  - include directories: /opt/cp2k-toolchain/install/COSMA-2.7.0/include
  - libraries: MPI::MPI_CXX;costa::costa;$;$;cosma::BLAS::blas;cosma::scalapack::scalapack
- Libint2
  - include directories: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include;/opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/include/libint2
  - libraries: /opt/cp2k-toolchain/install/libint-v2.6.0-cp2k-lmax-5/lib/libint2.a
- ELPA
  - include directories: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/include/elpa_openmp-2024.05.001
  - libraries: /opt/cp2k-toolchain/install/elpa-2024.05.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.2/lib/libscalapack.a;:libopenblas.a
- GRPP

--------------------------------------------------------------------
-                                                                  -
-         List of dependencies not included in this build          -
-                                                                  -
--------------------------------------------------------------------

- DFTD4
- DeePMD
- PEXSI
- ACE (libpace)
- TBLITE
- Spglib
- LibSMEAGOL
- DLA-Future
- PLUMED
- Libvori
- LibTorch
- TREXIO
- GreenX

To run the regtests you need to run the following commands

cd ..
export CP2K_DATA_DIR=/opt/cp2k/data/
./tests/do_regtest.py /opt/cp2k/build/bin psmp

-- Configuring done (11.4s)
-- Generating done (0.4s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container 1f6e36a9ca61
 ---> 1677a1f26e09
Step 44/49 : COPY ./benchmarks ./benchmarks
 ---> a310c7771686
Step 45/49 : COPY ./tools/regtesting ./tools/regtesting
 ---> 944ebcd2ce9e
Step 46/49 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> 0b23aaa36e4c
Step 47/49 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in 58788c242ef4

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc libgrpp elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 77 input files and 0 errors.

========== Running Performance Test ==========

Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.026    0.026   99.355   99.356
qs_mol_dyn_low                       1  2.0    0.004    0.004   98.953   98.955
qs_forces                           11  3.9    0.002    0.002   98.906   98.906
qs_energies                         11  4.9    0.001    0.001   87.958   87.958
scf_env_do_scf                      11  5.9    0.001    0.001   68.214   68.214
velocity_verlet                     10  3.0    0.001    0.002   63.519   63.537
scf_env_do_scf_inner_loop          108  6.5    0.006    0.008   58.492   58.492
rebuild_ks_matrix                  119  8.3    0.001    0.001   26.187   26.191
qs_ks_build_kohn_sham_matrix       119  9.3    0.019    0.019   26.186   26.191
dbcsr_multiply_generic            2286 12.5    0.134    0.135   24.873   24.926
qs_ks_update_qs_env                119  7.6    0.001    0.001   23.952   23.956
qs_rho_update_rho_low              119  7.7    0.001    0.001   19.817   19.822
calculate_rho_elec                 119  8.7    0.845    0.852   19.816   19.821
qs_scf_new_mos                     108  7.5    0.001    0.001   19.407   19.412
qs_scf_loop_do_ot                  108  8.5    0.001    0.001   19.406   19.411
ot_scf_mini                        108  9.5    0.003    0.003   17.472   17.476
fft_wrap_pw1pw2                   1201 11.6    0.023    0.024   15.444   15.496
sum_up_and_integrate               119 10.3    0.002    0.002   13.740   13.791
integrate_v_rspace                 119 11.3    0.349    0.351   13.648   13.699
fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   13.275   13.340
multiply_cannon                   2286 13.5    0.311    0.313   12.325   12.330
multiply_cannon_loop              2286 14.5    0.243    0.245   11.294   11.307
make_m2s                          4572 13.5    0.041    0.042   10.958   10.969
make_images                       4572 14.5    1.459    1.485   10.788   10.797
ot_mini                            108 10.5    0.001    0.001   10.507   10.512
init_scf_run                        11  5.9    0.000    0.000   10.070   10.070
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.070   10.070
density_rs2pw                      119  9.7    0.007    0.007    9.925    9.990
init_scf_loop                       11  6.9    0.000    0.000    9.650    9.650
grid_collocate_task_list           119  9.7    9.010    9.043    9.010    9.043
pw_gpu_r3dc1d_3d_ps                606 13.1    2.277    2.304    7.913    7.916
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    7.731    7.732
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.571    7.673
pw_gpu_c1dr3d_3d_ps                595 14.2    2.204    2.227    7.502    7.552
grid_integrate_task_list           119 12.3    7.405    7.457    7.405    7.457
wfi_extrapolate                     11  7.9    0.001    0.001    7.301    7.301
prepare_preconditioner              11  7.9    0.000    0.000    6.472    6.477
make_preconditioner                 11  8.9    0.000    0.000    6.472    6.477
qs_ot_get_derivative               108 11.5    0.002    0.002    6.344    6.351
multiply_cannon_multrec           4572 15.5    2.227    2.245    6.234    6.259
hybrid_alltoall_any               4725 16.4    4.696    4.699    6.024    6.036
make_images_data                  4572 15.5    0.051    0.051    5.879    5.895
potential_pw2rs                    119 12.3    0.035    0.036    5.893    5.894
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.510    5.741
parallel_gemm_fm_cosma              81  9.0    5.286    5.286    5.286    5.286
ot_diis_step                       108 11.5    0.005    0.005    4.137    4.137
build_core_ppl_forces               11  5.9    3.803    3.895    3.803    3.895
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.773    3.804
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.627    3.627
apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.615    3.616
apply_single                       119 13.6    0.001    0.001    3.615    3.616
dbcsr_mm_accdrv_process           9594 16.2    0.890    1.300    3.595    3.596
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.271    3.272
calculate_dm_sparse                119  9.5    0.001    0.001    3.237    3.238
multiply_cannon_sync_h2d          4572 15.5    3.105    3.161    3.105    3.161
dbcsr_complete_redistribute        329 12.2    1.101    1.148    2.903    3.148
mp_alltoall_z22v                  1201 15.6    3.035    3.113    3.035    3.113
qs_create_task_list                 11  7.9    0.000    0.000    2.845    2.874
generate_qs_task_list               11  8.9    1.115    1.130    2.845    2.874
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.723    2.723
mp_waitall_1                     64495 16.9    2.638    2.675    2.638    2.675
pw_poisson_solve                   119 10.3    0.003    0.003    2.640    2.647
jit_kernel_multiply                 12 15.5    2.120    2.528    2.120    2.528
qs_ot_get_p                        119 10.4    0.001    0.001    2.410    2.412
transfer_rs2pw                     487 10.6    0.008    0.008    2.286    2.354
calculate_first_density_matrix       1  7.0    0.000    0.000    2.339    2.340
copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.317    2.328
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.261    2.261
pw_gpu_fg                          606 14.1    2.246    2.251    2.246    2.251
qs_ot_get_derivative_taylor         59 13.0    0.002    0.002    2.116    2.116
dbcsr_special_finalize            6858 15.5    0.040    0.041    2.040    2.042
yz_to_x                            606 14.1    0.454    0.456    1.987    2.026
cp_fm_cholesky_invert               11 10.9    2.020    2.020    2.020    2.020
x_to_yz                            595 15.2    0.473    0.482    1.976    2.004
-------------------------------------------------------------------------------

Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=69.155, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=9.01, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.405, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.286, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.696, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.803, yerr=0.0

Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.026    0.026   94.120   94.121
qs_mol_dyn_low                       1  2.0    0.004    0.004   93.698   93.701
qs_forces                           11  3.9    0.002    0.002   93.655   93.655
qs_energies                         11  4.9    0.001    0.001   82.650   82.651
scf_env_do_scf                      11  5.9    0.001    0.001   62.904   62.905
velocity_verlet                     10  3.0    0.001    0.002   60.858   60.873
scf_env_do_scf_inner_loop           96  6.5    0.005    0.007   53.210   53.210
rebuild_ks_matrix                  107  8.3    0.001    0.001   25.401   25.401
qs_ks_build_kohn_sham_matrix       107  9.3    0.017    0.017   25.400   25.400
qs_ks_update_qs_env                107  7.6    0.001    0.001   22.896   22.897
dbcsr_multiply_generic            1966 12.4    0.116    0.117   22.298   22.382
qs_rho_update_rho_low              107  7.7    0.001    0.001   18.111   18.126
calculate_rho_elec                 107  8.7    0.757    0.762   18.111   18.126
qs_scf_new_mos                      96  7.5    0.001    0.001   17.033   17.033
qs_scf_loop_do_ot                   96  8.5    0.001    0.001   17.032   17.032
ot_scf_mini                         96  9.5    0.002    0.003   15.376   15.378
sum_up_and_integrate               107 10.3    0.002    0.002   14.223   14.302
integrate_v_rspace                 107 11.3    0.311    0.311   14.142   14.222
fft_wrap_pw1pw2                   1081 11.6    0.020    0.021   14.020   14.052
fft_wrap_pw1pw2_140                439 12.2    0.003    0.003   12.043   12.081
multiply_cannon                   1966 13.4    0.277    0.283   11.031   11.048
multiply_cannon_loop              1966 14.4    0.207    0.208   10.163   10.166
init_scf_run                        11  5.9    0.000    0.000    9.945    9.946
scf_env_initial_rho_setup           11  6.9    0.000    0.001    9.945    9.945
make_m2s                          3932 13.4    0.036    0.037    9.835    9.842
make_images                       3932 14.4    1.311    1.328    9.684    9.692
init_scf_loop                       11  6.9    0.000    0.000    9.624    9.624
ot_mini                             96 10.5    0.001    0.001    9.279    9.279
density_rs2pw                      107  9.7    0.006    0.006    9.030    9.171
grid_integrate_task_list           107 12.3    8.490    8.570    8.490    8.570
grid_collocate_task_list           107  9.7    8.304    8.430    8.304    8.430
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    7.883    7.883
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.413    7.530
pw_gpu_r3dc1d_3d_ps                546 13.1    2.046    2.071    7.211    7.212
wfi_extrapolate                     11  7.9    0.001    0.001    7.191    7.191
pw_gpu_c1dr3d_3d_ps                535 14.2    1.980    1.992    6.784    6.814
prepare_preconditioner              11  7.9    0.000    0.000    6.333    6.338
make_preconditioner                 11  8.9    0.000    0.000    6.333    6.338
multiply_cannon_multrec           3932 15.4    1.838    1.855    5.647    5.652
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.394    5.619
qs_ot_get_derivative                96 11.5    0.001    0.001    5.606    5.608
hybrid_alltoall_any               4079 16.3    4.227    4.242    5.396    5.430
potential_pw2rs                    107 12.3    0.031    0.032    5.341    5.341
make_images_data                  3932 15.4    0.044    0.045    5.289    5.302
parallel_gemm_fm_cosma              81  9.0    5.199    5.199    5.199    5.199
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.858    3.890
build_core_ppl_forces               11  5.9    3.748    3.836    3.748    3.836
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.713    3.756
ot_diis_step                        96 11.5    0.005    0.005    3.652    3.652
dbcsr_mm_accdrv_process           8450 16.1    0.568    0.695    3.463    3.483
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.454    3.454
apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.257    3.259
apply_single                       107 13.6    0.000    0.001    3.256    3.259
dbcsr_complete_redistribute        317 12.2    1.084    1.123    2.935    3.177
qs_create_task_list                 11  7.9    0.000    0.000    3.052    3.062
generate_qs_task_list               11  8.9    1.380    1.385    3.052    3.062
calculate_dm_sparse                107  9.5    0.001    0.001    2.904    2.905
multiply_cannon_sync_h2d          3932 15.4    2.798    2.812    2.798    2.812
mp_alltoall_z22v                  1081 15.6    2.734    2.801    2.734    2.801
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.703    2.704
jit_kernel_multiply                 12 15.7    2.373    2.479    2.373    2.479
pw_poisson_solve                   107 10.3    0.002    0.003    2.367    2.369
mp_waitall_1                     55487 16.8    2.350    2.356    2.350    2.356
transfer_rs2pw                     439 10.6    0.007    0.007    2.158    2.342
calculate_first_density_matrix       1  7.0    0.000    0.000    2.325    2.325
copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.288    2.315
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.249    2.249
qs_ot_get_p                        107 10.4    0.001    0.001    2.091    2.092
pw_gpu_fg                          546 14.1    2.078    2.081    2.078    2.081
transfer_rs2pw_140                 118 11.5    1.316    1.321    1.808    1.996
cp_fm_cholesky_invert               11 10.9    1.975    1.975    1.975    1.975
build_core_ppl                      11  7.9    1.929    1.965    1.929    1.965
-------------------------------------------------------------------------------

Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=64.152, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.49, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.304, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.199, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.227, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.748, yerr=0.0

Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.020    0.021  159.033  159.033
qs_energies                          1  2.0    0.000    0.000  158.725  158.728
mp2_main                             1  3.0    0.000    0.000  152.643  152.646
mp2_gpw_main                         1  4.0    0.000    0.000  150.960  150.963
rpa_ri_compute_en                    1  5.0    0.000    0.000  141.565  141.568
rpa_num_int                          1  6.0    0.001    0.001  141.557  141.560
compute_mat_P_omega                  1  7.0    0.001    0.002   65.962   65.964
compute_mat_P_omega_contract        10  8.0    5.042    5.097   65.321   65.329
dbt_total                         2336  9.6    0.019    0.019   65.288   65.290
parallel_gemm_fm_cosma             105  8.4   62.759   62.762   62.759   62.762
dbt_contract                       787 11.0    0.046    0.046   44.170   44.171
compute_W_cubic_GW                  10  7.0    0.004    0.004   40.754   40.755
dbt_tas_total                     1149 12.2    0.125    0.126   34.696   34.696
dbt_tas_multiply                   807 12.1    0.002    0.002   34.043   34.043
dbt_tas_dbm                        807 14.1    0.005    0.005   26.956   26.956
dbm_multiply                       807 16.1   25.783   26.110   25.783   26.110
compute_mat_P_omega_calc_M_occ     250  9.0    5.068    5.119   23.256   23.256
dbt_copy                          1107 10.7    0.069    0.070   21.403   21.588
rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   21.282   21.282
contract_P_omega_with_mat_L         10  8.0    0.000    0.000   20.978   20.978
dbt_tas_mm_1N                      524 15.1    0.003    0.003   17.404   17.770
compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   14.588   14.588
dbt_reshape                        594 11.8    6.127    6.279   13.728   13.814
compute_QP_energies                  1  7.0    0.000    0.000   11.355   11.355
compute_self_energy_cubic_gw         1  8.0    0.113    0.115   11.355   11.355
dbt_tas_reserve_blocks_index      3266 14.3    0.629    0.635   10.316   10.358
dbm_reserve_blocks                3634 15.3   10.003   10.050   10.003   10.050
mp2_ri_gpw_compute_in                1  5.0    0.001    0.001    9.385    9.385
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    8.722    8.722
dbt_reserve_blocks_index          2347 13.0    0.306    0.308    8.582    8.585
dbt_crop                          1042 12.0    6.225    6.330    8.418    8.582
dbt_reserve_blocks_index_array    2289 12.1    0.010    0.011    8.392    8.419
dbt_tas_mm_2                       251 15.0    0.002    0.002    7.489    7.489
mp_waitall_2                      2656 15.9    5.615    5.632    5.615    5.632
scf_env_do_scf                       1  3.0    0.000    0.000    5.568    5.568
scf_env_do_scf_inner_loop           17  4.0    0.001    0.001    5.567    5.567
get_2c_integrals                     1  6.0    0.000    0.000    5.316    5.316
contract_cubic_gw                   21  9.0    0.000    0.000    5.270    5.270
dbt_communicate_buffer             594 12.8    0.011    0.012    5.132    5.154
dbcsr_multiply_generic              30  8.1    0.003    0.003    4.859    4.908
multiply_cannon                     30  9.1    0.014    0.020    4.669    4.717
compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    4.675    4.682
multiply_cannon_loop                30 10.1    0.004    0.004    4.616    4.664
compute_mat_P_omega_copy_M_occ     250  9.0    0.001    0.002    4.595    4.598
dbt_tas_copy                       511 11.5    2.438    2.479    4.291    4.364
multiply_cannon_multrec             60 11.1    0.179    0.180    4.108    4.125
dbcsr_mm_accdrv_process            328 12.3    0.040    0.040    3.742    3.748
jit_kernel_multiply                 18 11.7    3.696    3.702    3.696    3.702
-------------------------------------------------------------------------------

Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=48.13599999999998, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=62.759, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=25.783, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.003, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.225, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.127, yerr=0.0

Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:

-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.020    0.021  184.580  184.583
qs_forces                            1  2.0    0.000    0.000  184.145  184.148
rebuild_ks_matrix                    7  6.6    0.000    0.000  179.984  179.987
qs_ks_build_kohn_sham_matrix         7  7.6    0.001    0.002  179.984  179.987
hfx_ks_matrix                        7  8.6    0.000    0.000  176.366  176.369
dbt_total                          849 11.0    0.008    0.008  131.724  131.726
hfx_ri_update_ks                     7  9.6    0.000    0.000  100.732  100.733
hfx_ri_update_ks_Pmat                7 10.6   20.658   20.674  100.727  100.729
qs_energies                          1  3.0    0.000    0.000   96.107   96.109
scf_env_do_scf                       1  4.0    0.000    0.000   94.084   94.086
qs_ks_update_qs_env                  8  6.0    0.000    0.000   91.993   91.995
qs_ks_update_qs_env_forces           1  3.0    0.000    0.000   87.998   87.999
dbt_contract                       207 12.4    0.047    0.047   77.560   77.561
hfx_ri_update_forces                 1  7.0    0.979    0.983   75.632   75.633
dbt_tas_total                      369 13.4    0.070    0.071   64.315   64.316
dbt_tas_multiply                   216 13.5    0.001    0.001   61.678   61.679
scf_env_do_scf_inner_loop            6  5.0    0.000    0.000   50.759   50.760
dbt_copy                           423 11.8    0.044    0.045   49.978   50.199
dbt_tas_dbm                        216 15.5    0.002    0.002   49.085   49.086
dbm_multiply                       216 17.5   46.259   46.290   46.259   46.290
hfx_ri_forces_Pmat_3c                1  8.0    3.246    3.290   44.857   44.901
init_scf_loop                        2  5.0    0.000    0.000   43.324   43.325
dbt_reshape                        175 13.2   16.855   16.877   37.204   37.376
hfx_ri_update_ks_Pmat_KS            63 11.6    0.001    0.001   29.330   29.331
precalc_derivatives                  1  8.0    1.752    1.759   25.251   25.252
dbt_tas_mm_2                        91 16.5    0.001    0.001   20.531   20.531
mp_waitall_2                      1022 16.5   17.946   18.034   17.946   18.034
dbt_tas_reserve_blocks_index      1323 15.4    1.639    1.647   17.426   17.583
dbm_reserve_blocks                1491 16.3   16.465   16.612   16.465   16.612
dbt_crop                           372 13.7   12.095   12.294   15.762   16.022
dbt_tas_mm_3T                       77 17.1    0.001    0.001   15.404   15.701
hfx_ri_pre_scf_Pmat                  1 12.0    0.000    0.000   15.053   15.053
dbt_communicate_buffer             175 14.2    0.004    0.004   14.824   14.891
dbt_reserve_blocks_index           889 14.5    0.589    0.590   14.127   14.238
dbt_reserve_blocks_index_array     859 13.5    0.007    0.007   13.872   13.975
hfx_ri_update_ks_Pmat_Px3C          63 11.6    0.000    0.000   13.946   13.947
hfx_ri_update_ks_Pmat_copy_2        63 11.6    0.000    0.000   13.380   13.381
build_3c_derivatives                 3  9.0    2.143    2.184   13.309   13.310
dbt_tas_mm_3N                       37 15.4    0.000    0.000   10.921   10.927
dbt_tas_copy                       248 12.5    4.045    4.062    7.683    7.742
mp_sync                           2901 12.8    5.761    5.911    5.761    5.911
hfx_ri_pre_scf_Pmat_int              1 13.0    0.000    0.000    4.799    4.799
dbt_tas_replicate                  168 15.1    2.098    2.103    4.449    4.473
hfx_ri_pre_scf_calc_tensors          1 14.0    0.003    0.004    4.120    4.123
hfx_ri_pre_scf_Pmat_copy_2           9 13.0    1.600    1.611    4.090    4.101
dbt_tas_reserve_blocks_templat     266 13.6    0.103    0.104    3.688    3.733
hfx_ri_pre_scf_Pmat_RIx3C            9 13.0    0.000    0.000    3.686    3.720
-------------------------------------------------------------------------------

Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=66.397, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=46.259, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=20.658, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=17.946, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=16.855, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=16.465, yerr=0.0

Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:

-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.009    0.010  102.711  102.712
qs_energies                          1  2.0    0.000    0.000  102.533  102.533
mp2_main                             1  3.0    0.000    0.000   95.773   95.773
mp2_gpw_main                         1  4.0    0.001    0.001   95.426   95.426
mp2_ri_gpw_compute_in                1  5.0    0.531    0.536   53.541   53.612
mp2_ri_gpw_compute_in_loop           1  6.0    0.012    0.012   45.547   45.613
mp2_ri_gpw_compute_en                1  5.0    0.089    0.090   41.826   41.895
mp2_ri_gpw_compute_en_RI_loop        1  6.0   12.607   12.667   39.237   39.239
dbcsr_multiply_generic            2666  8.0    0.140    0.140   22.780   23.025
ao_to_mo_and_store_B_mult_1       1328  7.0    0.011    0.011   21.469   21.714
mp2_eri_3c_integrate_gpw          1328  7.0    0.016    0.017   18.582   18.817
mp2_ri_gpw_compute_en_expansio    1040  7.0    0.694    0.699   16.560   16.606
local_gemm                        1040  8.0   15.866   15.907   15.866   15.907
make_m2s                          5332  9.0    0.045    0.045   12.470   12.700
make_images                       5332 10.0    3.172    3.201   12.304   12.535
integrate_v_rspace                1338  8.0    1.016    1.022   10.550   10.924
multiply_cannon                   2666  9.0    0.342    0.353    9.708   10.180
multiply_cannon_loop              2666 10.0    0.171    0.172    8.706    9.162
grid_integrate_task_list          1338  9.0    8.240    8.626    8.240    8.626
fft_wrap_pw1pw2                  26668 10.4    0.133    0.143    7.848    7.951
hybrid_alltoall_any               6683 11.6    7.332    7.539    7.579    7.784
make_images_data                  5332 11.0    0.058    0.059    7.484    7.686
get_2c_integrals                     1  6.0    0.004    0.004    7.454    7.462
collocate_function                1328  8.0    4.764    4.800    6.920    7.049
compute_2c_integrals                 1  7.0    0.007    0.008    6.923    6.923
compute_2c_integrals_loop_lm         1  8.0    0.021    0.021    6.790    6.831
mp2_eri_2c_integrate_gpw             1  9.0    1.973    2.002    6.769    6.810
scf_env_do_scf                       1  3.0    0.000    0.000    5.895    5.897
scf_env_do_scf_inner_loop           10  4.0    0.001    0.001    5.895    5.896
ao_to_mo_and_store_B_E_Ex_1       1328  7.0    3.441    3.469    5.220    5.269
multiply_cannon_multrec           2676 11.0    2.205    2.211    4.805    4.828
fft_wrap_pw1pw2_20               10647 11.4    0.021    0.022    4.565    4.649
mp2_ri_gpw_compute_en_ener        1040  7.0    4.589    4.597    4.589    4.597
mp2_ri_gpw_compute_en_comm         221  7.0    0.983    0.990    4.351    4.453
qs_scf_new_mos                      10  5.0    0.000    0.000    4.401    4.409
pw_gpu_r3dc1d_3d                 13282 12.2    3.966    4.016    3.966    4.016
pw_gpu_c1dr3d_3d                 13280 12.7    2.717    2.757    2.717    2.757
potential_pw2rs                   2666 10.0    0.091    0.093    2.682    2.755
mp_sendrecv_dm3                    442  8.0    2.378    2.496    2.378    2.496
fft_wrap_pw1pw2_10               15957 11.5    0.019    0.019    2.398    2.405
dbcsr_mm_accdrv_process           5392 12.0    0.234    0.237    2.367    2.380
collocate_single_gaussian         1328 10.0    0.087    0.088    2.291    2.341
multiply_cannon_sync_h2d          2676 11.0    1.866    2.316    1.866    2.316
eigensolver                         11  5.8    0.001    0.001    2.233    2.233
mp2_eri_2c_integrate_gpw_pot_l    1328 10.0    0.004    0.004    2.156    2.214
copy_dbcsr_to_fm                  1351  8.0    0.031    0.031    2.145    2.178
replicate_iaK_2intgroup              1  6.0    2.017    2.025    2.156    2.164
fill_local_i_aL                    884  7.5    2.120    2.132    2.120    2.132
jit_kernel_multiply                  8 13.0    2.029    2.057    2.029    2.057
-------------------------------------------------------------------------------

Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=53.902, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.866, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.607, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.24, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=7.332, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="collocate_function", label="collocate_function", y=4.764, yerr=0.0

Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:

-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.090    0.101  202.198  202.199
qs_energies                          1  2.0    0.000    0.000  201.148  201.149
scf_env_do_scf                       1  3.0    0.000    0.000  188.037  188.037
scf_env_do_scf_inner_loop           15  4.0    0.001    0.002  188.037  188.037
qs_ks_update_qs_env                 15  5.0    0.000    0.000   92.762   92.822
rebuild_ks_matrix                   15  6.0    0.000    0.000   92.568   92.628
qs_ks_build_kohn_sham_matrix        15  7.0    0.003    0.003   92.568   92.628
qs_vxc_create                       15  8.0    0.029    0.057   56.720   56.737
qs_scf_new_mos                      15  5.0    0.000    0.000   53.856   53.873
fft_wrap_pw1pw2                   1086 10.0    0.028    0.028   49.182   49.234
calculate_dispersion_nonloc         15  9.0   10.613   10.645   48.767   48.815
eigensolver                         15  6.0    0.002    0.002   44.971   45.104
qs_rho_update_rho_low               16  5.0    0.000    0.000   39.668   39.668
calculate_rho_elec                  16  6.0    0.178    0.179   39.668   39.668
sum_up_and_integrate                15  8.0    0.000    0.000   34.384   34.423
integrate_v_rspace                  15  9.0    0.045    0.046   34.359   34.399
cp_fm_diag_elpa                     15  7.0    0.000    0.000   28.860   28.864
cp_fm_diag_elpa_base                15  8.0   27.227   27.751   28.855   28.855
grid_collocate_task_list            16  7.0   28.599   28.653   28.599   28.653
grid_integrate_task_list            15 10.0   27.633   27.694   27.633   27.694
pw_gpu_c1dr3d_3d_ps                585 12.1    5.512    5.661   25.889   25.938
fft_wrap_pw1pw2_150                765 11.0    0.005    0.005   25.357   25.401
pw_gpu_r3dc1d_3d_ps                501 11.9    4.513    4.694   23.259   23.262
cp_fm_cholesky_restore              45  7.0   14.357   15.037   14.357   15.037
fft_wrap_pw1pw2_200                197 11.3    0.001    0.001   11.945   11.971
density_rs2pw                       16  7.0    0.001    0.001   10.883   10.943
qs_energies_init_hamiltonians        1  3.0    0.000    0.000    9.312    9.312
vdW_energy                          15 10.0    9.099    9.124    9.099    9.124
pw_gpu_ffc                         585 13.1    8.816    8.823    8.816    8.823
pw_gpu_cff                         501 12.9    8.319    8.324    8.319    8.324
build_core_hamiltonian_matrix        1  4.0    0.000    0.000    8.020    8.040
xc_vxc_pw_create                    15  9.0    0.174    0.176    7.925    7.928
pw_gpu_sf                          585 13.1    6.967    6.968    6.967    6.968
potential_pw2rs                     15 10.0    0.007    0.007    6.681    6.700
mp_alltoall_z22v                  1086 14.0    6.427    6.699    6.427    6.699
pw_gpu_fg                          501 12.9    6.653    6.672    6.653    6.672
copy_dbcsr_to_fm                    16  5.9    0.001    0.001    6.332    6.438
fft_wrap_pw1pw2_10                  62 10.5    0.000    0.000    5.374    5.377
dbcsr_complete_redistribute         46  8.3    1.662    1.734    5.123    5.226
xc_rho_set_and_dset_create          15 10.0    0.131    0.132    4.681    4.711
x_to_yz                            585 13.1    0.998    1.007    4.561    4.669
xc_pw_derive                        90 11.0    0.001    0.001    4.591    4.614
build_core_ppnl                      1  5.0    4.498    4.506    4.498    4.506
cp_fm_uplo_to_full                  30  8.0    3.380    4.446    3.380    4.446
-------------------------------------------------------------------------------

Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=93.769, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=28.599, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.633, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=27.227, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=14.357, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.613, yerr=0.0

Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:

-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.044    0.045  255.505  255.506
qs_energies                          1  2.0    0.000    0.000  255.377  255.379
ls_scf                               1  3.0    0.000    0.000  254.538  254.540
ls_scf_main                          1  4.0    0.001    0.001  245.062  245.063
density_matrix_trs4                 11  5.0    0.007    0.007  205.977  206.010
dbcsr_multiply_generic             185  6.1    0.305    0.306  167.931  167.939
multiply_cannon                    185  7.1    1.927    2.077  115.180  115.430
multiply_cannon_loop               185  8.1    0.319    0.319  100.934  101.318
multiply_cannon_multrec            370  9.1   76.992   77.398   85.901   86.303
make_m2s                           370  7.1    0.028    0.028   44.899   45.022
make_images                        370  8.1   11.802   11.917   43.880   44.007
ls_scf_dm_to_ks                     11  5.0    0.000    0.000   35.611   35.650
matrix_ls_to_qs                     11  6.0    0.000    0.000   32.779   32.782
dbcsr_complete_redistribute         23  7.5   18.881   18.899   27.026   27.032
matrix_decluster                    11  7.0    0.000    0.000   24.807   24.815
arnoldi_extremal                    12  6.1    0.000    0.000   23.133   23.135
arnoldi_normal_ev                   12  7.1    0.009    0.010   23.132   23.135
build_subspace                      23  8.1    0.061    0.061   22.645   22.646
dbcsr_matrix_vector_mult           652  9.0    0.150    0.150   21.242   21.328
dbcsr_matrix_vector_mult_local     652 10.0   20.269   20.356   20.276   20.363
make_images_data                   370  9.1    0.011    0.011   16.151   16.174
hybrid_alltoall_any                393  9.9   11.376   11.416   15.645   15.649
calculate_norms                    740  9.1   14.148   14.164   14.148   14.164
dbcsr_finalize                     559  7.6    0.200    0.204   13.580   13.660
dbcsr_merge_all                    510  8.6    2.288    2.387   12.428   12.476
dbcsr_copy                         761  7.5    1.675    1.683    9.666    9.679
setup_rec_index_2d                 370  8.1    9.312    9.351    9.312    9.351
dbcsr_special_finalize             555  9.1    0.010    0.010    9.066    9.080
dbcsr_sort_indices                1283 10.0    8.422    8.429    8.422    8.429
dbcsr_add_d                        280  6.0    0.001    0.001    8.083    8.194
dbcsr_add_anytype                  280  7.0    3.674    3.688    8.081    8.192
ls_scf_init_scf                      1  4.0    0.000    0.000    8.064    8.064
dbcsr_copy_into_existing            11  8.0    7.970    7.975    7.971    7.976
dbcsr_dot                          144  6.3    7.406    7.413    7.862    7.927
ls_scf_init_matrix_S                 1  5.0    0.000    0.000    7.640    7.645
dbcsr_mm_accdrv_process          14501 10.0    0.688    0.773    6.930    6.968
matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.000    6.936    6.939
tree_to_linear_d                    23 10.5    6.837    6.842    6.837    6.842
dbcsr_mm_accdrv_process_sort     14501 11.0    6.166    6.195    6.166    6.195
dbcsr_merge_single_wm              370 10.1    0.543    0.547    5.911    5.929
make_images_pack                   370  9.1    5.418    5.566    5.432    5.579
-------------------------------------------------------------------------------

Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=113.41299999999998, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=76.992, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=20.269, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=18.881, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=14.148, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="make_images", label="make_images", y=11.802, yerr=0.0

Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:

-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.003    0.004   46.422   46.422
lib_test                             1  2.0    0.000    0.000   46.409   46.417
dbcsr_run_tests                      3  3.0    0.000    0.000   46.409   46.416
test_multiplies_multiproc            3  4.0    0.001    0.001   35.879   35.980
dbcsr_multiply_generic               9  5.0    0.001    0.002   27.975   27.981
multiply_cannon                      9  6.0    0.263    0.345   18.335   18.784
multiply_cannon_loop                 9  7.0    0.003    0.003   16.962   17.296
multiply_cannon_multrec             18  8.0    8.970    9.280   15.709   16.018
dbcsr_make_random_matrix             9  4.0    7.140    7.183   10.396   10.499
dbcsr_finalize                      27  5.7    0.001    0.001    7.175    7.329
dbcsr_merge_all                     18  6.5    3.454    3.458    7.065    7.215
dbcsr_mm_accdrv_process           8199  9.0    1.197    1.408    6.508    6.511
dbcsr_redistribute                   9  5.0    3.351    3.375    5.459    5.466
make_m2s                            18  6.0    0.001    0.001    5.026    5.029
make_images                         18  7.0    0.407    0.421    4.994    4.997
dbcsr_mm_accdrv_process_sort      8199 10.0    4.360    4.362    4.360    4.362
make_images_data                    18  8.0    0.001    0.001    2.776    2.780
hybrid_alltoall_any                 18  9.0    2.386    2.388    2.744    2.748
mp_alltoall_d11v                    27  6.0    1.848    1.855    1.848    1.855
dbcsr_data_copy_aa2                 18  7.5    1.702    1.853    1.702    1.853
tree_to_linear_d                     9  7.0    1.782    1.787    1.782    1.787
dbcsr_data_release                 507  7.7    1.304    1.315    1.304    1.315
jit_kernel_multiply                  7 10.0    0.951    1.160    0.951    1.160
dbcsr_data_new                     354  7.4    0.955    1.074    0.955    1.074
dbcsr_checksum                       6  5.0    0.963    0.966    0.976    0.976
-------------------------------------------------------------------------------

Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=19.146999999999995, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=8.97, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.14, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.36, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.454, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.351, yerr=0.0

Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:

-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                              MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.052    0.052  193.441  193.441
qs_mol_dyn_low                       1  2.0    0.004    0.004  191.958  191.991
qs_forces                            6  3.8    0.001    0.001  121.765  121.765
qs_energies                          6  4.8    0.000    0.000  115.065  115.065
scf_env_do_scf                       6  5.8    0.000    0.000  107.727  107.727
scf_env_do_scf_inner_loop          113  6.2    0.005    0.007  100.149  100.149
velocity_verlet                      5  3.0    0.003    0.003   90.863   90.910
rebuild_ks_matrix                  119  8.1    0.001    0.001   81.969   81.969
qs_ks_build_kohn_sham_matrix       119  9.1    0.018    0.018   81.968   81.968
qs_ks_update_qs_env                119  7.3    0.001    0.001   77.359   77.359
fft_wrap_pw1pw2                   2059 12.4    0.046    0.047   64.480   64.540
fft_wrap_pw1pw2_150               1321 13.9    0.009    0.010   61.740   61.786
qs_vxc_create                      119 10.1    0.002    0.002   52.467   52.468
xc_vxc_pw_create                   119 11.1    1.472    1.472   52.465   52.466
qmmm_el_coupling                     6  3.8    0.000    0.000   36.779   36.786
qmmm_elec_with_gaussian              6  4.8    0.019    0.019   36.772   36.780
xc_pw_derive                       714 13.1    0.010    0.010   36.415   36.425
qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   34.952   35.712
pw_gpu_c1dr3d_3d_ps               1095 14.8   10.189   10.231   34.749   34.822
qmmm_elec_gaussian_low_G             6  6.8   30.458   31.252   30.458   31.252
qmmm_forces                          6  3.8    0.001    0.001   30.801   30.801
qmmm_forces_with_gaussian            6  4.8    0.023    0.023   29.920   30.461
pw_gpu_r3dc1d_3d_ps                964 14.0    8.942    8.945   29.673   29.687
qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   28.683   29.207
xc_rho_set_and_dset_create         119 12.1    2.347    2.347   26.311   26.314
qmmm_forces_gaussian_low_G           6  6.8   23.972   24.494   23.972   24.494
xc_pw_divergence                   119 12.1    0.005    0.005   24.279   24.291
qs_rho_update_rho_low              119  7.3    0.001    0.001   21.817   21.885
calculate_rho_elec                 119  8.3    1.048    1.048   21.817   21.885
density_rs2pw                      119  9.3    0.007    0.007   15.647   15.758
sum_up_and_integrate               119 10.1    0.002    0.002   13.704   13.713
integrate_v_rspace                 119 11.1    0.021    0.021   13.536   13.545
dbcsr_multiply_generic            2598 12.3    0.089    0.091   13.270   13.364
mp_alltoall_z22v                  2059 16.4   12.454   12.545   12.454   12.545
multiply_cannon                   2598 13.3    0.202    0.203   11.757   11.797
multiply_cannon_loop              2598 14.3    0.232    0.234   11.317   11.353
potential_pw2rs                    119 12.1    0.032    0.032    9.534    9.535
multiply_cannon_multrec           5196 15.3    3.985    4.018    9.231    9.263
x_to_yz                           1095 15.8    2.238    2.265    9.112    9.175
qs_ks_ddapc                        119 10.1    0.002    0.002    8.688    8.711
pw_gpu_sf                         1095 15.8    8.689    8.691    8.689    8.691
pw_gpu_fg                          964 15.0    8.275    8.341    8.275    8.341
init_scf_loop                        6  6.8    0.000    0.000    7.576    7.576
yz_to_x                            964 15.0    1.781    1.781    7.361    7.416
qs_scf_new_mos                     113  7.2    0.001    0.001    7.073    7.074
qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.073    7.073
pw_gpu_ffc                        1095 15.8    6.742    6.796    6.742    6.796
ot_scf_mini                        113  9.2    0.002    0.002    6.794    6.794
init_scf_run                         6  5.8    0.000    0.000    5.280    5.280
scf_env_initial_rho_setup            6  6.8    0.000    0.000    5.279    5.279
dbcsr_mm_accdrv_process          13992 16.0    0.521    0.522    5.181    5.184
grid_collocate_task_list           119  9.3    5.087    5.118    5.087    5.118
pw_gpu_cff                         964 15.0    5.030    5.031    5.030    5.031
xc_functional_eval                 238 13.1    0.003    0.003    5.010    5.025
ot_mini                            113 10.2    0.001    0.001    4.834    4.835
pw_poisson_solve                   125  9.9    0.003    0.003    4.728    4.739
qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    4.711    4.712
qmmm_forces_with_gaussian_LG         6  7.8    4.711    4.712    4.711    4.712
qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    4.639    4.639
jit_kernel_multiply                 24 14.7    4.623    4.625    4.623    4.625
qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    4.494    4.528
qmmm_elec_with_gaussian_LG           6  7.8    4.494    4.528    4.494    4.528
pw_derive                         1089 13.4    4.162    4.184    4.162    4.184
grid_integrate_task_list           119 12.1    3.980    3.990    3.980    3.990
qs_ot_get_derivative               113 11.2    0.001    0.001    3.954    3.956
-------------------------------------------------------------------------------

Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=107.426, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=30.458, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=23.972, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=12.454, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.189, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=8.942, yerr=0.0

Summary: Performance test took 22 minutes.
Status: OK

 ---> Removed intermediate container 58788c242ef4
 ---> 8b3454ea29fc
Step 48/49 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in 518c46d28ff5
 ---> Removed intermediate container 518c46d28ff5
 ---> e66bcea7c67b
Step 49/49 : ENTRYPOINT []
 ---> Running in afcd7b2b19f3
 ---> Removed intermediate container afcd7b2b19f3
 ---> 876a4b84894b
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 876a4b84894b
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.

#################### Running Image cp2k-perf-cuda-volta ####################

Uploading artifacts... done
EndDate: 2025-12-10 06:47:27+00:00