StartDate: 2025-12-24 06:06:08+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 5ec5ad8e02f6cf24dd7a076f355da785f25d5224
CommitTime: 2025-12-23 15:58:32 +0100
CommitAuthor: Matthias Krack
CommitSubject: Add spack ssmp tester

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=5ec5ad8e02f6cf24dd7a076f355da785f25d5224 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.
Sending build context to Docker daemon 408.2MB
Step 1/49 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
9c3d619183d2: Waiting
7f7602a82106: Waiting
6e8af4fd0a07: Waiting
5a2aba542b08: Waiting
6cb9b761b877: Waiting
15a17189b2df: Waiting
02cb0e091e33: Waiting
2cd52cbb1ebe: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
2cd52cbb1ebe: Download complete
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Download complete
7f7602a82106: Verifying Checksum
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/49 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/49 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/49 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/49 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/49 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/49 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> f843eeab6072
Step 8/49 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> 896c2903221b
Step 9/49 : RUN mkdir scripts
 ---> Using cache
 ---> ced8e2638937
Step 10/49 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> 907a94a49441
Step 11/49 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> 51152901f729
Step 12/49 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --with-tblite=no --dry-run
 ---> Using cache
 ---> 6d5d0564da7b
Step 13/49 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> aa13859b90ee
Step 14/49 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> f399f0adc829
Step 15/49 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> a6478594ea04
Step 16/49 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> eb43c9de2570
Step 17/49 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> d046380ef7ed
Step 18/49 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> f0643be0747a
Step 19/49 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 3102bf40aa38
Step 20/49 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 3edb9ed667ae
Step 21/49 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> 614a418610eb
Step 22/49 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> eeaece1c55b5
Step 23/49 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> b0cb16315bd3
Step 24/49 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> 947df88a4e32
Step 25/49 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 61dbc94f00e8
Step 26/49 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> 134b41fc916e
Step 27/49 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> 26d355af8642
Step 28/49 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> 8491c3265028
Step 29/49 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> 5719b9994622
Step 30/49 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> dd921543cdc6
Step 31/49 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> a68c7ee4dad2
Step 32/49 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> 2bc15b377471
Step 33/49 : COPY ./tools/toolchain/scripts/arch_base.tmpl ./tools/toolchain/scripts/generate_arch_files.sh ./scripts/
 ---> Using cache
 ---> e2bf4b00aa4a
Step 34/49 : RUN ./scripts/generate_arch_files.sh && rm -rf ./build
 ---> Using cache
 ---> 4c4d6965faff
Step 35/49 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> 1d7607f5944f
Step 36/49 : COPY ./src ./src
 ---> Using cache
 ---> 328b9e11aeca
Step 37/49 : COPY ./data ./data
 ---> Using cache
 ---> e0a704fa9604
Step 38/49 : COPY ./tests ./tests
 ---> Using cache
 ---> 0798cd8c9972
Step 39/49 : COPY ./tools/build_utils ./tools/build_utils
 ---> Using cache
 ---> 60c990c9d14c
Step 40/49 : COPY ./cmake ./cmake
 ---> Using cache
 ---> 360ed8577f1b
Step 41/49 : COPY ./CMakeLists.txt .
 ---> Using cache
 ---> 63dc785c6004
Step 42/49 : COPY ./tools/docker/scripts/build_cp2k_cmake.sh .
 ---> Using cache
 ---> 015a46160e21
Step 43/49 : RUN ./build_cp2k_cmake.sh toolchain_cuda_V100 psmp
 ---> Using cache
 ---> 93549ef2de70
Step 44/49 : COPY ./benchmarks ./benchmarks
 ---> Using cache
 ---> 99cf2e1c0018
Step 45/49 : COPY ./tools/regtesting ./tools/regtesting
 ---> Using cache
 ---> e3202923ef0d
Step 46/49 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> Using cache
 ---> be7f0ab2f84a
Step 47/49 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Using cache
 ---> 79d83ae1b5d3
Step 48/49 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Using cache
 ---> a7b66feb34d6
Step 49/49 : ENTRYPOINT []
 ---> Using cache
 ---> 6c4c944b11eb
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 6c4c944b11eb
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.

#################### Running Image cp2k-perf-cuda-volta ####################

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 77 input files and 0 errors.

========== Running Performance Test ==========
Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
- -
- T I M I N G -
- -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.027 0.027 100.004 100.004
qs_mol_dyn_low 1 2.0 0.004 0.004 99.601 99.604
qs_forces 11 3.9 0.002 0.002 99.555 99.555
qs_energies 11 4.9 0.001 0.001 88.717 88.718
scf_env_do_scf 11 5.9 0.001 0.001 68.921 68.921
velocity_verlet 10 3.0 0.001 0.002 63.500 63.517
scf_env_do_scf_inner_loop 108 6.5 0.006 0.008 59.092 59.092
rebuild_ks_matrix 119 8.3 0.001 0.001 26.158 26.159
qs_ks_build_kohn_sham_matrix 119 9.3 0.018 0.018 26.157 26.159
dbcsr_multiply_generic 2286 12.5 0.132 0.133 24.842 24.912
qs_ks_update_qs_env 119 7.6 0.001 0.001 23.953 23.954
qs_scf_new_mos 108 7.5 0.001 0.001 19.976 19.993
qs_scf_loop_do_ot 108 8.5 0.001 0.001 19.975 19.993
qs_rho_update_rho_low 119 7.7 0.001 0.001 19.802 19.805
calculate_rho_elec 119 8.7 0.838 0.839 19.802 19.805
ot_scf_mini 108 9.5 0.003 0.003 18.081 18.081
fft_wrap_pw1pw2 1201 11.6 0.022 0.023 15.444 15.471
sum_up_and_integrate 119 10.3 0.002 0.002 13.717 13.765
integrate_v_rspace 119 11.3 0.346 0.349 13.629 13.677
fft_wrap_pw1pw2_140 487 12.2 0.003 0.003 13.267 13.311
multiply_cannon 2286 13.5 0.313 0.316 12.255 12.291
multiply_cannon_loop 2286 14.5 0.245 0.246 11.260 11.281
make_m2s 4572 13.5 0.040 0.040 11.008 11.021
make_images 4572 14.5 1.474 1.479 10.841 10.854
ot_mini 108 10.5 0.001 0.001 10.492 10.493
init_scf_run 11 5.9 0.000 0.000 10.035 10.035
density_rs2pw 119 9.7 0.007 0.007 9.952 10.034
scf_env_initial_rho_setup 11 6.9 0.000 0.001 10.034 10.034
init_scf_loop 11 6.9 0.000 0.000 9.759 9.760
grid_collocate_task_list 119 9.7 8.980 9.037 8.980 9.037
pw_gpu_r3dc1d_3d_ps 606 13.1 2.268 2.288 7.907 7.908
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 7.798 7.798
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 7.425 7.575
pw_gpu_c1dr3d_3d_ps 595 14.2 2.220 2.236 7.509 7.535
grid_integrate_task_list 119 12.3 7.381 7.432 7.381 7.432
wfi_extrapolate 11 7.9 0.001 0.001 7.262 7.262
prepare_preconditioner 11 7.9 0.000 0.000 6.585 6.587
make_preconditioner 11 8.9 0.000 0.000 6.585 6.587
qs_ot_get_derivative 108 11.5 0.001 0.002 6.314 6.316
multiply_cannon_multrec 4572 15.5 2.176 2.191 6.160 6.184
hybrid_alltoall_any 4725 16.4 4.742 4.749 6.057 6.060
make_images_data 4572 15.5 0.049 0.049 5.943 5.946
potential_pw2rs 119 12.3 0.035 0.036 5.901 5.902
make_full_inverse_cholesky 11 9.9 0.000 0.000 5.537 5.774
parallel_gemm_fm_cosma 81 9.0 5.284 5.284 5.284 5.284
ot_diis_step 108 11.5 0.005 0.005 4.153 4.153
build_core_ppl_forces 11 5.9 3.813 3.934 3.813 3.934
qs_env_update_s_mstruct 11 6.9 0.000 0.000 3.802 3.808
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 3.640 3.690
apply_preconditioner_dbcsr 119 12.6 0.000 0.000 3.635 3.636
apply_single 119 13.6 0.001 0.001 3.634 3.636
dbcsr_mm_accdrv_process 9594 16.2 0.890 0.892 3.590 3.595
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.259 3.260
dbcsr_complete_redistribute 329 12.2 1.202 1.203 2.996 3.246
calculate_dm_sparse 119 9.5 0.001 0.001 3.181 3.201
multiply_cannon_sync_h2d 4572 15.5 3.146 3.155 3.146 3.155
mp_alltoall_z22v 1201 15.6 3.029 3.100 3.029 3.100
qs_create_task_list 11 7.9 0.000 0.000 3.000 3.052
generate_qs_task_list 11 8.9 1.106 1.125 2.999 3.052
qs_ot_get_p 119 10.4 0.001 0.001 3.038 3.038
cp_dbcsr_sm_fm_multiply 37 9.5 0.001 0.001 2.738 2.739
mp_waitall_1 64495 16.9 2.660 2.666 2.660 2.666
pw_poisson_solve 119 10.3 0.003 0.003 2.647 2.651
transfer_rs2pw 487 10.6 0.008 0.008 2.329 2.442
calculate_first_density_matrix 1 7.0 0.000 0.000 2.344 2.344
copy_dbcsr_to_fm 153 11.3 0.004 0.004 2.315 2.328
cp_dbcsr_sm_fm_multiply_core 37 10.5 0.000 0.000 2.283 2.283
pw_gpu_fg 606 14.1 2.245 2.256 2.245 2.256
jit_kernel_multiply 10 15.6 2.126 2.129 2.126 2.129
qs_ot_get_derivative_taylor 59 13.0 0.002 0.002 2.092 2.093
transfer_rs2pw_140 130 11.5 1.468 1.474 1.932 2.049
yz_to_x 606 14.1 0.458 0.460 1.994 2.028
dbcsr_special_finalize 6858 15.5 0.039 0.040 2.015 2.019
cp_fm_cholesky_invert 11 10.9 2.013 2.013 2.013 2.013
-------------------------------------------------------------------------------
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=69.804, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.98, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.381, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.284, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.742, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.813, yerr=0.0
Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
- -
- T I M I N G -
- -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.026 0.028 96.114 96.114
qs_mol_dyn_low 1 2.0 0.004 0.004 95.689 95.691
qs_forces 11 3.9 0.002 0.002 95.646 95.647
qs_energies 11 4.9 0.001 0.001 84.604 84.605
scf_env_do_scf 11 5.9 0.001 0.001 64.270 64.270
velocity_verlet 10 3.0 0.001 0.002 62.169 62.184
scf_env_do_scf_inner_loop 96 6.5 0.005 0.007 54.018 54.019
rebuild_ks_matrix 107 8.3 0.001 0.001 25.541 25.543
qs_ks_build_kohn_sham_matrix 107 9.3 0.017 0.017 25.541 25.542
qs_ks_update_qs_env 107 7.6 0.001 0.001 23.021 23.021
dbcsr_multiply_generic 1966 12.4 0.118 0.119 22.557 22.650
qs_rho_update_rho_low 107 7.7 0.001 0.001 18.148 18.164
calculate_rho_elec 107 8.7 0.761 0.765 18.147 18.163
qs_scf_new_mos 96 7.5 0.001 0.001 17.696 17.696
qs_scf_loop_do_ot 96 8.5 0.001 0.001 17.695 17.695
ot_scf_mini 96 9.5 0.002 0.002 16.027 16.029
sum_up_and_integrate 107 10.3 0.002 0.002 14.248 14.336
integrate_v_rspace 107 11.3 0.311 0.312 14.168 14.256
fft_wrap_pw1pw2 1081 11.6 0.020 0.020 14.087 14.127
fft_wrap_pw1pw2_140 439 12.2 0.003 0.003 12.096 12.149
multiply_cannon 1966 13.4 0.265 0.267 11.169 11.181
multiply_cannon_loop 1966 14.4 0.217 0.218 10.318 10.324
init_scf_run 11 5.9 0.000 0.000 10.181 10.182
init_scf_loop 11 6.9 0.000 0.000 10.181 10.181
scf_env_initial_rho_setup 11 6.9 0.000 0.001 10.181 10.181
make_m2s 3932 13.4 0.035 0.036 9.939 9.945
make_images 3932 14.4 1.338 1.369 9.789 9.794
ot_mini 96 10.5 0.001 0.001 9.334 9.336
density_rs2pw 107 9.7 0.007 0.007 9.110 9.286
grid_integrate_task_list 107 12.3 8.469 8.559 8.469 8.559
grid_collocate_task_list 107 9.7 8.247 8.386 8.247 8.386
qs_energies_init_hamiltonians 11 5.9 0.000 0.000 8.171 8.171
build_core_hamiltonian_matrix_ 11 4.9 0.001 0.001 7.454 7.561
wfi_extrapolate 11 7.9 0.001 0.001 7.371 7.371
pw_gpu_r3dc1d_3d_ps 546 13.1 2.061 2.083 7.248 7.251
prepare_preconditioner 11 7.9 0.000 0.000 6.881 6.886
make_preconditioner 11 8.9 0.000 0.000 6.881 6.886
pw_gpu_c1dr3d_3d_ps 535 14.2 1.999 2.019 6.813 6.851
make_full_inverse_cholesky 11 9.9 0.000 0.000 5.805 6.045
multiply_cannon_multrec 3932 15.4 1.883 1.900 5.748 5.788
qs_ot_get_derivative 96 11.5 0.001 0.001 5.644 5.644
hybrid_alltoall_any 4079 16.3 4.259 4.264 5.489 5.527
parallel_gemm_fm_cosma 81 9.0 5.414 5.415 5.414 5.415
potential_pw2rs 107 12.3 0.032 0.032 5.388 5.388
make_images_data 3932 15.4 0.044 0.044 5.341 5.365
qs_env_update_s_mstruct 11 6.9 0.000 0.000 4.031 4.209
build_core_ppl_forces 11 5.9 3.835 3.941 3.835 3.941
build_core_hamiltonian_matrix 11 6.9 0.001 0.001 3.705 3.757
ot_diis_step 96 11.5 0.005 0.005 3.669 3.669
dbcsr_mm_accdrv_process 8450 16.1 1.060 1.262 3.516 3.534
qs_ks_update_qs_env_forces 11 4.9 0.000 0.000 3.478 3.478
dbcsr_complete_redistribute 317 12.2 1.207 1.224 3.162 3.409
qs_create_task_list 11 7.9 0.000 0.000 3.214 3.344
generate_qs_task_list 11 8.9 1.391 1.396 3.213 3.344
apply_preconditioner_dbcsr 107 12.6 0.000 0.000 3.264 3.266
apply_single 107 13.6 0.001 0.001 3.264 3.266
calculate_dm_sparse 107 9.5 0.001 0.001 2.959 2.961
multiply_cannon_sync_h2d 3932 15.4 2.820 2.868 2.820 2.868
mp_alltoall_z22v 1081 15.6 2.765 2.848 2.765 2.848
cp_dbcsr_sm_fm_multiply 37 9.5 0.001 0.001 2.732 2.733
qs_ot_get_p 107 10.4 0.001 0.001 2.639 2.640
copy_dbcsr_to_fm 147 11.2 0.003 0.004 2.503 2.520
mp_waitall_1 55487 16.8 2.388 2.419 2.388 2.419
transfer_rs2pw 439 10.6 0.007 0.007 2.210 2.419
pw_poisson_solve 107 10.3 0.002 0.002 2.401 2.402
calculate_first_density_matrix 1 7.0 0.000 0.000 2.376 2.376
cp_dbcsr_sm_fm_multiply_core 37 10.5 0.000 0.000 2.273 2.274
jit_kernel_multiply 10 15.3 1.934 2.115 1.934 2.115
pw_gpu_fg 546 14.1 2.072 2.085 2.072 2.085
transfer_rs2pw_140 118 11.5 1.341 1.354 1.852 2.066
cp_fm_cholesky_invert 11 10.9 2.054 2.054 2.054 2.054
transfer_dbcsr_to_fm 11 10.9 0.001 0.001 2.014 2.030
build_core_ppl 11 7.9 1.912 1.950 1.912 1.950
-------------------------------------------------------------------------------
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=65.89, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.469, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.247, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.414, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.259, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.835, yerr=0.0
Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
-------------------------------------------------------------------------------
- -
- T I M I N G -
- -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.017 0.018 163.447 163.449
qs_energies 1 2.0 0.000 0.000 163.144 163.144
mp2_main 1 3.0 0.000 0.000 156.492 156.492
mp2_gpw_main 1 4.0 0.000 0.000 154.704 154.704
rpa_ri_compute_en 1 5.0 0.000 0.000 145.145 145.145
rpa_num_int 1 6.0 0.000 0.001 145.136 145.136
compute_mat_P_omega 1 7.0 0.001 0.002 66.795 66.795
dbt_total 2336 9.6 0.020 0.021 66.184 66.185
compute_mat_P_omega_contract 10 8.0 5.040 5.059 66.147 66.153
parallel_gemm_fm_cosma 105 8.4 65.366 65.370 65.366 65.370
dbt_contract 787 11.0 0.046 0.046 44.703 44.704
compute_W_cubic_GW 10 7.0 0.003 0.004 42.527 42.530
dbt_tas_total 1149 12.2 0.128 0.129 34.951 34.951
dbt_tas_multiply 807 12.1 0.002 0.002 34.289 34.290
dbt_tas_dbm 807 14.1 0.005 0.005 27.051 27.052
dbm_multiply 807 16.1 25.872 26.222 25.872 26.222
compute_mat_P_omega_calc_M_occ 250 9.0 5.014 5.039 23.460 23.460
rpa_num_int_RPA_matrix_operati 10 7.0 0.000 0.000 22.074 22.074
dbt_copy 1107 10.7 0.069 0.070 21.774 21.903
contract_P_omega_with_mat_L 10 8.0 0.000 0.000 21.772 21.773
dbt_tas_mm_1N 524 15.1 0.003 0.003 17.453 17.854
compute_mat_P_omega_calc_M_vir 250 9.0 0.001 0.001 14.850 14.850
dbt_reshape 594 11.8 6.336 6.431 14.022 14.092
compute_QP_energies 1 7.0 0.000 0.000 11.498 11.498
compute_self_energy_cubic_gw 1 8.0 0.115 0.116 11.498 11.498
dbt_tas_reserve_blocks_index 3266 14.3 0.638 0.645 10.463 10.471
dbm_reserve_blocks 3634 15.3 10.151 10.152 10.151 10.152
mp2_ri_gpw_compute_in 1 5.0 0.001 0.001 9.549 9.549
compute_mat_P_omega_calc_P_t 250 9.0 0.001 0.001 8.812 8.812
dbt_reserve_blocks_index 2347 13.0 0.317 0.322 8.701 8.737
dbt_crop 1042 12.0 6.461 6.493 8.681 8.730
dbt_reserve_blocks_index_array 2289 12.1 0.011 0.012 8.508 8.522
dbt_tas_mm_2 251 15.0 0.002 0.002 7.544 7.544
scf_env_do_scf 1 3.0 0.000 0.000 6.126 6.126
scf_env_do_scf_inner_loop 17 4.0 0.001 0.001 6.126 6.126
mp_waitall_2 2656 15.9 5.656 5.685 5.656 5.685
get_2c_integrals 1 6.0 0.000 0.000 5.520 5.520
contract_cubic_gw 21 9.0 0.000 0.000 5.260 5.260
dbt_communicate_buffer 594 12.8 0.011 0.012 5.169 5.195
dbcsr_multiply_generic 30 8.1 0.003 0.003 5.025 5.064
multiply_cannon 30 9.1 0.004 0.004 4.835 4.872
multiply_cannon_loop 30 10.1 0.004 0.004 4.784 4.821
compute_mat_P_omega_copy_M_vir 250 9.0 0.002 0.002 4.762 4.769
compute_mat_P_omega_copy_M_occ 250 9.0 0.001 0.001 4.709 4.716
dbt_tas_copy 511 11.5 2.463 2.498 4.348 4.402
multiply_cannon_multrec 60 11.1 0.166 0.176 4.223 4.227
dbcsr_mm_accdrv_process 328 12.3 0.039 0.039 3.873 3.884
jit_kernel_multiply 18 11.7 3.828 3.838 3.828 3.838
qs_scf_new_mos 17 5.0 0.000 0.000 3.344 3.392
-------------------------------------------------------------------------------
Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=49.260999999999996, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=65.366, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=25.872, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=10.151, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.461, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.336, yerr=0.0
Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
-------------------------------------------------------------------------------
- -
- T I M I N G -
- -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.021 0.022 187.972 187.972
qs_forces 1 2.0 0.000 0.000 187.560 187.560
rebuild_ks_matrix 7 6.6 0.000 0.000 183.366 183.366
qs_ks_build_kohn_sham_matrix 7 7.6 0.001 0.002 183.366 183.366
hfx_ks_matrix 7 8.6 0.000 0.000 179.713 179.721
dbt_total 849 11.0 0.008 0.009 134.498 134.498
hfx_ri_update_ks 7 9.6 0.000 0.000 101.156 101.156
hfx_ri_update_ks_Pmat 7 10.6 20.674 20.704 101.151 101.151
qs_energies 1 3.0 0.000 0.000 96.600 96.600
scf_env_do_scf 1 4.0 0.000 0.000 94.564 94.564
qs_ks_update_qs_env 8 6.0 0.000 0.000 92.450 92.450
qs_ks_update_qs_env_forces 1 3.0 0.000 0.000 90.922 90.922
dbt_contract 207 12.4 0.047 0.048 78.659 78.659
hfx_ri_update_forces 1 7.0 1.020 1.034 78.555 78.563
dbt_tas_total 369 13.4 0.071 0.071 64.913 64.913
dbt_tas_multiply 216 13.5 0.001 0.001 62.286 62.286
dbt_copy 423 11.8 0.045 0.046 51.387 51.575
scf_env_do_scf_inner_loop 6 5.0 0.000 0.000 51.204 51.204
dbt_tas_dbm 216 15.5 0.002 0.002 49.633 49.633
dbm_multiply 216 17.5 46.736 46.759 46.736 46.759
hfx_ri_forces_Pmat_3c 1 8.0 3.288 3.299 45.966 45.969
init_scf_loop 2 5.0 0.000 0.000 43.358 43.358
dbt_reshape 175 13.2 17.593 17.663 38.377 38.562
hfx_ri_update_ks_Pmat_KS 63 11.6 0.001 0.001 29.603 29.603
precalc_derivatives 1 8.0 1.820 1.853 26.989 26.989
dbt_tas_mm_2 91 16.5 0.001 0.001 20.594 20.594
mp_waitall_2 1022 16.5 18.153 18.253 18.153 18.253
dbt_tas_reserve_blocks_index 1323 15.4 1.644 1.654 17.703 17.732
dbm_reserve_blocks 1491 16.3 16.751 16.788 16.751 16.788
dbt_crop 372 13.7 12.539 12.554 16.228 16.291
dbt_tas_mm_3T 77 17.1 0.001 0.001 15.234 15.357
hfx_ri_pre_scf_Pmat 1 12.0 0.000 0.000 15.188 15.188
dbt_communicate_buffer 175 14.2 0.004 0.004 15.035 15.087
build_3c_derivatives 3 9.0 2.226 2.323 14.548 14.548
dbt_reserve_blocks_index 889 14.5 0.609 0.616 14.365 14.454
dbt_reserve_blocks_index_array 859 13.5 0.007 0.007 14.105 14.183
hfx_ri_update_ks_Pmat_Px3C 63 11.6 0.000 0.000 13.888 13.888
hfx_ri_update_ks_Pmat_copy_2 63 11.6 0.000 0.000 13.448 13.448
dbt_tas_mm_3N 37 15.4 0.000 0.000 11.477 11.637
dbt_tas_copy 248 12.5 4.136 4.182 7.819 7.888
mp_sync 2901 12.8 6.133 6.391 6.133 6.391
hfx_ri_pre_scf_Pmat_int 1 13.0 0.000 0.000 4.890 4.890
dbt_tas_replicate 168 15.1 2.168 2.204 4.545 4.559
hfx_ri_pre_scf_calc_tensors 1 14.0 0.003 0.003 4.221 4.221
hfx_ri_pre_scf_Pmat_copy_2 9 13.0 1.614 1.616 4.079 4.082
dbt_tas_reserve_blocks_templat 266 13.6 0.102 0.102 3.735 3.845
hfx_ri_pre_scf_Pmat_RIx3C 9 13.0 0.000 0.000 3.728 3.761
-------------------------------------------------------------------------------
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=68.06500000000001, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=46.736, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=20.674, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=18.153, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=17.593, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=16.751, yerr=0.0
Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out:
-------------------------------------------------------------------------------
- -
- T I M I N G -
- -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.009 0.010 103.879 103.879
qs_energies 1 2.0 0.000 0.000 103.706 103.706
mp2_main 1 3.0 0.000 0.001 96.258 96.258
mp2_gpw_main 1 4.0 0.001 0.001 95.881 95.881
mp2_ri_gpw_compute_in 1 5.0 0.542 0.545 53.720 53.722
mp2_ri_gpw_compute_in_loop 1 6.0 0.012 0.012 45.517 45.517
mp2_ri_gpw_compute_en 1 5.0 0.088 0.089 42.101 42.103
mp2_ri_gpw_compute_en_RI_loop 1 6.0 12.609 12.618 39.575 39.576
dbcsr_multiply_generic 2666 8.0 0.147 0.147 22.833 22.987
ao_to_mo_and_store_B_mult_1 1328 7.0 0.012 0.012 21.551 21.704
mp2_eri_3c_integrate_gpw 1328 7.0 0.016 0.017 18.519 18.748
mp2_ri_gpw_compute_en_expansio 1040 7.0 0.701 0.702 16.637 16.641
local_gemm 1040 8.0 15.936 15.939 15.936 15.939
make_m2s 5332 9.0 0.048 0.049 12.987 13.323
make_images 5332 10.0 3.268 3.303 12.817 13.152
integrate_v_rspace 1338 8.0 1.039 1.058 10.394 10.796
multiply_cannon 2666 9.0 0.356 0.375 9.221 9.706
multiply_cannon_loop 2666 10.0 0.178 0.178 8.144 8.536
grid_integrate_task_list 1338 9.0 8.067 8.474 8.067 8.474
hybrid_alltoall_any 6683 11.6 7.710 8.004 7.961 8.255
make_images_data 5332 11.0 0.058 0.059 7.868 8.165
fft_wrap_pw1pw2 26668 10.4 0.132 0.132 8.053 8.126
get_2c_integrals 1 6.0 0.004 0.004 7.658 7.662
collocate_function 1328 8.0 4.772 4.851 7.015 7.180
compute_2c_integrals 1 7.0 0.006 0.008 7.082 7.082
compute_2c_integrals_loop_lm 1 8.0 0.013 0.021 6.962 6.984
mp2_eri_2c_integrate_gpw 1 9.0 1.979 1.996 6.948 6.963
scf_env_do_scf 1 3.0 0.000 0.000 6.587 6.588
scf_env_do_scf_inner_loop 10 4.0 0.001 0.001 6.586 6.588
ao_to_mo_and_store_B_E_Ex_1 1328 7.0 3.374 3.417 5.168 5.234
qs_scf_new_mos 10 5.0 0.000 0.000 5.072 5.084
mp2_ri_gpw_compute_en_ener 1040 7.0 4.733 4.757 4.733 4.757
fft_wrap_pw1pw2_20 10647 11.4 0.022 0.022 4.675 4.701
mp2_ri_gpw_compute_en_comm 221 7.0 0.999 1.001 4.450 4.486
multiply_cannon_multrec 2676 11.0 1.787 1.876 4.377 4.448
pw_gpu_r3dc1d_3d 13282 12.2 4.141 4.187 4.141 4.187
eigensolver 11 5.8 0.001 0.001 2.941 2.944
pw_gpu_c1dr3d_3d 13280 12.7 2.763 2.789 2.763 2.789
potential_pw2rs 2666 10.0 0.094 0.095 2.740 2.766
fft_wrap_pw1pw2_10 15957 11.5 0.019 0.020 2.504 2.550
mp_sendrecv_dm3 442 8.0 2.428 2.456 2.428 2.456
collocate_single_gaussian 1328 10.0 0.090 0.091 2.370 2.410
dbcsr_mm_accdrv_process 5392 12.0 0.238 0.242 2.357 2.379
cp_fm_diag_elpa 11 6.8 0.000 0.000 2.373 2.374
cp_fm_diag_elpa_base 11 7.8 2.295 2.310 2.372 2.372
mp2_eri_2c_integrate_gpw_pot_l 1328 10.0 0.004 0.004 2.232 2.273
copy_dbcsr_to_fm 1351 8.0 0.032 0.032 2.172 2.206
fill_local_i_aL 884 7.5 2.167 2.175 2.167 2.175
replicate_iaK_2intgroup 1 6.0 2.027 2.028 2.167 2.168
-------------------------------------------------------------------------------
Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=54.785000000000004, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.936, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.609, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.067, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=7.71, yerr=0.0
PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="collocate_function", label="collocate_function", y=4.772, yerr=0.0
Running diag_cu144_broy.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out:
-------------------------------------------------------------------------------
- -
- T I M I N G -
- -
-------------------------------------------------------------------------------
SUBROUTINE CALLS ASD SELF TIME TOTAL TIME
MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM
CP2K 1 1.0 0.125 0.170 202.041 202.041
qs_energies 1 2.0 0.000 0.000 200.910 200.910
scf_env_do_scf 1 3.0 0.000 0.000 187.538 187.538
scf_env_do_scf_inner_loop 15 4.0 0.001 0.002 187.538 187.538
qs_ks_update_qs_env 15 5.0 0.000 0.000 93.613 93.647
rebuild_ks_matrix 15 6.0 0.000 0.000 93.417 93.451
qs_ks_build_kohn_sham_matrix 15 7.0 0.002 0.003 93.416 93.451
qs_vxc_create 15 8.0 0.026 0.051 57.207 57.226
qs_scf_new_mos 15 5.0 0.000 0.000 52.307 52.378
fft_wrap_pw1pw2 1086 10.0 0.029 0.030 49.366 49.497
calculate_dispersion_nonloc 15 9.0 10.797 10.872 49.220 49.263
eigensolver 15 6.0 0.002 0.002 43.261 43.421
qs_rho_update_rho_low 16 5.0 0.000 0.000 39.883 39.884
calculate_rho_elec 16 6.0 0.176 0.177 39.883 39.884
sum_up_and_integrate 15 8.0 0.000 0.000 34.715 34.767
integrate_v_rspace 15 9.0 0.047 0.047 34.691 34.742
grid_collocate_task_list 16 7.0 28.870 28.901 28.870 28.901
grid_integrate_task_list 15 10.0 27.991 28.026 27.991 28.026
cp_fm_diag_elpa 15 7.0 0.000 0.000 27.302 27.306
cp_fm_diag_elpa_base 15 8.0 25.671 26.178 27.297 27.298
pw_gpu_c1dr3d_3d_ps 585 12.1 5.481 5.560 25.952 26.014
fft_wrap_pw1pw2_150 765 11.0 0.005 0.005 25.531 25.640
pw_gpu_r3dc1d_3d_ps 501 11.9 4.577 4.820 23.379 23.450
cp_fm_cholesky_restore 45 7.0 14.185 14.838 14.185 14.838
fft_wrap_pw1pw2_200 197 11.3 0.001 0.001 12.094 12.151
density_rs2pw 16 7.0 0.001 0.001 10.823 10.857
qs_energies_init_hamiltonians 1 3.0 0.000 0.000 9.538 9.538
vdW_energy 15 10.0 9.132 9.147 9.132 9.147
pw_gpu_ffc 585 13.1 8.951 8.983 8.951 8.983
pw_gpu_cff 501 12.9 8.278 8.319 8.278 8.319
build_core_hamiltonian_matrix 1 4.0 0.000 0.000 8.211 8.238
xc_vxc_pw_create 15 9.0 0.174 0.177 7.962 7.963
pw_gpu_sf 585 13.1 6.925 6.949 6.925 6.949
pw_gpu_fg 501 12.9 6.712 6.716 6.712 6.716
potential_pw2rs 15 10.0 0.007 0.007 6.653 6.670
mp_alltoall_z22v 1086 14.0 6.430 6.608 6.430 6.608
copy_dbcsr_to_fm 16 5.9 0.001 0.001 6.345 6.388
dbcsr_complete_redistribute 46 8.3 1.683 1.785 5.356 5.492
fft_wrap_pw1pw2_10 62 10.5 0.000 0.000 5.273 5.278
build_core_ppnl 1 5.0 4.699 4.700 4.699 4.700
xc_pw_derive 90 11.0 0.001 0.001 4.652 4.662
xc_rho_set_and_dset_create 15 10.0 0.129 0.129 4.629 4.645
x_to_yz 585 13.1 1.013 1.027 4.562 4.588
cp_fm_uplo_to_full 30 8.0 3.397 4.392 3.397 4.392
-------------------------------------------------------------------------------
Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=94.527, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=28.87, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.991, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=25.671, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=14.185, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.797, yerr=0.0
Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.044    0.045  266.268  266.268
qs_energies                          1  2.0    0.000    0.000  266.147  266.147
ls_scf                               1  3.0    0.000    0.000  265.329  265.329
ls_scf_main                          1  4.0    0.001    0.001  255.535  255.535
density_matrix_trs4                 11  5.0    0.007    0.007  214.741  214.772
dbcsr_multiply_generic             185  6.1    0.305    0.307  173.521  173.616
multiply_cannon                    185  7.1    2.174    2.314  118.003  118.565
multiply_cannon_loop               185  8.1    0.325    0.328  103.941  104.807
multiply_cannon_multrec            370  9.1   78.797   79.605   87.822   88.570
make_m2s                           370  7.1    0.027    0.028   47.087   47.132
make_images                        370  8.1   12.630   12.687   46.052   46.097
ls_scf_dm_to_ks                     11  5.0    0.000    0.000   36.953   36.961
matrix_ls_to_qs                     11  6.0    0.000    0.000   33.685   34.206
dbcsr_complete_redistribute         23  7.5   19.721   20.207   28.078   28.334
matrix_decluster                    11  7.0    0.000    0.000   25.653   25.904
arnoldi_extremal                    12  6.1    0.000    0.000   24.816   24.822
arnoldi_normal_ev                   12  7.1    0.009    0.009   24.816   24.821
build_subspace                      23  8.1    0.060    0.061   24.290   24.291
dbcsr_matrix_vector_mult           652  9.0    0.155    0.163   22.254   23.593
dbcsr_matrix_vector_mult_local     652 10.0   21.202   22.532   21.209   22.539
make_images_data                   370  9.1    0.012    0.012   16.751   16.900
hybrid_alltoall_any                393  9.9   11.576   11.694   16.267   16.390
calculate_norms                    740  9.1   15.197   15.276   15.197   15.276
dbcsr_finalize                     559  7.6    0.221    0.227   13.946   14.251
dbcsr_merge_all                    510  8.6    2.533    2.832   12.735   13.053
dbcsr_copy                         761  7.5    1.615    1.673    9.666    9.879
dbcsr_dot                          144  6.3    7.761    8.007    8.880    9.408
dbcsr_special_finalize             555  9.1    0.011    0.011    9.037    9.055
setup_rec_index_2d                 370  8.1    8.964    9.011    8.964    9.011
dbcsr_add_d                        280  6.0    0.001    0.001    8.618    8.756
dbcsr_add_anytype                  280  7.0    3.800    3.881    8.616    8.754
dbcsr_sort_indices                1283 10.0    8.365    8.368    8.365    8.368
dbcsr_copy_into_existing            11  8.0    8.031    8.301    8.031    8.301
ls_scf_init_scf                      1  4.0    0.000    0.000    8.296    8.298
ls_scf_init_matrix_S                 1  5.0    0.000    0.000    7.855    7.857
matrix_sqrt_Newton_Schulz            1  6.0    0.000    0.000    7.106    7.107
dbcsr_mm_accdrv_process          14501 10.0    0.694    0.774    7.077    7.098
tree_to_linear_d                    23 10.5    6.842    6.847    6.842    6.847
dbcsr_mm_accdrv_process_sort     14501 11.0    6.307    6.324    6.307    6.324
make_images_pack                   370  9.1    5.779    5.971    5.793    5.985
dbcsr_merge_single_wm              370 10.1    0.556    0.571    5.883    5.896
-------------------------------------------------------------------------------

Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=118.72099999999998, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=78.797, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=21.202, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=19.721, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=15.197, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="make_images", label="make_images", y=12.63, yerr=0.0

Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.004    0.004   47.089   47.089
lib_test                             1  2.0    0.000    0.000   47.083   47.083
dbcsr_run_tests                      3  3.0    0.000    0.000   47.082   47.083
test_multiplies_multiproc            3  4.0    0.001    0.001   36.417   36.441
dbcsr_multiply_generic               9  5.0    0.002    0.002   28.399   28.400
multiply_cannon                      9  6.0    0.018    0.019   18.543   19.021
multiply_cannon_loop                 9  7.0    0.003    0.003   17.292   17.758
multiply_cannon_multrec             18  8.0    9.246    9.827   16.077   16.542
dbcsr_make_random_matrix             9  4.0    7.338    7.365   10.533   10.556
dbcsr_finalize                      27  5.7    0.001    0.001    7.206    7.214
dbcsr_merge_all                     18  6.5    3.538    3.552    7.098    7.111
dbcsr_mm_accdrv_process           8199  9.0    1.372    1.523    6.604    6.723
dbcsr_redistribute                   9  5.0    3.461    3.465    5.602    5.609
make_m2s                            18  6.0    0.001    0.001    5.096    5.106
make_images                         18  7.0    0.403    0.411    5.062    5.072
dbcsr_mm_accdrv_process_sort      8199 10.0    4.518    4.541    4.518    4.541
make_images_data                    18  8.0    0.001    0.001    2.822    2.831
hybrid_alltoall_any                 18  9.0    2.436    2.448    2.790    2.799
mp_alltoall_d11v                    27  6.0    1.876    1.881    1.876    1.881
tree_to_linear_d                     9  7.0    1.837    1.840    1.837    1.840
dbcsr_data_copy_aa2                 18  7.5    1.593    1.593    1.593    1.593
dbcsr_data_release                 507  7.7    1.338    1.339    1.338    1.339
dbcsr_data_new                     354  7.4    0.926    1.011    0.926    1.011
jit_kernel_multiply                  6 10.0    0.714    0.962    0.714    0.962
dbcsr_checksum                       6  5.0    0.953    0.960    0.961    0.961
mp_sum_l                            61  4.9    0.494    0.950    0.494    0.950
dbcsr_multiply_generic_mpsum_f       9  6.0    0.000    0.000    0.493    0.949
-------------------------------------------------------------------------------

Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=18.988, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.246, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.338, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.518, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.538, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.461, yerr=0.0

Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.047    0.051  197.459  197.460
qs_mol_dyn_low                       1  2.0    0.004    0.004  195.967  195.999
qs_forces                            6  3.8    0.001    0.001  124.217  124.218
qs_energies                          6  4.8    0.000    0.000  117.399  117.400
scf_env_do_scf                       6  5.8    0.000    0.000  109.928  109.928
scf_env_do_scf_inner_loop          113  6.2    0.005    0.007  102.093  102.093
velocity_verlet                      5  3.0    0.003    0.003   92.530   92.578
rebuild_ks_matrix                  119  8.1    0.000    0.000   83.110   83.112
qs_ks_build_kohn_sham_matrix       119  9.1    0.018    0.018   83.110   83.112
qs_ks_update_qs_env                119  7.3    0.001    0.001   78.423   78.424
fft_wrap_pw1pw2                   2059 12.4    0.044    0.045   65.709   65.721
fft_wrap_pw1pw2_150               1321 13.9    0.009    0.010   62.995   63.036
qs_vxc_create                      119 10.1    0.002    0.002   53.291   53.291
xc_vxc_pw_create                   119 11.1    1.491    1.496   53.289   53.290
qmmm_el_coupling                     6  3.8    0.000    0.000   37.253   37.256
qmmm_elec_with_gaussian              6  4.8    0.019    0.019   37.247   37.250
xc_pw_derive                       714 13.1    0.010    0.010   37.090   37.178
qmmm_elec_with_gaussian_low          6  5.8    0.000    0.000   35.913   35.961
pw_gpu_c1dr3d_3d_ps               1095 14.8   10.667   10.691   35.399   35.488
qmmm_forces                          6  3.8    0.001    0.001   31.502   31.502
qmmm_elec_gaussian_low_G             6  6.8   31.277   31.315   31.277   31.315
qmmm_forces_with_gaussian            6  4.8    0.022    0.022   30.897   30.977
pw_gpu_r3dc1d_3d_ps                964 14.0    9.361    9.369   30.254   30.331
qmmm_force_with_gaussian_low         6  5.8    0.000    0.000   29.625   29.711
xc_rho_set_and_dset_create         119 12.1    2.363    2.376   26.685   26.719
qmmm_forces_gaussian_low_G           6  6.8   24.786   24.861   24.786   24.861
xc_pw_divergence                   119 12.1    0.006    0.006   24.708   24.751
qs_rho_update_rho_low              119  7.3    0.001    0.001   22.079   22.144
calculate_rho_elec                 119  8.3    1.058    1.060   22.078   22.144
density_rs2pw                      119  9.3    0.007    0.007   15.936   16.044
sum_up_and_integrate               119 10.1    0.002    0.002   13.835   13.911
integrate_v_rspace                 119 11.1    0.021    0.021   13.665   13.741
dbcsr_multiply_generic            2598 12.3    0.091    0.093   13.570   13.582
mp_alltoall_z22v                  2059 16.4   12.684   12.761   12.684   12.761
multiply_cannon                   2598 13.3    0.203    0.203   12.011   12.033
multiply_cannon_loop              2598 14.3    0.248    0.251   11.562   11.581
potential_pw2rs                    119 12.1    0.033    0.033    9.692    9.693
multiply_cannon_multrec           5196 15.3    4.069    4.103    9.412    9.442
x_to_yz                           1095 15.8    2.300    2.301    9.268    9.324
qs_ks_ddapc                        119 10.1    0.002    0.002    8.816    8.848
pw_gpu_sf                         1095 15.8    8.669    8.676    8.669    8.676
pw_gpu_fg                          964 15.0    8.251    8.358    8.251    8.358
init_scf_loop                        6  6.8    0.000    0.000    7.832    7.832
qs_scf_new_mos                     113  7.2    0.001    0.001    7.715    7.717
qs_scf_loop_do_ot                  113  8.2    0.001    0.001    7.714    7.716
yz_to_x                            964 15.0    1.858    1.874    7.573    7.577
ot_scf_mini                        113  9.2    0.002    0.002    7.426    7.426
pw_gpu_ffc                        1095 15.8    6.776    6.793    6.776    6.793
init_scf_run                         6  5.8    0.000    0.000    5.356    5.356
scf_env_initial_rho_setup            6  6.8    0.000    0.000    5.355    5.355
dbcsr_mm_accdrv_process          13992 16.0    0.537    0.542    5.276    5.282
grid_collocate_task_list           119  9.3    5.047    5.085    5.047    5.085
pw_gpu_cff                         964 15.0    5.004    5.038    5.004    5.038
xc_functional_eval                 238 13.1    0.003    0.003    5.018    5.029
ot_mini                            113 10.2    0.001    0.001    4.988    4.990
qmmm_forces_gaussian_low_R           6  6.8    0.000    0.000    4.839    4.849
qmmm_forces_with_gaussian_LG         6  7.8    4.839    4.849    4.839    4.849
pw_poisson_solve                   125  9.9    0.003    0.003    4.755    4.769
qs_ks_update_qs_env_forces           6  4.8    0.000    0.000    4.716    4.716
jit_kernel_multiply                 24 14.7    4.695    4.707    4.695    4.707
qmmm_elec_gaussian_low_R             6  6.8    0.000    0.000    4.636    4.646
qmmm_elec_with_gaussian_LG           6  7.8    4.635    4.646    4.635    4.646
pw_derive                         1089 13.4    4.180    4.246    4.180    4.246
qs_ot_get_derivative               113 11.2    0.001    0.001    4.083    4.088
grid_integrate_task_list           119 12.1    3.952    4.027    3.952    4.027
-------------------------------------------------------------------------------

Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=108.684, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=31.277, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=24.786, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=12.684, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.667, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=9.361, yerr=0.0

Summary: Performance test took 22 minutes. (cached)
Status: OK

Uploading artifacts... done
EndDate: 2025-12-24 06:11:37+00:00