StartDate: 2026-01-18 06:06:17+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 2dab80f166a95d4c662edfadd9dc5087979e3a7f
CommitTime: 2026-01-17 17:11:27 +0100
CommitAuthor: Matthias Krack
CommitSubject: Use the same libxsmm version with Spack as with the toolchain

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=2dab80f166a95d4c662edfadd9dc5087979e3a7f SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.

Sending build context to Docker daemon 408.9MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
15a17189b2df: Waiting
02cb0e091e33: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
6cb9b761b877: Waiting
2cd52cbb1ebe: Waiting
6e8af4fd0a07: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
2cd52cbb1ebe: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> 852ff7058318
Step 8/46 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> 3cc2e0ec6ea3
Step 9/46 : RUN mkdir scripts
 ---> Using cache
 ---> 9264fff48632
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> 94eaf24213f0
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> 7e5ef29eeea0
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run
 ---> Using cache
 ---> 4940ae3b8d72
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> a858e4ab62d2
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> 5c91d3ddd6af
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 32c866fb1eff
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> af4360843d07
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> 5e21943864dc
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> e091f4e500d7
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 3ca31197d770
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> b02123867e9c
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> aa847f70d99a
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> ccffc891edaa
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> df4af5fd01c0
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> 086ef1848115
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> faf149cf4be5
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> 03060a256b70
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> 658b78635136
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> 660771bfcd32
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> 29b4ab81d44b
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> ccc4b313956c
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> bf46716ae2d0
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> a64df0b0cf81
Step 33/46 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> 19073221d876
Step 34/46 : COPY ./src ./src
 ---> Using cache
 ---> 16f17d6dab4d
Step 35/46 : COPY ./data ./data
 ---> Using cache
 ---> e0b0f0588894
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
 ---> Using cache
 ---> 26eef2255cba
Step 37/46 : COPY ./cmake ./cmake
 ---> Using cache
 ---> 1666738489f2
Step 38/46 : COPY ./CMakeLists.txt .
 ---> Using cache
 ---> 48f18131c868
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> Using cache
 ---> bd178aac57d3
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Using cache
 ---> dc7fbca22515
Step 41/46 : COPY ./benchmarks ./benchmarks
 ---> Using cache
 ---> 159cd1a78e0b
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
 ---> Using cache
 ---> a188fecf2c02
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> Using cache
 ---> 095d05944551
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Using cache
 ---> e0f45a4df39e
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Using cache
 ---> d293cdad9264
Step 46/46 : ENTRYPOINT []
 ---> Using cache
 ---> d838bd07d48a
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built d838bd07d48a
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.
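Every build step above, including Step 44 that runs test_performance.sh, was served from the build cache, so the report printed in the run below presumably comes from the image's CMD rather than from a fresh benchmark run. That CMD prints report.log only when "find ./report.log -mmin +10" selects it (modified more than 10 minutes ago), and its sed filter appends " (cached)" to every line that starts with "Summary:". The Python sketch below restates that shell pipeline for readability. It is an illustrative equivalent, not code from the CP2K repository; the helper names (is_stale, tag_summary_lines, cached_report) are hypothetical, and it assumes stdin is empty when cat receives no file argument.

```python
import os
import time

# "find ./report.log -mmin +10": only a report modified more than 10 minutes ago
# is selected. find's whole-minute rounding is ignored in this sketch.
STALE_AFTER_S = 10 * 60


def is_stale(mtime: float, now: float, min_age_s: float = STALE_AFTER_S) -> bool:
    """True if a file with this mtime is older than min_age_s seconds."""
    return now - mtime > min_age_s


def tag_summary_lines(text: str) -> str:
    """Equivalent of: sed '/^Summary:/ s/$/ (cached)/'."""
    return "\n".join(
        line + " (cached)" if line.startswith("Summary:") else line
        for line in text.split("\n")
    )


def cached_report(path: str = "report.log") -> str:
    """Return the tagged report, or "" when find would select nothing.

    With no file argument, cat reads stdin instead; this sketch assumes stdin
    is empty in a non-interactive container run (an assumption, not verified).
    """
    try:
        mtime = os.path.getmtime(path)
    except FileNotFoundError:
        return ""
    if not is_stale(mtime, time.time()):
        return ""
    with open(path) as f:
        return tag_summary_lines(f.read())


if __name__ == "__main__":
    print(cached_report(), end="")
```

Splitting the age check and the sed rule into pure helpers keeps both testable without touching the file system.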
#################### Running Image cp2k-perf-cuda-volta ####################

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 77 input files and 0 errors.

========== Running Performance Test ==========
Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"

Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD     SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.026    0.027   99.177   99.178
qs_mol_dyn_low                       1  2.0    0.004    0.004   98.763   98.767
qs_forces                           11  3.9    0.002    0.002   98.715   98.715
qs_energies                         11  4.9    0.001    0.001   87.758   87.761
scf_env_do_scf                      11  5.9    0.001    0.001   67.874   67.874
velocity_verlet                     10  3.0    0.001    0.002   63.149   63.166
scf_env_do_scf_inner_loop          108  6.5    0.005    0.008   57.929   57.930
rebuild_ks_matrix                  119  8.3    0.001    0.001   25.619   25.619
qs_ks_build_kohn_sham_matrix       119  9.3    0.019    0.019   25.618   25.618
dbcsr_multiply_generic            2286 12.5    0.138    0.139   24.110   24.165
qs_ks_update_qs_env                119  7.6    0.001    0.001   23.446   23.448
qs_rho_update_rho_low              119  7.7    0.001    0.001   19.549   19.563
calculate_rho_elec                 119  8.7    0.838    0.842   19.548   19.563
qs_scf_new_mos                     108  7.5    0.001    0.001   19.506   19.515
qs_scf_loop_do_ot                  108  8.5    0.001    0.001   19.506   19.514
ot_scf_mini                        108  9.5    0.003    0.003   17.665   17.667
fft_wrap_pw1pw2                   1201 11.6    0.023    0.023   15.154   15.202
sum_up_and_integrate               119 10.3    0.002    0.002   13.575   13.624
integrate_v_rspace                 119 11.3    0.340    0.344   13.489   13.538
fft_wrap_pw1pw2_140                487 12.2    0.003    0.003   13.033   13.085
multiply_cannon                   2286 13.5    0.319    0.320   12.211   12.226
multiply_cannon_loop              2286 14.5    0.250    0.254   11.180   11.185
ot_mini                            108 10.5    0.001    0.001   10.355   10.355
make_m2s                          4572 13.5    0.040    0.040   10.332   10.335
make_images                       4572 14.5    1.153    1.164   10.162   10.166
init_scf_run                        11  5.9    0.000    0.000   10.029   10.029
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.029   10.029
init_scf_loop                       11  6.9    0.000    0.000    9.870    9.870
density_rs2pw                      119  9.7    0.007    0.007    9.758    9.856
grid_collocate_task_list           119  9.7    8.916    8.972    8.916    8.972
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    7.943    7.944
pw_gpu_r3dc1d_3d_ps                606 13.1    2.182    2.198    7.764    7.767
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.607    7.734
grid_integrate_task_list           119 12.3    7.363    7.415    7.363    7.415
pw_gpu_c1dr3d_3d_ps                595 14.2    2.131    2.143    7.361    7.407
wfi_extrapolate                     11  7.9    0.001    0.001    7.219    7.219
prepare_preconditioner              11  7.9    0.000    0.000    6.760    6.762
make_preconditioner                 11  8.9    0.000    0.000    6.760    6.762
qs_ot_get_derivative               108 11.5    0.001    0.001    6.202    6.205
multiply_cannon_multrec           4572 15.5    2.117    2.120    6.102    6.112
hybrid_alltoall_any               4725 16.4    4.701    4.712    6.013    6.021
make_images_data                  4572 15.5    0.050    0.050    5.905    5.912
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.658    5.887
potential_pw2rs                    119 12.3    0.034    0.034    5.785    5.785
parallel_gemm_fm_cosma              81  9.0    5.255    5.256    5.255    5.256
ot_diis_step                       108 11.5    0.005    0.005    4.129    4.129
build_core_ppl_forces               11  5.9    3.863    3.976    3.863    3.976
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.812    3.850
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.777    3.810
dbcsr_mm_accdrv_process           9594 16.2    0.736    0.737    3.612    3.613
apply_preconditioner_dbcsr         119 12.6    0.000    0.000    3.605    3.607
apply_single                       119 13.6    0.001    0.001    3.605    3.606
dbcsr_complete_redistribute        329 12.2    1.314    1.326    3.126    3.365
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.220    3.220
calculate_dm_sparse                119  9.5    0.001    0.001    3.136    3.147
multiply_cannon_sync_h2d          4572 15.5    3.131    3.135    3.131    3.135
qs_create_task_list                 11  7.9    0.000    0.000    3.023    3.094
generate_qs_task_list               11  8.9    1.124    1.140    3.023    3.094
mp_alltoall_z22v                  1201 15.6    3.020    3.073    3.020    3.073
qs_ot_get_p                        119 10.4    0.001    0.001    2.989    2.989
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.713    2.714
mp_waitall_1                     64495 16.9    2.641    2.645    2.641    2.645
pw_poisson_solve                   119 10.3    0.003    0.003    2.569    2.572
copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.428    2.437
calculate_first_density_matrix       1  7.0    0.000    0.000    2.376    2.376
transfer_rs2pw                     487 10.6    0.007    0.008    2.258    2.364
jit_kernel_multiply                 11 15.7    2.310    2.313    2.310    2.313
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.232    2.233
pw_gpu_fg                          606 14.1    2.200    2.209    2.200    2.209
qs_ot_get_derivative_taylor         59 13.0    0.002    0.003    2.054    2.056
cp_fm_cholesky_invert               11 10.9    2.040    2.040    2.040    2.040
yz_to_x                            606 14.1    0.442    0.444    1.975    2.007
build_core_ppl                      11  7.9    1.965    1.996    1.965    1.996
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=99.177, yerr=0.0
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=69.07900000000001, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.916, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.363, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.255, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.701, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.863, yerr=0.0

Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
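The "rest" PlotPoint of each benchmark is consistent with the CP2K total time (TOTAL TIME, AVERAGE) minus the five largest average self times that are plotted individually. For H2O-64: 99.177 - (8.916 + 7.363 + 5.255 + 4.701 + 3.863) = 69.079, and the trailing digits in 69.07900000000001 are ordinary floating-point rounding. The same rule reproduces the other benchmarks in this log (for example 96.527 - 30.473 = 66.054 for H2O-64_nonortho). The Python sketch below applies that rule to an excerpt of the H2O-64 table. The rule is inferred from the numbers in this log, not read from plot_performance.py, and the names TimingRow, parse_timing_rows and plot_points are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimingRow:
    """One row of a CP2K timing report (hypothetical helper type)."""
    name: str
    calls: int
    asd: float
    self_avg: float
    self_max: float
    total_avg: float
    total_max: float


def parse_timing_rows(lines):
    """Keep only lines with 7 whitespace-separated fields that parse as numbers."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 7:
            continue  # separators, the T I M I N G banner, the MAXIMUM/AVERAGE line
        try:
            rows.append(TimingRow(fields[0], int(fields[1]), *map(float, fields[2:])))
        except ValueError:
            continue  # the "SUBROUTINE CALLS ASD SELF TIME TOTAL TIME" header
    return rows


def plot_points(rows, top_n=5):
    """Return (total, rest, top), where top lists the top_n largest average self times."""
    total = next(r.total_avg for r in rows if r.name == "CP2K")
    top = sorted(rows, key=lambda r: r.self_avg, reverse=True)[:top_n]
    rest = total - sum(r.self_avg for r in top)
    return total, rest, [(r.name, r.self_avg) for r in top]


# Excerpt of the H2O-64 timing table from this log.
SAMPLE = """\
SUBROUTINE                       CALLS  ASD     SELF TIME        TOTAL TIME
CP2K                                 1  1.0    0.026    0.027   99.177   99.178
grid_collocate_task_list           119  9.7    8.916    8.972    8.916    8.972
grid_integrate_task_list           119 12.3    7.363    7.415    7.363    7.415
parallel_gemm_fm_cosma              81  9.0    5.255    5.256    5.255    5.256
hybrid_alltoall_any               4725 16.4    4.701    4.712    6.013    6.021
build_core_ppl_forces               11  5.9    3.863    3.976    3.863    3.976
multiply_cannon_sync_h2d          4572 15.5    3.131    3.135    3.131    3.135
"""

if __name__ == "__main__":
    total, rest, top = plot_points(parse_timing_rows(SAMPLE.splitlines()))
    print(f"total={total} rest={rest:.3f}")
    for name, y in top:
        print(f"{name}: {y}")
```

Sorting by the AVERAGE self-time column matches the y values in the PlotPoint lines (for example y=8.916 for grid_collocate_task_list rather than its maximum of 8.972).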
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD     SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.025    0.026   96.527   96.527
qs_mol_dyn_low                       1  2.0    0.004    0.004   96.104   96.107
qs_forces                           11  3.9    0.002    0.002   96.060   96.061
qs_energies                         11  4.9    0.001    0.001   84.892   84.892
scf_env_do_scf                      11  5.9    0.001    0.001   64.595   64.595
velocity_verlet                     10  3.0    0.001    0.001   63.000   63.015
scf_env_do_scf_inner_loop           96  6.5    0.005    0.007   54.109   54.109
rebuild_ks_matrix                  107  8.3    0.001    0.001   25.225   25.226
qs_ks_build_kohn_sham_matrix       107  9.3    0.017    0.017   25.224   25.226
dbcsr_multiply_generic            1966 12.4    0.123    0.123   23.066   23.162
qs_ks_update_qs_env                107  7.6    0.001    0.001   22.700   22.701
qs_scf_new_mos                      96  7.5    0.001    0.001   18.119   18.124
qs_scf_loop_do_ot                   96  8.5    0.001    0.001   18.118   18.123
qs_rho_update_rho_low              107  7.7    0.001    0.001   18.101   18.122
calculate_rho_elec                 107  8.7    0.754    0.764   18.100   18.121
ot_scf_mini                         96  9.5    0.002    0.002   16.470   16.470
sum_up_and_integrate               107 10.3    0.002    0.002   14.088   14.190
integrate_v_rspace                 107 11.3    0.309    0.315   14.010   14.112
fft_wrap_pw1pw2                   1081 11.6    0.020    0.020   13.930   13.983
fft_wrap_pw1pw2_140                439 12.2    0.003    0.003   11.973   12.056
multiply_cannon                   1966 13.4    0.282    0.287   11.481   11.792
make_m2s                          3932 13.4    0.036    0.037   10.123   10.434
init_scf_loop                       11  6.9    0.000    0.000   10.411   10.412
multiply_cannon_loop              1966 14.4    0.219    0.222   10.302   10.318
make_images                       3932 14.4    1.112    1.226    9.971   10.281
init_scf_run                        11  5.9    0.000    0.000   10.129   10.129
scf_env_initial_rho_setup           11  6.9    0.000    0.001   10.128   10.128
ot_mini                             96 10.5    0.001    0.001    9.653    9.654
density_rs2pw                      107  9.7    0.007    0.007    8.993    9.128
grid_integrate_task_list           107 12.3    8.455    8.562    8.455    8.562
grid_collocate_task_list           107  9.7    8.324    8.425    8.324    8.425
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.211    8.212
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.574    7.692
wfi_extrapolate                     11  7.9    0.001    0.001    7.280    7.280
pw_gpu_r3dc1d_3d_ps                546 13.1    2.052    2.143    7.197    7.211
prepare_preconditioner              11  7.9    0.000    0.000    7.058    7.064
make_preconditioner                 11  8.9    0.000    0.000    7.058    7.064
pw_gpu_c1dr3d_3d_ps                535 14.2    1.928    1.951    6.708    6.746
make_full_inverse_cholesky          11  9.9    0.000    0.000    5.929    6.158
hybrid_alltoall_any               4079 16.3    4.480    4.745    6.059    6.076
make_images_data                  3932 15.4    0.044    0.045    5.879    5.887
multiply_cannon_multrec           3932 15.4    1.838    1.873    5.704    5.739
qs_ot_get_derivative                96 11.5    0.001    0.001    5.634    5.634
parallel_gemm_fm_cosma              81  9.0    5.343    5.344    5.343    5.344
potential_pw2rs                    107 12.3    0.031    0.031    5.245    5.246
qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.017    4.205
ot_diis_step                        96 11.5    0.005    0.005    3.998    3.998
build_core_ppl_forces               11  5.9    3.871    3.968    3.871    3.968
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.747    3.788
apply_preconditioner_dbcsr         107 12.6    0.000    0.000    3.690    3.693
apply_single                       107 13.6    0.001    0.001    3.690    3.692
dbcsr_mm_accdrv_process           8450 16.1    0.782    1.112    3.530    3.536
dbcsr_complete_redistribute        317 12.2    1.326    1.342    3.301    3.531
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.472    3.473
qs_create_task_list                 11  7.9    0.000    0.000    3.218    3.367
generate_qs_task_list               11  8.9    1.388    1.390    3.218    3.366
mp_waitall_1                     55487 16.8    2.724    3.015    2.724    3.015
mp_alltoall_z22v                  1081 15.6    2.838    2.992    2.838    2.992
calculate_dm_sparse                107  9.5    0.001    0.001    2.953    2.961
multiply_cannon_sync_h2d          3932 15.4    2.816    2.862    2.816    2.862
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.719    2.719
copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.675    2.718
qs_ot_get_p                        107 10.4    0.001    0.001    2.610    2.612
jit_kernel_multiply                 12 15.6    2.238    2.578    2.238    2.578
calculate_first_density_matrix       1  7.0    0.000    0.000    2.413    2.413
pw_poisson_solve                   107 10.3    0.002    0.002    2.295    2.297
transfer_rs2pw                     439 10.6    0.007    0.007    2.104    2.262
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.238    2.239
transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.159    2.199
cp_fm_cholesky_invert               11 10.9    2.038    2.038    2.038    2.038
pw_gpu_fg                          546 14.1    1.971    1.997    1.971    1.997
yz_to_x                            546 14.1    0.400    0.403    1.876    1.992
build_core_ppl                      11  7.9    1.945    1.979    1.945    1.979
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=96.527, yerr=0.0
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=66.054, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.455, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.324, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=5.343, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.48, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.871, yerr=0.0

Running GW_PBE_4benzene.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/GW_PBE_4benzene_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD     SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.018    0.019  160.738  160.738
qs_energies                          1  2.0    0.000    0.000  160.435  160.437
mp2_main                             1  3.0    0.000    0.000  153.891  153.893
mp2_gpw_main                         1  4.0    0.000    0.000  152.113  152.115
rpa_ri_compute_en                    1  5.0    0.000    0.000  142.748  142.749
rpa_num_int                          1  6.0    0.001    0.001  142.739  142.741
compute_mat_P_omega                  1  7.0    0.001    0.002   66.026   66.028
dbt_total                         2336  9.6    0.020    0.021   65.543   65.544
compute_mat_P_omega_contract        10  8.0    5.133    5.146   65.378   65.390
parallel_gemm_fm_cosma             105  8.4   63.880   63.952   63.880   63.952
dbt_contract                       787 11.0    0.047    0.048   44.426   44.427
compute_W_cubic_GW                  10  7.0    0.003    0.004   41.570   41.571
dbt_tas_total                     1149 12.2    0.131    0.132   34.718   34.718
dbt_tas_multiply                   807 12.1    0.002    0.002   34.057   34.057
dbt_tas_dbm                        807 14.1    0.005    0.005   26.813   26.813
dbm_multiply                       807 16.1   25.573   25.836   25.573   25.836
compute_mat_P_omega_calc_M_occ     250  9.0    5.131    5.144   23.180   23.180
dbt_copy                          1107 10.7    0.069    0.070   21.329   21.415
rpa_num_int_RPA_matrix_operati      10  7.0    0.000    0.000   21.320   21.320
contract_P_omega_with_mat_L         10  8.0    0.000    0.000   21.019   21.019
dbt_tas_mm_1N                      524 15.1    0.003    0.003   17.197   17.519
compute_mat_P_omega_calc_M_vir     250  9.0    0.001    0.001   14.729   14.729
dbt_reshape                        594 11.8    6.206    6.348   13.835   13.922
compute_QP_energies                  1  7.0    0.000    0.000   11.554   11.554
compute_self_energy_cubic_gw         1  8.0    0.127    0.129   11.553   11.553
dbt_tas_reserve_blocks_index      3266 14.3    0.617    0.618   10.041   10.192
dbm_reserve_blocks                3634 15.3    9.750    9.903    9.750    9.903
mp2_ri_gpw_compute_in                1  5.0    0.001    0.001    9.355    9.355
compute_mat_P_omega_calc_P_t       250  9.0    0.001    0.001    8.732    8.732
dbt_crop                          1042 12.0    6.394    6.467    8.589    8.686
dbt_reserve_blocks_index          2347 13.0    0.296    0.297    8.356    8.573
dbt_reserve_blocks_index_array    2289 12.1    0.011    0.012    8.150    8.395
dbt_tas_mm_2                       251 15.0    0.003    0.003    7.482    7.482
scf_env_do_scf                       1  3.0    0.000    0.000    6.016    6.016
scf_env_do_scf_inner_loop           17  4.0    0.001    0.001    6.016    6.016
mp_waitall_2                      2656 15.9    5.633    5.642    5.633    5.642
contract_cubic_gw                   21  9.0    0.000    0.000    5.313    5.313
get_2c_integrals                     1  6.0    0.000    0.000    5.301    5.301
dbt_communicate_buffer             594 12.8    0.011    0.012    5.141    5.153
dbcsr_multiply_generic              30  8.1    0.003    0.003    4.924    4.969
multiply_cannon                     30  9.1    0.010    0.015    4.746    4.789
multiply_cannon_loop                30 10.1    0.004    0.004    4.694    4.737
compute_mat_P_omega_copy_M_vir     250  9.0    0.002    0.002    4.648    4.675
compute_mat_P_omega_copy_M_occ     250  9.0    0.002    0.002    4.546    4.547
dbt_tas_copy                       511 11.5    2.353    2.366    4.137    4.192
multiply_cannon_multrec             60 11.1    0.141    0.143    4.126    4.143
dbcsr_mm_accdrv_process            328 12.3    0.041    0.042    3.834    3.852
jit_kernel_multiply                 18 11.7    3.788    3.804    3.788    3.804
qs_scf_new_mos                      17  5.0    0.000    0.000    3.312    3.355
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="GW_PBE_4benzene", label="GW_PBE_4benzene", y=160.738, yerr=0.0
Plot: name="GW_PBE_4benzene_timings_6cpu_1gpu", title="Timings of GW_PBE_4benzene with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="rest", label="rest", y=48.935, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="parallel_gemm_fm_cosma", label="parallel_gemm_fm_cosma", y=63.88, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=25.573, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=9.75, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_crop", label="dbt_crop", y=6.394, yerr=0.0
PlotPoint: plot="GW_PBE_4benzene_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=6.206, yerr=0.0

Running RI-HFX_H2O-32.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-HFX_H2O-32_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD     SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.019    0.020  189.467  189.468
qs_forces                            1  2.0    0.000    0.000  189.049  189.049
rebuild_ks_matrix                    7  6.6    0.000    0.000  184.834  184.834
qs_ks_build_kohn_sham_matrix         7  7.6    0.001    0.002  184.833  184.834
hfx_ks_matrix                        7  8.6    0.000    0.000  181.179  181.179
dbt_total                          849 11.0    0.009    0.009  135.497  135.497
hfx_ri_update_ks                     7  9.6    0.000    0.000  102.787  102.787
hfx_ri_update_ks_Pmat                7 10.6   21.439   21.470  102.782  102.782
qs_energies                          1  3.0    0.000    0.000   98.367   98.367
scf_env_do_scf                       1  4.0    0.000    0.000   96.335   96.335
qs_ks_update_qs_env                  8  6.0    0.000    0.000   94.197   94.197
qs_ks_update_qs_env_forces           1  3.0    0.000    0.000   90.644   90.644
dbt_contract                       207 12.4    0.048    0.049   79.277   79.278
hfx_ri_update_forces                 1  7.0    1.070    1.091   78.390   78.390
dbt_tas_total                      369 13.4    0.072    0.074   65.983   65.983
dbt_tas_multiply                   216 13.5    0.001    0.001   63.269   63.269
scf_env_do_scf_inner_loop            6  5.0    0.000    0.000   52.160   52.160
dbt_copy                           423 11.8    0.045    0.046   51.733   51.809
dbt_tas_dbm                        216 15.5    0.002    0.002   50.308   50.308
dbm_multiply                       216 17.5   47.506   47.619   47.506   47.619
hfx_ri_forces_Pmat_3c                1  8.0    3.438    3.449   46.277   46.280
init_scf_loop                        2  5.0    0.000    0.000   44.173   44.173
dbt_reshape                        175 13.2   17.772   17.807   38.817   39.038
hfx_ri_update_ks_Pmat_KS            63 11.6    0.001    0.001   29.604   29.604
precalc_derivatives                  1  8.0    1.832    1.872   26.374   26.376
dbt_tas_mm_2                        91 16.5    0.001    0.001   20.782   20.782
mp_waitall_2                      1022 16.5   18.504   18.559   18.504   18.559
dbt_tas_reserve_blocks_index      1323 15.4    1.582    1.588   17.724   17.842
dbm_reserve_blocks                1491 16.3   16.850   16.965   16.850   16.965
dbt_crop                           372 13.7   12.215   12.359   15.946   16.170
dbt_tas_mm_3T                       77 17.1    0.000    0.000   15.730   15.870
hfx_ri_pre_scf_Pmat                  1 12.0    0.000    0.000   15.409   15.409
dbt_communicate_buffer             175 14.2    0.004    0.004   15.352   15.406
dbt_reserve_blocks_index           889 14.5    0.566    0.571   14.319   14.362
dbt_reserve_blocks_index_array     859 13.5    0.007    0.007   14.058   14.103
hfx_ri_update_ks_Pmat_Px3C          63 11.6    0.000    0.000   14.046   14.046
build_3c_derivatives                 3  9.0    2.024    2.053   13.892   13.897
hfx_ri_update_ks_Pmat_copy_2        63 11.6    0.000    0.000   13.815   13.815
dbt_tas_mm_3N                       37 15.4    0.000    0.000   11.514   11.665
dbt_tas_copy                       248 12.5    4.035    4.047    7.731    7.818
mp_sync                           2901 12.8    6.039    6.168    6.039    6.168
hfx_ri_pre_scf_Pmat_int              1 13.0    0.000    0.000    4.993    4.993
dbt_tas_replicate                  168 15.1    2.197    2.209    4.599    4.636
hfx_ri_pre_scf_calc_tensors          1 14.0    0.003    0.003    4.283    4.285
hfx_ri_pre_scf_Pmat_copy_2           9 13.0    1.644    1.667    4.212    4.234
dbt_tas_reserve_blocks_templat     266 13.6    0.100    0.103    3.753    3.827
hfx_ri_pre_scf_Pmat_RIx3C            9 13.0    0.000    0.000    3.753    3.793
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-HFX_H2O-32", label="RI-HFX_H2O-32", y=189.467, yerr=0.0
Plot: name="RI-HFX_H2O-32_timings_6cpu_1gpu", title="Timings of RI-HFX_H2O-32 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="rest", label="rest", y=67.39600000000002, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_multiply", label="dbm_multiply", y=47.506, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="hfx_ri_update_ks_Pmat", label="hfx_ri_update_ks_Pmat", y=21.439, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="mp_waitall_2", label="mp_waitall_2", y=18.504, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbt_reshape", label="dbt_reshape", y=17.772, yerr=0.0
PlotPoint: plot="RI-HFX_H2O-32_timings_6cpu_1gpu", name="dbm_reserve_blocks", label="dbm_reserve_blocks", y=16.85, yerr=0.0

Running RI-MP2_ammonia.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/RI-MP2_ammonia_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.010 0.011 104.901 104.902 qs_energies 1 2.0 0.000 0.000 104.724 104.724 mp2_main 1 3.0 0.000 0.000 97.273 97.273 mp2_gpw_main 1 4.0 0.001 0.001 96.908 96.908 mp2_ri_gpw_compute_in 1 5.0 0.549 0.549 54.213 54.217 mp2_ri_gpw_compute_in_loop 1 6.0 0.012 0.013 46.051 46.056 mp2_ri_gpw_compute_en 1 5.0 0.089 0.089 42.635 42.639 mp2_ri_gpw_compute_en_RI_loop 1 6.0 12.919 12.928 40.003 40.004 dbcsr_multiply_generic 2666 8.0 0.159 0.160 23.077 23.387 ao_to_mo_and_store_B_mult_1 1328 7.0 0.014 0.014 21.788 22.098 mp2_eri_3c_integrate_gpw 1328 7.0 0.017 0.018 18.565 18.871 mp2_ri_gpw_compute_en_expansio 1040 7.0 0.742 0.745 16.291 16.320 local_gemm 1040 8.0 15.550 15.582 15.550 15.582 make_m2s 5332 9.0 0.052 0.052 12.472 12.490 make_images 5332 10.0 2.232 2.233 12.293 12.311 integrate_v_rspace 1338 8.0 1.017 1.025 10.441 10.569 multiply_cannon 2666 9.0 0.386 0.396 9.957 10.248 multiply_cannon_loop 2666 10.0 0.187 0.188 8.732 8.999 hybrid_alltoall_any 6683 11.6 8.297 8.302 8.548 8.550 make_images_data 5332 11.0 0.061 0.061 8.465 8.472 grid_integrate_task_list 1338 9.0 8.176 8.302 8.176 8.302 fft_wrap_pw1pw2 26668 10.4 0.136 0.138 7.690 7.830 get_2c_integrals 1 6.0 0.004 0.004 7.612 7.612 collocate_function 1328 8.0 4.884 4.910 6.942 7.121 compute_2c_integrals 1 7.0 0.007 0.007 7.054 7.054 compute_2c_integrals_loop_lm 1 8.0 0.021 0.021 6.903 6.963 mp2_eri_2c_integrate_gpw 1 9.0 2.044 2.082 6.881 6.941 scf_env_do_scf 1 3.0 0.000 0.000 6.564 6.566 scf_env_do_scf_inner_loop 10 4.0 0.001 0.001 6.564 6.565 ao_to_mo_and_store_B_E_Ex_1 1328 7.0 3.599 3.602 5.417 5.423 qs_scf_new_mos 10 5.0 0.000 0.000 5.050 5.058 mp2_ri_gpw_compute_en_ener 1040 7.0 5.015 
5.048 5.015 5.048 multiply_cannon_multrec 2676 11.0 2.257 2.321 4.869 4.925 mp2_ri_gpw_compute_en_comm 221 7.0 1.055 1.055 4.645 4.698 fft_wrap_pw1pw2_20 10647 11.4 0.019 0.020 4.409 4.528 pw_gpu_r3dc1d_3d 13282 12.2 3.847 3.978 3.847 3.978 eigensolver 11 5.8 0.001 0.001 2.900 2.900 pw_gpu_c1dr3d_3d 13280 12.7 2.672 2.684 2.672 2.684 potential_pw2rs 2666 10.0 0.094 0.096 2.647 2.664 mp_sendrecv_dm3 442 8.0 2.554 2.616 2.554 2.616 fft_wrap_pw1pw2_10 15957 11.5 0.018 0.018 2.386 2.410 dbcsr_mm_accdrv_process 5392 12.0 0.235 0.235 2.375 2.386 cp_fm_diag_elpa 11 6.8 0.000 0.000 2.335 2.335 cp_fm_diag_elpa_base 11 7.8 2.257 2.271 2.334 2.335 collocate_single_gaussian 1328 10.0 0.090 0.090 2.305 2.309 replicate_iaK_2intgroup 1 6.0 2.123 2.124 2.262 2.266 copy_dbcsr_to_fm 1351 8.0 0.032 0.032 2.205 2.205 mp2_eri_2c_integrate_gpw_pot_l 1328 10.0 0.004 0.004 2.180 2.195 fill_local_i_aL 884 7.5 2.169 2.178 2.169 2.178 ------------------------------------------------------------------------------- PlotPoint: plot="total_timings_6cpu_1gpu", name="RI-MP2_ammonia", label="RI-MP2_ammonia", y=104.901, yerr=0.0 Plot: name="RI-MP2_ammonia_timings_6cpu_1gpu", title="Timings of RI-MP2_ammonia with 6 CPU Cores and 1 GPU", ylabel="time [s]" PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="rest", label="rest", y=54.943999999999996, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="local_gemm", label="local_gemm", y=15.55, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_RI_loop", label="mp2_ri_gpw_compute_en_RI_loop", y=12.919, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=8.297, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.176, yerr=0.0 PlotPoint: plot="RI-MP2_ammonia_timings_6cpu_1gpu", name="mp2_ri_gpw_compute_en_ener", label="mp2_ri_gpw_compute_en_ener", 
y=5.015, yerr=0.0 Running diag_cu144_broy.inp with 3 threads and 2 ranks... done. From /workspace/artifacts/diag_cu144_broy_6cpu_1gpu.out: ------------------------------------------------------------------------------- - - - T I M I N G - - - ------------------------------------------------------------------------------- SUBROUTINE CALLS ASD SELF TIME TOTAL TIME MAXIMUM AVERAGE MAXIMUM AVERAGE MAXIMUM CP2K 1 1.0 0.081 0.083 204.071 204.074 qs_energies 1 2.0 0.000 0.000 202.989 202.992 scf_env_do_scf 1 3.0 0.000 0.000 189.166 189.168 scf_env_do_scf_inner_loop 15 4.0 0.001 0.002 189.166 189.168 qs_ks_update_qs_env 15 5.0 0.000 0.000 94.375 94.408 rebuild_ks_matrix 15 6.0 0.000 0.000 94.181 94.214 qs_ks_build_kohn_sham_matrix 15 7.0 0.003 0.003 94.180 94.214 qs_vxc_create 15 8.0 0.094 0.126 57.648 57.694 qs_scf_new_mos 15 5.0 0.000 0.000 52.978 53.057 fft_wrap_pw1pw2 1086 10.0 0.027 0.028 50.487 50.651 calculate_dispersion_nonloc 15 9.0 10.804 10.894 49.592 49.613 eigensolver 15 6.0 0.002 0.002 43.930 44.045 qs_rho_update_rho_low 16 5.0 0.000 0.000 40.142 40.143 calculate_rho_elec 16 6.0 0.179 0.180 40.142 40.143 sum_up_and_integrate 15 8.0 0.000 0.000 35.085 35.172 integrate_v_rspace 15 9.0 0.046 0.046 35.062 35.149 grid_collocate_task_list 16 7.0 28.706 28.756 28.706 28.756 cp_fm_diag_elpa 15 7.0 0.000 0.000 27.994 27.997 cp_fm_diag_elpa_base 15 8.0 26.357 26.859 27.989 27.990 grid_integrate_task_list 15 10.0 27.846 27.880 27.846 27.880 pw_gpu_c1dr3d_3d_ps 585 12.1 5.430 5.516 26.096 26.118 fft_wrap_pw1pw2_150 765 11.0 0.005 0.005 25.807 25.966 pw_gpu_r3dc1d_3d_ps 501 11.9 4.886 5.454 24.358 24.498 cp_fm_cholesky_restore 45 7.0 14.206 14.855 14.206 14.855 fft_wrap_pw1pw2_200 197 11.3 0.001 0.001 12.575 12.626 density_rs2pw 16 7.0 0.001 0.001 11.246 11.302 qs_energies_init_hamiltonians 1 3.0 0.000 0.000 9.965 9.965 vdW_energy 15 10.0 9.205 9.247 9.205 9.247 pw_gpu_ffc 585 13.1 8.947 8.961 8.947 8.961 build_core_hamiltonian_matrix 1 4.0 0.000 0.000 8.634 8.644 
pw_gpu_cff                          501 12.9    8.460    8.475    8.460    8.475
mp_alltoall_z22v                   1086 14.0    7.147    8.049    7.147    8.049
xc_vxc_pw_create                     15  9.0    0.174    0.176    7.962    7.968
potential_pw2rs                      15 10.0    0.006    0.006    7.170    7.223
pw_gpu_sf                           585 13.1    7.024    7.048    7.024    7.048
pw_gpu_fg                           501 12.9    6.605    6.643    6.605    6.643
copy_dbcsr_to_fm                     16  5.9    0.001    0.001    6.090    6.094
dbcsr_complete_redistribute          46  8.3    1.866    1.886    5.694    5.790
fft_wrap_pw1pw2_10                   62 10.5    0.000    0.000    5.472    5.476
yz_to_x                             501 12.9    0.852    0.857    4.350    5.110
build_core_ppnl                       1  5.0    4.835    4.868    4.835    4.868
x_to_yz                             585 13.1    1.013    1.030    4.661    4.781
xc_rho_set_and_dset_create           15 10.0    0.126    0.128    4.698    4.727
xc_pw_derive                         90 11.0    0.001    0.001    4.589    4.610
cp_fm_uplo_to_full                   30  8.0    3.359    4.390    3.359    4.390
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="diag_cu144_broy", label="diag_cu144_broy", y=204.071, yerr=0.0
Plot: name="diag_cu144_broy_timings_6cpu_1gpu", title="Timings of diag_cu144_broy with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="rest", label="rest", y=96.152, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=28.706, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=27.846, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_diag_elpa_base", label="cp_fm_diag_elpa_base", y=26.357, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="cp_fm_cholesky_restore", label="cp_fm_cholesky_restore", y=14.206, yerr=0.0
PlotPoint: plot="diag_cu144_broy_timings_6cpu_1gpu", name="calculate_dispersion_nonloc", label="calculate_dispersion_nonloc", y=10.804, yerr=0.0
Running bench_dftb.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/bench_dftb_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                        CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                  1  1.0    0.045    0.045  262.071  262.071
qs_energies                           1  2.0    0.000    0.000  261.947  261.949
ls_scf                                1  3.0    0.000    0.000  261.062  261.064
ls_scf_main                           1  4.0    0.001    0.001  251.260  251.261
density_matrix_trs4                  11  5.0    0.007    0.007  208.739  208.761
dbcsr_multiply_generic              185  6.1    0.312    0.314  169.910  169.916
multiply_cannon                     185  7.1    1.958    2.252  118.265  118.350
multiply_cannon_loop                185  8.1    0.320    0.324  103.523  103.732
multiply_cannon_multrec             370  9.1   78.205   78.426   87.174   87.362
make_m2s                            370  7.1    0.028    0.028   43.988   44.002
make_images                         370  8.1   10.981   11.288   42.936   42.945
ls_scf_dm_to_ks                      11  5.0    0.000    0.000   38.143   38.177
matrix_ls_to_qs                      11  6.0    0.000    0.000   35.281   35.283
dbcsr_complete_redistribute          23  7.5   21.819   21.902   30.057   30.174
matrix_decluster                     11  7.0    0.000    0.000   27.186   27.314
arnoldi_extremal                     12  6.1    0.000    0.000   23.445   23.447
arnoldi_normal_ev                    12  7.1    0.009    0.010   23.445   23.446
build_subspace                       23  8.1    0.061    0.061   22.959   22.960
dbcsr_matrix_vector_mult            652  9.0    0.149    0.152   21.644   21.954
dbcsr_matrix_vector_mult_local      652 10.0   20.640   20.953   20.648   20.961
make_images_data                    370  9.1    0.012    0.012   16.274   16.435
hybrid_alltoall_any                 393  9.9   11.438   11.565   15.747   15.916
calculate_norms                     740  9.1   15.506   15.533   15.506   15.533
dbcsr_finalize                      559  7.6    0.166    0.168   14.041   14.055
dbcsr_merge_all                     510  8.6    2.668    2.682   12.898   12.927
dbcsr_copy                          761  7.5    1.711    1.722    9.826    9.941
setup_rec_index_2d                  370  8.1    9.646    9.688    9.646    9.688
dbcsr_special_finalize              555  9.1    0.010    0.010    9.204    9.221
dbcsr_sort_indices                 1283 10.0    8.666    8.682    8.666    8.682
dbcsr_add_d                         280  6.0    0.001    0.001    8.433    8.463
dbcsr_add_anytype                   280  7.0    3.628    3.631    8.432    8.461
ls_scf_init_scf                       1  4.0    0.000    0.000    8.305    8.305
dbcsr_copy_into_existing             11  8.0    8.094    8.219    8.094    8.220
dbcsr_dot                           144  6.3    7.606    7.613    8.101    8.126
ls_scf_init_matrix_S                  1  5.0    0.000    0.000    7.826    7.830
dbcsr_mm_accdrv_process           14501 10.0    0.692    0.775    7.063    7.090
matrix_sqrt_Newton_Schulz             1  6.0    0.000    0.000    7.087    7.088
tree_to_linear_d                     23 10.5    6.853    6.874    6.853    6.874
dbcsr_mm_accdrv_process_sort      14501 11.0    6.289    6.314    6.289    6.314
dbcsr_merge_single_wm               370 10.1    0.538    0.547    5.932    5.936
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="bench_dftb", label="bench_dftb", y=262.071, yerr=0.0
Plot: name="bench_dftb_timings_6cpu_1gpu", title="Timings of bench_dftb with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="rest", label="rest", y=114.46300000000002, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=78.205, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_complete_redistribute", label="dbcsr_complete_redistribute", y=21.819, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="dbcsr_matrix_vector_mult_local", label="dbcsr_matrix_vector_mult_local", y=20.64, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="calculate_norms", label="calculate_norms", y=15.506, yerr=0.0
PlotPoint: plot="bench_dftb_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=11.438, yerr=0.0
Running dbcsr.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/dbcsr_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                        CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                  1  1.0    0.004    0.004   47.381   47.381
lib_test                              1  2.0    0.000    0.000   47.366   47.375
dbcsr_run_tests                       3  3.0    0.000    0.000   47.366   47.375
test_multiplies_multiproc             3  4.0    0.001    0.001   36.584   36.664
dbcsr_multiply_generic                9  5.0    0.002    0.002   28.500   28.507
multiply_cannon                       9  6.0    0.194    0.370   18.710   19.349
multiply_cannon_loop                  9  7.0    0.003    0.003   17.196   17.572
multiply_cannon_multrec              18  8.0    9.282    9.658   16.112   16.471
dbcsr_make_random_matrix              9  4.0    7.285    7.362   10.644   10.728
dbcsr_finalize                       27  5.7    0.001    0.001    7.398    7.523
dbcsr_merge_all                      18  6.5    3.564    3.604    7.287    7.409
dbcsr_mm_accdrv_process            8199  9.0    1.346    1.395    6.612    6.624
dbcsr_redistribute                    9  5.0    3.454    3.462    5.588    5.592
make_m2s                             18  6.0    0.001    0.001    4.869    4.873
make_images                          18  7.0    0.363    0.367    4.838    4.841
dbcsr_mm_accdrv_process_sort       8199 10.0    4.464    4.472    4.464    4.472
make_images_data                     18  8.0    0.001    0.001    2.807    2.810
hybrid_alltoall_any                  18  9.0    2.419    2.427    2.772    2.774
dbcsr_data_copy_aa2                  18  7.5    1.763    1.917    1.763    1.917
mp_alltoall_d11v                     27  6.0    1.881    1.885    1.881    1.885
tree_to_linear_d                      9  7.0    1.828    1.835    1.828    1.835
dbcsr_data_release                  507  7.7    1.340    1.342    1.340    1.342
mp_sum_l                             61  4.9    0.648    1.245    0.648    1.245
dbcsr_multiply_generic_mpsum_f        9  6.0    0.000    0.000    0.647    1.244
dbcsr_data_new                      354  7.4    0.971    1.096    0.971    1.096
dbcsr_checksum                        6  5.0    0.966    0.968    0.977    0.977
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="dbcsr", label="dbcsr", y=47.381, yerr=0.0
Plot: name="dbcsr_timings_6cpu_1gpu", title="Timings of dbcsr with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="rest", label="rest", y=19.332, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="multiply_cannon_multrec", label="multiply_cannon_multrec", y=9.282, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_make_random_matrix", label="dbcsr_make_random_matrix", y=7.285, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_mm_accdrv_process_sort", label="dbcsr_mm_accdrv_process_sort", y=4.464, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_merge_all", label="dbcsr_merge_all", y=3.564, yerr=0.0
PlotPoint: plot="dbcsr_timings_6cpu_1gpu", name="dbcsr_redistribute", label="dbcsr_redistribute", y=3.454, yerr=0.0
Running MQAE_single_node.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/MQAE_single_node_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                 T I M I N G                                 -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                        CALLS  ASD         SELF TIME        TOTAL TIME
                                MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                  1  1.0    0.045    0.046  196.967  196.968
qs_mol_dyn_low                        1  2.0    0.004    0.004  195.476  195.508
qs_forces                             6  3.8    0.001    0.001  122.015  122.016
qs_energies                           6  4.8    0.000    0.001  115.243  115.244
scf_env_do_scf                        6  5.8    0.000    0.000  107.682  107.683
scf_env_do_scf_inner_loop           113  6.2    0.005    0.008   99.806   99.806
velocity_verlet                       5  3.0    0.003    0.003   93.363   93.410
rebuild_ks_matrix                   119  8.1    0.001    0.001   81.077   81.081
qs_ks_build_kohn_sham_matrix        119  9.1    0.019    0.019   81.077   81.080
qs_ks_update_qs_env                 119  7.3    0.001    0.001   76.499   76.502
fft_wrap_pw1pw2                    2059 12.4    0.042    0.042   63.808   63.870
fft_wrap_pw1pw2_150                1321 13.9    0.009    0.009   61.105   61.106
qs_vxc_create                       119 10.1    0.002    0.002   51.950   51.952
xc_vxc_pw_create                    119 11.1    1.487    1.498   51.948   51.950
qmmm_el_coupling                      6  3.8    0.000    0.000   38.349   38.350
qmmm_elec_with_gaussian               6  4.8    0.019    0.019   38.343   38.344
qmmm_elec_with_gaussian_low           6  5.8    0.000    0.000   36.715   37.162
xc_pw_derive                        714 13.1    0.010    0.010   35.922   35.945
pw_gpu_c1dr3d_3d_ps                1095 14.8   10.105   10.121   34.546   34.623
qmmm_elec_gaussian_low_G              6  6.8   32.094   32.561   32.094   32.561
qmmm_forces                           6  3.8    0.001    0.001   32.482   32.482
qmmm_forces_with_gaussian             6  4.8    0.021    0.022   31.829   32.099
qmmm_force_with_gaussian_low          6  5.8    0.000    0.000   30.548   30.822
pw_gpu_r3dc1d_3d_ps                 964 14.0    8.964    9.018   29.208   29.223
xc_rho_set_and_dset_create          119 12.1    2.364    2.379   26.291   26.342
qmmm_forces_gaussian_low_G            6  6.8   25.662   25.931   25.662   25.931
xc_pw_divergence                    119 12.1    0.006    0.006   23.783   23.840
qs_rho_update_rho_low               119  7.3    0.001    0.001   21.731   21.784
calculate_rho_elec                  119  8.3    1.071    1.074   21.731   21.783
density_rs2pw                       119  9.3    0.007    0.008   15.600   15.716
dbcsr_multiply_generic             2598 12.3    0.095    0.097   13.632   13.645
sum_up_and_integrate                119 10.1    0.002    0.002   13.559   13.612
integrate_v_rspace                  119 11.1    0.022    0.022   13.380   13.436
mp_alltoall_z22v                   2059 16.4   12.846   13.196   12.846   13.196
multiply_cannon                    2598 13.3    0.210    0.211   12.018   12.084
multiply_cannon_loop               2598 14.3    0.248    0.254   11.557   11.625
multiply_cannon_multrec            5196 15.3    3.967    4.042    9.414    9.602
potential_pw2rs                     119 12.1    0.032    0.032    9.423    9.423
x_to_yz                            1095 15.8    2.263    2.286    9.259    9.410
pw_gpu_sf                          1095 15.8    8.772    8.809    8.772    8.809
qs_ks_ddapc                         119 10.1    0.002    0.002    8.639    8.640
init_scf_loop                         6  6.8    0.000    0.000    7.874    7.874
pw_gpu_fg                           964 15.0    7.693    7.852    7.693    7.852
yz_to_x                             964 15.0    1.728    1.739    7.579    7.743
qs_scf_new_mos                      113  7.2    0.001    0.001    7.579    7.581
qs_scf_loop_do_ot                   113  8.2    0.001    0.001    7.578    7.580
ot_scf_mini                         113  9.2    0.002    0.002    7.294    7.299
pw_gpu_ffc                         1095 15.8    6.392    6.414    6.392    6.414
dbcsr_mm_accdrv_process           13992 16.0    0.536    0.544    5.380    5.492
init_scf_run                          6  5.8    0.000    0.000    5.447    5.447
scf_env_initial_rho_setup             6  6.8    0.000    0.000    5.447    5.447
xc_functional_eval                  238 13.1    0.003    0.003    5.104    5.131
grid_collocate_task_list            119  9.3    5.034    5.095    5.034    5.095
pw_gpu_cff                          964 15.0    4.905    4.938    4.905    4.938
jit_kernel_multiply                  24 14.7    4.805    4.907    4.805    4.907
qmmm_forces_gaussian_low_R            6  6.8    0.000    0.000    4.886    4.891
qmmm_forces_with_gaussian_LG          6  7.8    4.886    4.891    4.886    4.891
ot_mini                             113 10.2    0.001    0.001    4.877    4.881
qmmm_elec_gaussian_low_R              6  6.8    0.000    0.000    4.622    4.642
qmmm_elec_with_gaussian_LG            6  7.8    4.622    4.642    4.622    4.642
qs_ks_update_qs_env_forces            6  4.8    0.000    0.000    4.609    4.609
pw_poisson_solve                    125  9.9    0.003    0.003    4.562    4.563
pw_derive                          1089 13.4    3.981    4.020    3.981    4.020
qs_ot_get_derivative                113 11.2    0.001    0.001    3.995    3.999
grid_integrate_task_list            119 12.1    3.935    3.990    3.935    3.990
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="MQAE_single_node", label="MQAE_single_node", y=196.967, yerr=0.0
Plot: name="MQAE_single_node_timings_6cpu_1gpu", title="Timings of MQAE_single_node with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="rest", label="rest", y=107.296, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_elec_gaussian_low_G", label="qmmm_elec_gaussian_low_G", y=32.094, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="qmmm_forces_gaussian_low_G", label="qmmm_forces_gaussian_low_G", y=25.662, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=12.846, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_c1dr3d_3d_ps", label="pw_gpu_c1dr3d_3d_ps", y=10.105, yerr=0.0
PlotPoint: plot="MQAE_single_node_timings_6cpu_1gpu", name="pw_gpu_r3dc1d_3d_ps", label="pw_gpu_r3dc1d_3d_ps", y=8.964, yerr=0.0
Summary: Performance test took 22 minutes. (cached)
Status: OK
Uploading artifacts... done
EndDate: 2026-01-18 06:11:43+00:00
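The per-benchmark breakdown PlotPoints in this log are consistent with a simple rule, which can be checked against the timing tables. Each named entry's y value equals that routine's SELF TIME AVERAGE. The five routines with the largest self time are plotted, and "rest" equals the CP2K TOTAL TIME AVERAGE minus their sum. For diag_cu144_broy, for example, 204.071 - 107.919 = 96.152.

The script that emits these PlotPoints is not part of this log, so the Python sketch below is only a reconstruction under that assumption. The names `TimingRow`, `parse_timing_rows`, `breakdown` and `SAMPLE` are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class TimingRow(NamedTuple):
    """One row of a CP2K timing report (hypothetical helper type)."""
    name: str
    calls: int        # CALLS (MAXIMUM over ranks)
    asd: float        # ASD column
    self_avg: float   # SELF TIME AVERAGE [s]
    self_max: float   # SELF TIME MAXIMUM [s]
    total_avg: float  # TOTAL TIME AVERAGE [s]
    total_max: float  # TOTAL TIME MAXIMUM [s]


def parse_timing_rows(text: str) -> list[TimingRow]:
    """Parse whitespace-separated timing rows, skipping banners and headers."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # dashes, T I M I N G banner, or second header line
        try:
            rows.append(TimingRow(fields[0], int(fields[1]), *map(float, fields[2:])))
        except ValueError:
            continue  # first header line: "SUBROUTINE CALLS ASD SELF TIME TOTAL TIME"
    return rows


def breakdown(rows: list[TimingRow], top_n: int = 5) -> tuple[float, dict[str, float]]:
    """Assumed rule: top-N routines by average self time, plus the remainder as "rest"."""
    total = next(r.total_avg for r in rows if r.name == "CP2K")
    top = sorted(rows, key=lambda r: r.self_avg, reverse=True)[:top_n]
    parts = {r.name: r.self_avg for r in top}
    return total, {"rest": total - sum(parts.values()), **parts}


# A subset of the diag_cu144_broy rows copied from this log.
SAMPLE = """\
SUBROUTINE                        CALLS  ASD         SELF TIME        TOTAL TIME
CP2K                                  1  1.0    0.081    0.083  204.071  204.074
grid_collocate_task_list             16  7.0   28.706   28.756   28.706   28.756
cp_fm_diag_elpa_base                 15  8.0   26.357   26.859   27.989   27.990
grid_integrate_task_list             15 10.0   27.846   27.880   27.846   27.880
cp_fm_cholesky_restore               45  7.0   14.206   14.855   14.206   14.855
calculate_dispersion_nonloc          15  9.0   10.804   10.894   49.592   49.613
vdW_energy                           15 10.0    9.205    9.247    9.205    9.247
"""

if __name__ == "__main__":
    total, parts = breakdown(parse_timing_rows(SAMPLE))
    for name, seconds in parts.items():
        print(f"{name:32s} {seconds:8.3f}")
```

The same rule also reproduces the other breakdowns printed here. For bench_dftb, 262.071 - (78.205 + 21.819 + 20.640 + 15.506 + 11.438) = 114.463. Because the printed tables are truncated, a real script would presumably rank over the complete report rather than only the rows shown.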