StartDate: 2026-06-21 06:42:41+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: 561f47539b1bf1b38497acb340ae8cf50844f843
CommitTime: 2026-06-21 07:39:04 +0200
CommitAuthor: Matthias Krack
CommitSubject: Update CSCS CI tester (#5415)
#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=561f47539b1bf1b38497acb340ae8cf50844f843 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release. BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.
Sending build context to Docker daemon 420.6MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
2cd52cbb1ebe: Waiting
6e8af4fd0a07: Waiting
02cb0e091e33: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
6cb9b761b877: Waiting
5a2aba542b08: Waiting
15a17189b2df: Waiting
644e9b203583: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
2cd52cbb1ebe: Verifying Checksum
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
7f7602a82106: Verifying Checksum
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
---> Using cache
---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
---> Using cache
---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
---> Using cache
---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
---> Using cache
---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
---> Using cache
---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
---> Using cache
---> d8bfc1674c90
Step 8/46 : RUN ./install_requirements.sh ubuntu
---> Using cache
---> de928c312410
Step 9/46 : RUN mkdir scripts
---> Using cache
---> 4aed4b85b643
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./tools/build_utils/fypp ./scripts/
---> Using cache
---> 465e9c783ab8
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
---> Using cache
---> 76a695261134
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --with-sirius=install --gpu-ver=V100 --dry-run
---> Using cache
---> 47dd8f5e7de2
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
---> Using cache
---> 3599e8a96607
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
---> Using cache
---> 081fe67a1269
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
---> Using cache
---> 9ee46597153b
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
---> Using cache
---> aeea57b7328c
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
---> Using cache
---> 05809d996288
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
---> Using cache
---> 3af43b869949
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
---> Using cache
---> 615f324c6ec5
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
---> Using cache
---> 290b0ee64ab1
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
---> Using cache
---> 38c218b14dad
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
---> Using cache
---> 73e6df4a2391
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
---> Using cache
---> 142dc64a8443
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
---> Using cache
---> a405867bf6e9
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
---> Using cache
---> 5c90edca9fe2
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
---> Using cache
---> f60cedde8618
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
---> Using cache
---> 7939c7330cfe
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
---> Using cache
---> fb8968b3058b
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
---> Using cache
---> 29fcb321cf02
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
---> Using cache
---> cfb03c3e7934
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
---> Using cache
---> de5b0c875b02
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
---> Using cache
---> a3e64e22b184
Step 33/46 : WORKDIR /opt/cp2k
---> Using cache
---> c4b511fb09d5
Step 34/46 : COPY ./src ./src
---> 375c35e9ffd5
Step 35/46 : COPY ./data ./data
---> 5095db1a1682
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
---> fbed814684b5
Step 37/46 : COPY ./cmake ./cmake
---> d7795b40891c
Step 38/46 : COPY ./CMakeLists.txt .
---> b02bdb307219
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh .
---> feae75d1b97f
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
---> Running in 8d29680bd571
==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so (found version "5.0")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so (found version "5.0")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpifort.so (found version "5.0")
-- Found MPI: TRUE (found version "5.0") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "5.0") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
-- Found openblas, version 0.3.33
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
------------------------------------------------------------
- DBCSR -
------------------------------------------------------------
-- Found MPI: TRUE (found version "5.0")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-- Using LIBXS + LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
-- Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
-----------------------------------------------------------
- CUDA -
-----------------------------------------------------------
-- GPU architecture number: 52
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found
------------------------------------------------------------
- OPENMP -
------------------------------------------------------------
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX
------------------------------------------------------------
- Other dependencies -
------------------------------------------------------------
-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran
-- Found MPI: TRUE (found version "5.0") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.11/include
-- Checking for module 'libint2'
-- Package 'libint2', required by 'virtual:world', not found
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- mctc-lib: Find installed package
-- multicharge: Find installed package
-- DFTD4: found version 4.2.0, using v4.2+ API
-- toml-f: Find installed package
-- s-dftd3: Find installed package
-- DFTD4: found version 4.2.0, using v4.2+ API
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "2.1.1") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.11/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for one of the modules 's-dftd3'
-- Checking for one of the modules 'mctc-lib'
-- Found DFTD3: /opt/cp2k-toolchain/install/tblite-0.6.0/lib/libs-dftd3.a
-- Checking for one of the modules 'dftd4'
-- Checking for one of the modules 'multicharge'
-- Found DFTD4: /opt/cp2k-toolchain/install/tblite-0.6.0/lib/libdftd4.a
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'elpa;elpa_openmp;elpa-openmp-2019.05.001;elpa_openmp-2019.11.001;elpa_openmp-2020.05.001;elpa-2019.05.001;elpa-2019.11.001;elpa-2020.05.001'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so
-- Checking for module 'libvdwxc>=0.5.0'
-- Found libvdwxc, version 0.5.0
-- Checking for module 'fftw3'
-- Found fftw3, version 3.3.11
-- Found LibVDWXC: vdwxc;fftw3 (Required is at least version "0.5.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
-- Adding libxs_jit.F from dependency libxs for compilation
--------------------------------------------------------------------
- -
- Summary of enabled dependencies -
- -
--------------------------------------------------------------------
- BLAS
  - vendor: OpenBLAS
  - include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include
  - libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
- LAPACK
  - include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include
  - libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
- MPI
  - include directories: /opt/cp2k-toolchain/install/mpich-5.0.1/include
  - libraries: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so
  - MPI_F08: ON
- ScaLAPACK
  - vendor: auto
  - include directories:
  - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
- Hardware Acceleration:
  - CUDA:
    - GPU architecture number: 52
    - GPU profiling enabled:
  - GPU accelerated modules
    - ELPA module: ON
    - GRID module: ON
    - DBM module: ON
    - PW module: ON
- LibXC
  - version: 7.0.0
  - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
  - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
- HDF5
  - version: 2.1.1
  - include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include
  - libraries: hdf5-shared
- FFTW3
  - include directories: /opt/cp2k-toolchain/install/fftw-3.3.11/include
  - libraries: /opt/cp2k-toolchain/install/fftw-3.3.11/lib/libfftw3.a
- LIBXS
  - include directories:
  - libraries:
- SpLA
  - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
  - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
  - SpLA GEMM offloading
- DFTD4
  - include directories : /opt/cp2k-toolchain/install/tblite-0.6.0/include;/opt/cp2k-toolchain/install/tblite-0.6.0/include/dftd4/GNU-13.3.0
  - libraries :
- TBLITE :
  - include directories : /opt/cp2k-toolchain/install/tblite-0.6.0/include;/opt/cp2k-toolchain/install/tblite-0.6.0/include/tblite/GNU-13.3.0
  - tblite libraries :
- SIRIUS
  - include directories:
  - libraries:
- COSMA
  - include directories: /opt/cp2k-toolchain/install/COSMA-2.8.4/include
  - libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack>
- Libint2
  - include directories: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
  - libraries: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/lib/libint2.a
- ELPA
  - include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001
  - libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
--------------------------------------------------------------------
- -
- List of dependencies not included in this build -
- -
--------------------------------------------------------------------
- DeePMD
- PEXSI
- ACE (libpace)
- Spglib
- LibSMEAGOL
- MiMiC
- openPMD
- DLA-Future
- PLUMED
- LibFCI
- GauXC
- Libvori
- LibTorch
- TREXIO
- GreenX
After building and installing CP2K the regtests can be run with the following command:
/opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp
-- Configuring done (12.9s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ...
done
---> Removed intermediate container 8d29680bd571
---> 98a805ad7c5b
Step 41/46 : COPY ./benchmarks ./benchmarks
---> 73c02fdbbf78
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
---> 80c3c085f033
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
---> 8098ed1b8094
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
---> Running in 113accdf043e
============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma libxs libxsmm dbcsr_acc libdftd4 dftd4_v4_2 s_dftd3 mctc-lib tblite sirius offload_cuda spla_gemm_offloading libvdwxc hdf5
========== Checking Benchmark Inputs =========
Found 83 input files and 0 errors.
========== Running Performance Test ==========
Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"
Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD    SELF TIME         TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.028    0.029  106.241  106.242
qs_mol_dyn_low                       1  2.0    0.004    0.004  105.808  105.811
qs_forces                           11  3.9    0.002    0.002  105.758  105.758
qs_energies                         11  4.9    0.001    0.002   94.530   94.532
scf_env_do_scf                      11  5.9    0.001    0.001   78.214   78.214
scf_env_do_scf_inner_loop          112  6.6    0.006    0.009   65.904   65.904
velocity_verlet                     10  3.0    0.001    0.002   65.237   65.255
rebuild_ks_matrix                  123  8.3    0.001    0.001   29.006   29.010
qs_ks_build_kohn_sham_matrix       123  9.3    0.020    0.021   29.005   29.009
qs_ks_update_qs_env                123  7.6    0.001    0.001   27.130   27.134
dbcsr_multiply_generic            2402 12.5    0.159    0.160   26.927   27.000
qs_rho_update_rho_low              123  7.7    0.001    0.001   23.713   23.736
calculate_rho_elec                 123  8.7    0.932    0.934   23.713   23.735
qs_scf_new_mos                     112  7.6    0.001    0.001   22.208   22.213
qs_scf_loop_do_ot                  112  8.6    0.001    0.001   22.207   22.212
ot_scf_mini                        112  9.6    0.003    0.003   20.119   20.124
fft_wrap_pw1pw2                   1241 11.7    0.024    0.024   17.546   17.577
sum_up_and_integrate               123 10.3    0.003    0.003   15.547   15.642
integrate_v_rspace                 123 11.3    0.370    0.371   15.448   15.543
fft_wrap_pw1pw2_140                503 12.2    0.003    0.003   15.123   15.141
multiply_cannon                   2402 13.5    0.351    0.353   13.348   13.382
init_scf_loop                       11  6.9    0.000    0.000   12.224   12.225
multiply_cannon_loop              2402 14.5    0.280    0.281   12.150   12.172
make_m2s                          4804 13.5    0.049    0.049   11.827   11.835
density_rs2pw                      123  9.7    0.008    0.009   11.501   11.695
ot_mini                            112 10.6    0.001    0.001   11.689   11.691
make_images                       4804 14.5    1.233    1.237   11.637   11.643
grid_collocate_task_list           123  9.7   11.252   11.411   11.252   11.411
pw_gpu_r3dc1d_3d_ps                626 13.2    2.491    2.505    8.993    8.993
pw_gpu_c1dr3d_3d_ps                615 14.2    2.371    2.389    8.522    8.554
grid_integrate_task_list           123 12.3    8.286    8.378    8.286    8.378
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    8.080    8.191
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    7.878    7.878
init_scf_run                        11  5.9    0.000    0.000    7.756    7.756
scf_env_initial_rho_setup           11  6.9    0.000    0.001    7.755    7.755
prepare_preconditioner              11  7.9    0.000    0.000    7.573    7.574
make_preconditioner                 11  8.9    0.000    0.000    7.573    7.574
qs_ot_get_derivative               112 11.6    0.002    0.002    7.068    7.071
hybrid_alltoall_any               4957 16.4    5.085    5.095    7.055    7.056
make_images_data                  4804 15.5    0.061    0.062    6.932    6.941
potential_pw2rs                    123 12.3    0.039    0.040    6.790    6.793
make_full_inverse_cholesky          11  9.9    0.000    0.000    6.409    6.669
multiply_cannon_multrec           4804 15.5    2.133    2.151    6.495    6.511
ot_diis_step                       112 11.6    0.006    0.006    4.598    4.598
mp_alltoall_z22v                  1241 15.7    4.466    4.497    4.466    4.497
build_core_ppl_forces               11  5.9    4.113    4.207    4.113    4.207
wfi_extrapolate                     11  7.9    0.001    0.001    4.071    4.072
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.996    4.019
apply_preconditioner_dbcsr         123 12.6    0.000    0.000    3.976    3.979
apply_single                       123 13.6    0.001    0.001    3.976    3.978
dbcsr_mm_accdrv_process           9994 16.2    0.809    0.971    3.962    3.962
mp_waitall_1                     67759 16.9    3.950    3.952    3.950    3.952
dbcsr_complete_redistribute        329 12.2    1.282    1.286    3.360    3.604
calculate_dm_sparse                123  9.5    0.001    0.001    3.550    3.562
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.481    3.537
qs_ot_get_p                        123 10.4    0.001    0.001    3.464    3.466
multiply_cannon_sync_h2d          4804 15.5    3.167    3.194    3.167    3.194
transfer_rs2pw                     503 10.6    0.008    0.009    2.823    3.033
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.030    3.031
pw_poisson_solve                   123 10.3    0.003    0.003    2.790    2.792
yz_to_x                            626 14.2    0.485    0.488    2.762    2.779
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.751    2.752
x_to_yz                            615 15.2    0.518    0.521    2.706    2.716
jit_kernel_multiply                 12 15.7    2.526    2.689    2.526    2.689
qs_create_task_list                 11  7.9    0.000    0.000    2.601    2.684
generate_qs_task_list               11  8.9    1.179    1.189    2.600    2.684
copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.656    2.668
transfer_rs2pw_140                 134 11.5    1.652    1.676    2.358    2.583
qs_ot_get_derivative_taylor         63 13.0    0.003    0.003    2.560    2.561
calculate_first_density_matrix       1  7.0    0.000    0.000    2.376    2.376
cp_fm_cholesky_invert               11 10.9    2.364    2.364    2.364    2.364
pw_gpu_fg                          626 14.2    2.272    2.283    2.272    2.283
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.225    2.225
qs_ot_p2m_diag                      50 11.0    0.089    0.090    2.207    2.209
dbcsr_special_finalize            7206 15.5    0.043    0.043    2.165    2.169
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=106.241, yerr=0.0
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=73.039, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=11.252, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.286, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=5.085, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.466, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.113, yerr=0.0
Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD    SELF TIME         TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.027    0.027  101.141  101.141
qs_mol_dyn_low                       1  2.0    0.005    0.005  100.687  100.690
qs_forces                           11  3.9    0.002    0.002  100.637  100.637
qs_energies                         11  4.9    0.001    0.001   89.199   89.199
scf_env_do_scf                      11  5.9    0.001    0.001   72.075   72.075
velocity_verlet                     10  3.0    0.001    0.002   64.111   64.129
scf_env_do_scf_inner_loop          101  6.6    0.006    0.008   59.568   59.568
rebuild_ks_matrix                  112  8.3    0.001    0.001   27.719   27.720
qs_ks_build_kohn_sham_matrix       112  9.3    0.018    0.018   27.718   27.719
dbcsr_multiply_generic            2056 12.5    0.136    0.136   25.505   25.589
qs_ks_update_qs_env                112  7.6    0.001    0.001   25.565   25.567
qs_rho_update_rho_low              112  7.7    0.001    0.001   20.691   20.707
calculate_rho_elec                 112  8.7    0.840    0.842   20.690   20.707
qs_scf_new_mos                     101  7.6    0.001    0.001   20.406   20.411
qs_scf_loop_do_ot                  101  8.6    0.001    0.001   20.405   20.410
ot_scf_mini                        101  9.6    0.003    0.003   18.547   18.548
fft_wrap_pw1pw2                   1131 11.7    0.022    0.022   16.179   16.227
sum_up_and_integrate               112 10.3    0.002    0.002   15.244   15.328
integrate_v_rspace                 112 11.3    0.336    0.337   15.154   15.237
fft_wrap_pw1pw2_140                459 12.2    0.003    0.003   13.954   14.034
multiply_cannon                   2056 13.5    0.312    0.313   12.495   12.804
init_scf_loop                       11  6.9    0.000    0.000   12.421   12.422
make_m2s                          4112 13.5    0.043    0.044   11.439   11.761
make_images                       4112 14.5    1.221    1.356   11.269   11.589
multiply_cannon_loop              2056 14.5    0.243    0.245   11.188   11.191
ot_mini                            101 10.6    0.001    0.001   10.856   10.857
density_rs2pw                      112  9.7    0.007    0.008   10.602   10.765
grid_collocate_task_list           112  9.7    9.220    9.349    9.220    9.349
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.739    8.739
grid_integrate_task_list           112 12.3    8.648    8.731    8.648    8.731
pw_gpu_r3dc1d_3d_ps                571 13.2    2.384    2.516    8.384    8.390
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    8.094    8.220
pw_gpu_c1dr3d_3d_ps                560 14.2    2.158    2.185    7.767    7.809
init_scf_run                        11  5.9    0.000    0.000    7.723    7.723
scf_env_initial_rho_setup           11  6.9    0.000    0.001    7.722    7.722
prepare_preconditioner              11  7.9    0.000    0.000    7.671    7.682
make_preconditioner                 11  8.9    0.000    0.000    7.671    7.682
hybrid_alltoall_any               4259 16.3    4.803    5.066    6.925    6.972
make_images_data                  4112 15.5    0.053    0.053    6.743    6.775
make_full_inverse_cholesky          11  9.9    0.000    0.000    6.492    6.758
qs_ot_get_derivative               101 11.6    0.002    0.002    6.386    6.387
potential_pw2rs                    112 12.3    0.035    0.035    6.170    6.171
multiply_cannon_multrec           4112 15.5    1.900    1.905    6.089    6.130
qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.368    4.520
ot_diis_step                       101 11.6    0.006    0.006    4.448    4.448
mp_alltoall_z22v                  1131 15.7    4.184    4.345    4.184    4.345
build_core_ppl_forces               11  5.9    4.115    4.219    4.115    4.219
mp_waitall_1                     58027 16.9    3.851    4.116    3.851    4.116
apply_preconditioner_dbcsr         112 12.6    0.000    0.000    4.043    4.043
apply_single                       112 13.6    0.001    0.001    4.043    4.043
wfi_extrapolate                     11  7.9    0.001    0.002    4.030    4.030
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.939    3.982
dbcsr_mm_accdrv_process           8840 16.2    0.766    0.932    3.832    3.876
dbcsr_complete_redistribute        317 12.2    1.238    1.270    3.470    3.706
qs_create_task_list                 11  7.9    0.000    0.000    3.483    3.587
generate_qs_task_list               11  8.9    1.459    1.470    3.483    3.587
calculate_dm_sparse                112  9.5    0.001    0.001    3.290    3.293
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.211    3.211
qs_ot_get_p                        112 10.4    0.001    0.001    3.019    3.020
multiply_cannon_sync_h2d          4112 15.5    2.857    2.874    2.857    2.874
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.861    2.862
copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.810    2.833
transfer_rs2pw                     459 10.6    0.008    0.008    2.577    2.771
yz_to_x                            571 14.2    0.438    0.442    2.619    2.757
jit_kernel_multiply                 12 15.7    2.496    2.706    2.496    2.706
pw_poisson_solve                   112 10.3    0.003    0.003    2.536    2.541
x_to_yz                            560 15.2    0.466    0.467    2.469    2.487
calculate_first_density_matrix       1  7.0    0.000    0.000    2.406    2.407
transfer_rs2pw_140                 123 11.5    1.519    1.539    2.157    2.363
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.331    2.332
cp_fm_cholesky_invert               11 10.9    2.298    2.298    2.298    2.298
transfer_dbcsr_to_fm                11 10.9    0.002    0.002    2.231    2.248
qs_ot_get_derivative_taylor         58 13.0    0.002    0.003    2.183    2.184
build_core_ppl                      11  7.9    2.064    2.098    2.064    2.098
pw_gpu_fg                          571 14.2    2.029    2.036    2.029    2.036
-------------------------------------------------------------------------------
PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=101.141, yerr=0.0
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=70.171, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=9.22, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=8.648, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.803, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.184, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=4.115, yerr=0.0
Running w64PBE.inp with 3 threads and 2 ranks... failed.
----------------------------------- OT ---------------------------------------
Step     Update method      Time    Convergence         Total energy    Change
------------------------------------------------------------------------------
   1 OT DIIS     0.80E-01    3.8     0.00000703     -1102.7676350086 -3.54E-10
   2 OT DIIS     0.80E-01    1.7     0.00000283     -1102.7676346782  3.30E-07
   3 OT DIIS     0.80E-01    1.6     0.00000236     -1102.7676348208 -1.43E-07
   4 OT DIIS     0.80E-01    1.6     0.00000421     -1102.7676348534 -3.26E-08
   5 OT DIIS     0.80E-01    1.6     0.00001724     -1102.7676348558 -2.44E-09
   6 OT DIIS     0.80E-01    1.6     0.00000220     -1102.7676348559 -1.18E-10
   7 OT DIIS     0.80E-01    1.7     0.00000231     -1102.7676348713 -1.54E-08
   8 OT DIIS     0.80E-01    1.6     0.00000514     -1102.7676348787 -7.37E-09
   9 OT DIIS     0.80E-01    1.6     0.00000413     -1102.7676349406 -6.20E-08
  10 OT DIIS     0.80E-01    1.6     0.00000895     -1102.7676349409 -2.95E-10
Leaving inner SCF loop after reaching 10 steps.
Electronic density on regular grids:       -512.0000000044       -0.0000000044
Core density on regular grids:              511.9999999998       -0.0000000002
Total charge density on r-space grids:       -0.0000000045
Total charge density g-space grids:          -0.0000000045
Overlap energy of the core charge distribution:               0.00000091569564
Self energy of the core charge distribution:              -2838.67351367283345
Core Hamiltonian energy:                                    824.05925135056964
Hartree energy:                                            1182.15846501641886
Exchange-correlation energy:                               -270.31183855076904
Total energy:                                             -1102.76763494091847
outer SCF iter = 10 RMS gradient = 0.89E-05 energy = -1102.7676349409
----------------------------------- OT ---------------------------------------
Minimizer      : DIIS                : direct inversion
                                       in the iterative subspace
                                       using 7 DIIS vectors
                                       safer DIIS on
Preconditioner : FULL_SINGLE_INVERSE : inversion of
                                       H + eS - 2*(Sc)(c^T*H*c+const)(Sc)^T
Precond_solver : DEFAULT
stepsize       : 0.08000000
energy_gap     : 0.08000000
eps_taylor     : 0.10000E-15
max_taylor     : 4
----------------------------------- OT ---------------------------------------
Step     Update method      Time    Convergence         Total energy    Change
------------------------------------------------------------------------------
   1 OT DIIS     0.80E-01    3.8     0.00000536     -1102.7676349434 -2.49E-09
   2 OT DIIS     0.80E-01    1.6     0.00000188     -1102.7676348212  1.22E-07
   3 OT DIIS     0.80E-01    1.6     0.00005344     -1102.7676349084 -8.72E-08
   4 OT SD       0.80E-01    1.6     0.00001053     -1102.7676349084 -3.41E-12
   5 OT DIIS     0.80E-01    1.6     0.00000417     -1102.7676342397  6.69E-07
   6 OT DIIS     0.80E-01    1.6     0.00000138     -1102.7676349069 -6.67E-07
   7 OT DIIS     0.80E-01    1.6     0.00001760     -1102.7676348903  1.67E-08
   8 OT SD       0.80E-01    1.6     0.00000156     -1102.7676348902  2.16E-11
   9 OT SD       0.80E-01    1.6     0.00000097     -1102.7676349576 -6.74E-08
  10 OT SD       0.80E-01    1.6     0.00000065     -1102.7676349870 -2.94E-08
Leaving inner SCF loop after reaching 10 steps.
Electronic density on regular grids:       -512.0000000044       -0.0000000044
Core density on regular grids:              511.9999999998       -0.0000000002
Total charge density on r-space grids:       -0.0000000045
Total charge density g-space grids:          -0.0000000045
Overlap energy of the core charge distribution:               0.00000091569564
Self energy of the core charge distribution:              -2838.67351367283345
Core Hamiltonian energy:                                    824.05923972748417
Hartree energy:                                            1182.15847412261087
Exchange-correlation energy:                               -270.31183607997832
Total energy:                                             -1102.76763498702121
outer SCF iter = 11 RMS gradient = 0.65E-06 energy = -1102.7676349870
outer SCF loop FAILED to converge after 11 iterations or 110 steps
*******************************************************************************
*   ___                                                                       *
*  /   \                                                                      *
* [ABORT]                                                                     *
*  \___/      SCF run NOT converged. To continue the calculation regardless,  *
*    |              please set the keyword IGNORE_CONVERGENCE_FAILURE.        *
*  O/|                                                                        *
* /| |                                                                        *
* / \                                                            qs_scf.F:686 *
*******************************************************************************
===== Routine Calling Stack =====
5 scf_env_do_scf
4 qs_energies
3 qs_forces
2 qs_mol_dyn_low
1 CP2K
Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP 1
Summary: Running w64PBE.inp failed.
Status: FAILED
---> Removed intermediate container 113accdf043e
---> 897e21a353d5
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
---> Running in 04ea57a34f45
---> Removed intermediate container 04ea57a34f45
---> 4fd36a41be0b
Step 46/46 : ENTRYPOINT []
---> Running in 66611d8c5824
---> Removed intermediate container 66611d8c5824
---> 274432ecdc1e
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 274432ecdc1e
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.
#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done
EndDate: 2026-06-21 07:09:45+00:00