StartDate: 2026-06-01 07:12:46+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: d5c4d393fe7031cf97cc6f31a3f9402441e7e64a
CommitTime: 2026-05-31 23:29:09 +0200
CommitAuthor: Dynamics of Condensed Matter
CommitSubject: Add native-grid SKALA CUDA atom chunk controls (#5323)

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=d5c4d393fe7031cf97cc6f31a3f9402441e7e64a SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0 environment-variable.

Sending build context to Docker daemon 420.3MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
9c3d619183d2: Waiting
7f7602a82106: Waiting
2cd52cbb1ebe: Waiting
6e8af4fd0a07: Waiting
15a17189b2df: Waiting
5a2aba542b08: Waiting
02cb0e091e33: Waiting
6cb9b761b877: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Download complete
7f7602a82106: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> d8bfc1674c90
Step 8/46 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> de928c312410
Step 9/46 : RUN mkdir scripts
 ---> Using cache
 ---> 4aed4b85b643
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> ce9efe84db60
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> dfc1a5ca7e3f
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run
 ---> Using cache
 ---> 1bc3916e19c7
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> bbd97369be82
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> fbbd58fb6405
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 9707298b4465
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> 10af8edef201
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> cde1e5c7df26
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> e634e183ddda
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 90e1d29eaee5
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 456e432c42cd
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> 25314ed00994
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 2f32d5fcf1ca
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> f6eb71d2ea73
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> 89a999028ecd
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 4fee466a0efd
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> 4a225437d875
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> b3bdd93e7b5e
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> fc993b0523c8
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> b243c28b2b5f
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> ac272cd10306
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 3ae08df2098f
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> 8632987b9f69
Step 33/46 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> b27caf79383d
Step 34/46 : COPY ./src ./src
 ---> c4db0dcfe06f
Step 35/46 : COPY ./data ./data
 ---> cfcb0aee6139
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
 ---> 1c24d7d19b4c
Step 37/46 : COPY ./cmake ./cmake
 ---> 08741f69d9d5
Step 38/46 : COPY ./CMakeLists.txt .
 ---> f4b3b4cd0500
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> f01322899c30
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in 7097bc288a60

==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so (found version "5.0")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so (found version "5.0")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpifort.so (found version "5.0")
-- Found MPI: TRUE (found version "5.0") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "5.0") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
-- Found openblas, version 0.3.33
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
-- Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
-- Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
-- Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
-- Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
-- Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-----------------------------------------------------------
-                          CUDA                           -
-----------------------------------------------------------
-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found
------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX
------------------------------------------------------------
-                          DBCSR                           -
------------------------------------------------------------
-- Found MPI: TRUE (found version "5.0")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for module 'libxsmmf'
-- Found libxsmmf, version 1.17.0
-- Checking for module 'libxsmmext'
-- Found libxsmmext, version 1.17.0
------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------
-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran
-- Found MPI: TRUE (found version "5.0") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.11/include
-- Checking for module 'libint2'
-- Package 'libint2', required by 'virtual:world', not found
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "2.1.1") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.11/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'elpa;elpa_openmp;elpa-openmp-2019.05.001;elpa_openmp-2019.11.001;elpa_openmp-2020.05.001;elpa-2019.05.001;elpa-2019.11.001;elpa-2020.05.001'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so
-- Checking for module 'libvdwxc>=0.5.0'
-- Found libvdwxc, version 0.5.0
-- Checking for module 'fftw3'
-- Found fftw3, version 3.3.11
-- Found LibVDWXC: vdwxc;fftw3 (Required is at least version "0.5.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------
  - BLAS
    - vendor: OpenBLAS
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
  - LAPACK
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
  - MPI
    - include directories: /opt/cp2k-toolchain/install/mpich-5.0.1/include
    - libraries: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so
    - MPI_F08: ON
  - ScaLAPACK
    - vendor: auto
    - include directories:
    - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
  - Hardware Acceleration:
    - CUDA:
      - GPU architecture number: 70
      - GPU profiling enabled:
      - GPU accelerated modules
        - ELPA module: ON
        - GRID module: ON
        - DBM module: ON
        - PW module: ON
  - LibXC
    - version: 7.0.0
    - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
    - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
  - HDF5
    - version: 2.1.1
    - include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include
    - libraries: hdf5-shared
  - FFTW3
    - include directories: /opt/cp2k-toolchain/install/fftw-3.3.11/include
    - libraries: /opt/cp2k-toolchain/install/fftw-3.3.11/lib/libfftw3.a
  - LIBXSMM
    - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
    - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmm.so
  - SpLA
    - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
    - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
    - SpLA GEMM offloading
  - SIRIUS
    - include directories:
    - libraries:
  - COSMA
    - include directories: /opt/cp2k-toolchain/install/COSMA-2.8.4/include
    - libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack>
  - Libint2
    - include directories: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
    - libraries: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/lib/libint2.a
  - ELPA
    - include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001
    - libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
--------------------------------------------------------------------
-                                                                  -
-         List of dependencies not included in this build          -
-                                                                  -
--------------------------------------------------------------------
  - DFTD4
  - DeePMD
  - PEXSI
  - ACE (libpace)
  - TBLITE
  - Spglib
  - LibSMEAGOL
  - MiMiC
  - openPMD
  - DLA-Future
  - PLUMED
  - LibFCI
  - GauXC
  - Libvori
  - LibTorch
  - TREXIO
  - GreenX

After building and installing CP2K the regtests can be run with the following command:
/opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp

-- Configuring done (15.0s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container 7097bc288a60
 ---> 22417e912eb2
Step 41/46 : COPY ./benchmarks ./benchmarks
 ---> 4e1165d088ec
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
 ---> 8fb07074041b
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> 292379699a57
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in 08a9b8539300

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 83 input files and 0 errors.

========== Running Performance Test ==========
Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"

Running H2O-64.inp with 3 threads and 2 ranks... done.
From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1  1.0    0.031    0.033   99.859   99.859
 qs_mol_dyn_low                      1  2.0    0.004    0.004   99.415   99.417
 qs_forces                          11  3.9    0.002    0.002   99.363   99.363
 qs_energies                        11  4.9    0.001    0.001   88.475   88.475
 scf_env_do_scf                     11  5.9    0.001    0.001   73.229   73.229
 scf_env_do_scf_inner_loop         109  6.5    0.006    0.008   62.282   62.283
 velocity_verlet                    10  3.0    0.002    0.002   62.047   62.066
 rebuild_ks_matrix                 120  8.3    0.001    0.001   26.849   26.853
 qs_ks_build_kohn_sham_matrix      120  9.3    0.020    0.020   26.848   26.852
 dbcsr_multiply_generic           2325 12.5    0.147    0.148   26.115   26.167
 qs_ks_update_qs_env               120  7.6    0.001    0.001   24.964   24.967
 qs_scf_new_mos                    109  7.5    0.001    0.001   21.259   21.272
 qs_scf_loop_do_ot                 109  8.5    0.001    0.001   21.258   21.271
 qs_rho_update_rho_low             120  7.7    0.001    0.001   20.897   20.911
 calculate_rho_elec                120  8.7    0.890    0.899   20.896   20.910
 ot_scf_mini                       109  9.5    0.003    0.003   19.224   19.227
 fft_wrap_pw1pw2                  1211 11.7    0.023    0.024   17.005   17.031
 fft_wrap_pw1pw2_140               491 12.2    0.003    0.003   14.639   14.658
 sum_up_and_integrate              120 10.3    0.003    0.003   13.759   13.809
 integrate_v_rspace                120 11.3    0.353    0.353   13.662   13.712
 multiply_cannon                  2325 13.5    0.346    0.352   13.113   13.126
 multiply_cannon_loop             2325 14.5    0.265    0.267   11.995   12.000
 make_m2s                         4650 13.5    0.044    0.045   11.321   11.328
 ot_mini                           109 10.5    0.001    0.001   11.203   11.206
 make_images                      4650 14.5    1.228    1.243   11.142   11.151
 density_rs2pw                     120  9.7    0.008    0.008   11.040   11.146
 init_scf_loop                      11  6.9    0.000    0.000   10.858   10.858
 grid_collocate_task_list          120  9.7    8.937    9.023    8.937    9.023
 pw_gpu_r3dc1d_3d_ps               611 13.1    2.406    2.421    8.732    8.744
 pw_gpu_c1dr3d_3d_ps               600 14.2    2.298    2.319    8.243    8.281
 build_core_hamiltonian_matrix_     11  4.9    0.001    0.001    7.757    7.872
 qs_energies_init_hamiltonians      11  5.9    0.000    0.000    7.721    7.721
 prepare_preconditioner             11  7.9    0.000    0.000    7.560    7.563
 make_preconditioner                11  8.9    0.000    0.000    7.560    7.563
 init_scf_run                       11  5.9    0.000    0.000    6.858    6.858
 scf_env_initial_rho_setup          11  6.9    0.000    0.001    6.858    6.858
 qs_ot_get_derivative              109 11.5    0.002    0.002    6.792    6.796
 grid_integrate_task_list          120 12.3    6.708    6.758    6.708    6.758
 hybrid_alltoall_any              4803 16.4    4.919    4.938    6.700    6.710
 potential_pw2rs                   120 12.3    0.038    0.038    6.600    6.601
 make_images_data                 4650 15.5    0.056    0.056    6.583    6.587
 make_full_inverse_cholesky         11  9.9    0.000    0.000    6.309    6.579
 multiply_cannon_multrec          4650 15.5    2.235    2.273    6.384    6.422
 ot_diis_step                      109 11.5    0.006    0.006    4.385    4.385
 mp_alltoall_z22v                 1211 15.7    4.327    4.338    4.327    4.338
 build_core_ppl_forces              11  5.9    3.907    4.008    3.907    4.008
 build_core_hamiltonian_matrix      11  6.9    0.001    0.001    3.960    3.990
 wfi_extrapolate                    11  7.9    0.001    0.001    3.976    3.976
 apply_preconditioner_dbcsr        120 12.6    0.000    0.000    3.807    3.810
 apply_single                      120 13.6    0.001    0.001    3.807    3.810
 dbcsr_complete_redistribute       329 12.2    1.434    1.443    3.514    3.790
 mp_waitall_1                    65591 16.9    3.747    3.761    3.747    3.761
 dbcsr_mm_accdrv_process          9714 16.2    0.839    0.912    3.749    3.757
 qs_env_update_s_mstruct            11  6.9    0.000    0.000    3.388    3.429
 calculate_dm_sparse               120  9.5    0.001    0.001    3.393    3.403
 qs_ot_get_p                       120 10.4    0.001    0.001    3.322    3.324
 multiply_cannon_sync_h2d         4650 15.5    3.164    3.235    3.164    3.235
 qs_ks_update_qs_env_forces         11  4.9    0.000    0.000    3.010    3.011
 cp_dbcsr_sm_fm_multiply            37  9.5    0.001    0.001    2.924    2.926
 transfer_rs2pw                    491 10.6    0.008    0.008    2.657    2.789
 pw_poisson_solve                  120 10.3    0.003    0.003    2.716    2.720
 yz_to_x                           611 14.1    0.471    0.471    2.695    2.704
 copy_dbcsr_to_fm                  153 11.3    0.004    0.004    2.678    2.678
 x_to_yz                           600 15.2    0.507    0.507    2.610    2.612
 qs_create_task_list                11  7.9    0.000    0.000    2.523    2.599
 generate_qs_task_list              11  8.9    1.141    1.152    2.523    2.599
 calculate_first_density_matrix      1  7.0    0.000    0.000    2.432    2.432
 jit_kernel_multiply                11 15.7    2.306    2.388    2.306    2.388
 cp_dbcsr_sm_fm_multiply_core       37 10.5    0.000    0.000    2.372    2.373
 transfer_rs2pw_140                131 11.5    1.577    1.600    2.206    2.349
 qs_ot_get_derivative_taylor        60 13.0    0.003    0.003    2.294    2.295
 cp_fm_cholesky_invert              11 10.9    2.278    2.278    2.278    2.278
 pw_gpu_fg                         611 14.1    2.202    2.232    2.202    2.232
 copy_fm_to_dbcsr                  176 11.2    0.002    0.002    1.842    2.120
 qs_ot_p2m_diag                     50 11.0    0.087    0.088    2.111    2.112
 dbcsr_special_finalize           6975 15.5    0.041    0.041    2.092    2.105
 build_core_ppl                     11  7.9    2.033    2.066    2.033    2.066
 transfer_dbcsr_to_fm               11 10.9    0.001    0.001    2.060    2.061
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=99.859, yerr=0.0
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=71.06099999999999, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=8.937, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=6.708, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.919, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.327, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.907, yerr=0.0

Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
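The H2O-64 PlotPoints above are consistent with a simple rule: the five routines with the largest average SELF TIME are plotted individually, and "rest" is the CP2K total time minus their sum (99.859 - 28.798 = 71.061; the logged 71.06099999999999 is just the floating-point representation of that value). The same rule also reproduces the H2O-64_nonortho breakdown (95.224 - 27.047 = 68.177). This log does not show plot_performance.py or test_performance.sh, so the rule is an inference from the numbers, not their actual implementation. The minimal Python sketch below reproduces the arithmetic from a subset of the H2O-64 table rows; the helpers `parse_rows` and `top_self_times` are hypothetical.

```python
"""Reproduce the H2O-64 plot breakdown from a few timing-report rows.

Assumption (inferred from the numbers in this log, not taken from
plot_performance.py): the plot shows the top-N routines by average SELF TIME
plus a "rest" bar equal to the CP2K total time minus the sum of those N.
"""

# Subset of rows from the H2O-64 TIMING table:
# name, calls, ASD, self avg, self max, total avg, total max.
ROWS = """\
CP2K                              1  1.0    0.031    0.033   99.859   99.859
grid_collocate_task_list        120  9.7    8.937    9.023    8.937    9.023
grid_integrate_task_list        120 12.3    6.708    6.758    6.708    6.758
hybrid_alltoall_any            4803 16.4    4.919    4.938    6.700    6.710
mp_alltoall_z22v               1211 15.7    4.327    4.338    4.327    4.338
build_core_ppl_forces            11  5.9    3.907    4.008    3.907    4.008
mp_waitall_1                  65591 16.9    3.747    3.761    3.747    3.761
multiply_cannon_sync_h2d       4650 15.5    3.164    3.235    3.164    3.235
"""


def parse_rows(text):
    """Return {name: (self_avg, total_avg)} for each 7-column row."""
    timings = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        timings[fields[0]] = (float(fields[3]), float(fields[5]))
    return timings


def top_self_times(timings, n=5):
    """Hypothetical helper: top-n routines by self time plus a 'rest' value."""
    total = timings["CP2K"][1]  # total time of the whole run
    routines = [(name, t[0]) for name, t in timings.items() if name != "CP2K"]
    routines.sort(key=lambda item: item[1], reverse=True)
    top = routines[:n]
    rest = total - sum(self_time for _, self_time in top)
    return total, top, rest


total, top, rest = top_self_times(parse_rows(ROWS))
print(f"total={total}")
for name, self_time in top:
    print(f"{name}: {self_time}")
print(f"rest={rest:.3f}")
```

Sorting by the average rather than the maximum self time matches the logged values (for example 4.919 for hybrid_alltoall_any, not its maximum 4.938).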
From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
 -------------------------------------------------------------------------------
 -                                                                             -
 -                                T I M I N G                                  -
 -                                                                             -
 -------------------------------------------------------------------------------
 SUBROUTINE                      CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
 CP2K                                1  1.0    0.027    0.028   95.224   95.224
 qs_mol_dyn_low                      1  2.0    0.004    0.005   94.754   94.757
 qs_forces                          11  3.9    0.002    0.002   94.706   94.707
 qs_energies                        11  4.9    0.001    0.001   83.744   83.745
 scf_env_do_scf                     11  5.9    0.001    0.001   67.817   67.818
 velocity_verlet                    10  3.0    0.001    0.002   61.324   61.341
 scf_env_do_scf_inner_loop         101  6.6    0.006    0.009   56.763   56.764
 rebuild_ks_matrix                 112  8.3    0.001    0.001   26.039   26.040
 qs_ks_build_kohn_sham_matrix      112  9.3    0.018    0.018   26.039   26.039
 dbcsr_multiply_generic           2076 12.5    0.132    0.134   23.953   23.995
 qs_ks_update_qs_env               112  7.6    0.001    0.001   23.951   23.953
 qs_scf_new_mos                    101  7.6    0.001    0.001   19.149   19.149
 qs_scf_loop_do_ot                 101  8.6    0.001    0.001   19.148   19.148
 qs_rho_update_rho_low             112  7.7    0.001    0.001   18.448   18.464
 calculate_rho_elec                112  8.7    0.824    0.832   18.447   18.463
 ot_scf_mini                       101  9.6    0.003    0.003   17.321   17.322
 fft_wrap_pw1pw2                  1131 11.7    0.021    0.022   15.953   15.993
 sum_up_and_integrate              112 10.3    0.002    0.002   13.937   13.979
 integrate_v_rspace                112 11.3    0.333    0.334   13.847   13.889
 fft_wrap_pw1pw2_140               459 12.2    0.003    0.003   13.747   13.765
 multiply_cannon                  2076 13.5    0.305    0.315   12.044   12.049
 multiply_cannon_loop             2076 14.5    0.238    0.241   11.078   11.105
 init_scf_loop                      11  6.9    0.000    0.000   10.966   10.966
 density_rs2pw                     112  9.7    0.007    0.007   10.354   10.457
 make_m2s                         4152 13.5    0.040    0.041   10.395   10.396
 make_images                      4152 14.5    1.127    1.149   10.234   10.237
 ot_mini                           101 10.6    0.001    0.001   10.110   10.110
 qs_energies_init_hamiltonians      11  5.9    0.000    0.000    8.548    8.548
 pw_gpu_r3dc1d_3d_ps               571 13.2    2.249    2.270    8.186    8.195
 build_core_hamiltonian_matrix_     11  4.9    0.001    0.001    7.695    7.833
 pw_gpu_c1dr3d_3d_ps               560 14.2    2.129    2.150    7.740    7.789
 prepare_preconditioner             11  7.9    0.000    0.000    7.632    7.636
 make_preconditioner                11  8.9    0.000    0.000    7.632    7.635
 grid_integrate_task_list          112 12.3    7.377    7.418    7.377    7.418
 grid_collocate_task_list          112  9.7    7.235    7.307    7.235    7.307
 init_scf_run                       11  5.9    0.000    0.000    6.718    6.718
 scf_env_initial_rho_setup          11  6.9    0.000    0.001    6.717    6.717
 make_full_inverse_cholesky         11  9.9    0.000    0.000    6.390    6.645
 hybrid_alltoall_any              4299 16.4    4.530    4.539    6.232    6.273
 potential_pw2rs                   112 12.3    0.035    0.035    6.136    6.137
 qs_ot_get_derivative              101 11.6    0.001    0.001    6.123    6.124
 make_images_data                 4152 15.5    0.049    0.049    6.040    6.073
 multiply_cannon_multrec          4152 15.5    1.984    2.001    5.975    5.989
 qs_env_update_s_mstruct            11  6.9    0.000    0.000    4.286    4.445
 mp_alltoall_z22v                 1131 15.7    4.029    4.074    4.029    4.074
 build_core_ppl_forces              11  5.9    3.876    3.986    3.876    3.986
 ot_diis_step                      101 11.6    0.005    0.005    3.963    3.963
 build_core_hamiltonian_matrix      11  6.9    0.001    0.001    3.857    3.917
 dbcsr_complete_redistribute       317 12.2    1.407    1.428    3.612    3.870
 wfi_extrapolate                    11  7.9    0.001    0.001    3.866    3.866
 dbcsr_mm_accdrv_process          8880 16.2    0.877    0.880    3.630    3.635
 qs_create_task_list                11  7.9    0.000    0.000    3.414    3.513
 generate_qs_task_list              11  8.9    1.426    1.446    3.414    3.513
 apply_preconditioner_dbcsr        112 12.6    0.000    0.000    3.495    3.496
 apply_single                      112 13.6    0.001    0.001    3.495    3.496
 mp_waitall_1                    58587 16.9    3.394    3.466    3.394    3.466
 calculate_dm_sparse               112  9.5    0.001    0.001    3.155    3.156
 qs_ks_update_qs_env_forces         11  4.9    0.000    0.000    3.123    3.123
 qs_ot_get_p                       112 10.4    0.001    0.001    2.935    2.937
 multiply_cannon_sync_h2d         4152 15.5    2.899    2.929    2.899    2.929
 cp_dbcsr_sm_fm_multiply            37  9.5    0.001    0.001    2.928    2.928
 copy_dbcsr_to_fm                  147 11.2    0.004    0.004    2.859    2.864
 transfer_rs2pw                    459 10.6    0.007    0.008    2.493    2.642
 pw_poisson_solve                  112 10.3    0.003    0.003    2.511    2.516
 yz_to_x                           571 14.2    0.437    0.443    2.489    2.516
 x_to_yz                           560 15.2    0.473    0.477    2.450    2.459
 calculate_first_density_matrix      1  7.0    0.000    0.000    2.413    2.414
 cp_dbcsr_sm_fm_multiply_core       37 10.5    0.000    0.000    2.360    2.360
 cp_fm_cholesky_invert              11 10.9    2.285    2.285    2.285    2.285
 transfer_dbcsr_to_fm               11 10.9    0.001    0.001    2.241    2.247
 transfer_rs2pw_140                123 11.5    1.474    1.499    2.074    2.234
 jit_kernel_multiply                10 15.6    2.200    2.201    2.200    2.201
 pw_gpu_fg                         571 14.2    2.107    2.120    2.107    2.120
 copy_fm_to_dbcsr                  170 11.1    0.001    0.001    1.833    2.093
 qs_ot_get_derivative_taylor        58 13.0    0.002    0.002    2.054    2.055
 build_core_ppl                     11  7.9    1.979    2.024    1.979    2.024
 dbcsr_special_finalize           6228 15.5    0.037    0.037    1.932    1.936
 build_kinetic_matrix_low           22  6.9    1.803    1.818    1.899    1.915
 -------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=95.224, yerr=0.0
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=68.177, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.377, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=7.235, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.53, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.029, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.876, yerr=0.0

Running w64PBE.inp with 3 threads and 2 ranks... failed.
  ----------------------------------- OT ---------------------------------------

  Step     Update method      Time    Convergence         Total energy    Change
  ------------------------------------------------------------------------------
     1 OT DIIS     0.80E-01    3.9     0.00000111     -1102.7676349624  3.25E-09
     2 OT DIIS     0.80E-01    1.7     0.00000290     -1102.7676349863 -2.39E-08
     3 OT DIIS     0.80E-01    1.7     0.00000075     -1102.7676349838  2.50E-09
     4 OT DIIS     0.80E-01    1.7     0.00004142     -1102.7676349936 -9.75E-09
     5 OT DIIS     0.80E-01    1.7     0.00000330     -1102.7676349936 -5.00E-12
     6 OT DIIS     0.80E-01    1.7     0.00000321     -1102.7676349937 -1.12E-10
     7 OT DIIS     0.80E-01    1.7     0.00000054     -1102.7676349938 -1.26E-10
     8 OT DIIS     0.80E-01    1.7     0.00000069     -1102.7676350008 -6.95E-09
     9 OT DIIS     0.80E-01    1.7     0.00000960     -1102.7676350011 -3.25E-10
    10 OT DIIS     0.80E-01    1.7     0.00000492     -1102.7676350041 -2.96E-09

  Leaving inner SCF loop after reaching 10 steps.

  Electronic density on regular grids:    -512.0000000044       -0.0000000044
  Core density on regular grids:           511.9999999998       -0.0000000002
  Total charge density on r-space grids:    -0.0000000045
  Total charge density g-space grids:       -0.0000000045

  Overlap energy of the core charge distribution:             0.00000091569564
  Self energy of the core charge distribution:            -2838.67351367283345
  Core Hamiltonian energy:                                  824.05925401359036
  Hartree energy:                                          1182.15846161755826
  Exchange-correlation energy:                             -270.31183787806174

  Total energy:                                           -1102.76763500405104

  outer SCF iter = 10 RMS gradient = 0.49E-05 energy = -1102.7676350041

  ----------------------------------- OT ---------------------------------------
  Minimizer      : DIIS                : direct inversion
                                         in the iterative subspace
                                         using 7 DIIS vectors
                                         safer DIIS on
  Preconditioner : FULL_SINGLE_INVERSE : inversion of
                                         H + eS - 2*(Sc)(c^T*H*c+const)(Sc)^T
  Precond_solver : DEFAULT
  stepsize       :    0.08000000                  energy_gap     :    0.08000000
  eps_taylor     :   0.10000E-15                  max_taylor     :             4
  ----------------------------------- OT ---------------------------------------

  Step     Update method      Time    Convergence         Total energy    Change
  ------------------------------------------------------------------------------
     1 OT DIIS     0.80E-01    3.9     0.00000775     -1102.7676350035  5.30E-10
     2 OT DIIS     0.80E-01    1.7     0.00000606     -1102.7676345828  4.21E-07
     3 OT DIIS     0.80E-01    1.7     0.00000551     -1102.7676346788 -9.60E-08
     4 OT DIIS     0.80E-01    1.7     0.00000227     -1102.7676347360 -5.72E-08
     5 OT SD       0.80E-01    1.7     0.00003990     -1102.7676348313 -9.54E-08
     6 OT DIIS     0.80E-01    1.7     0.00001560     -1102.7676245153  1.03E-05
     7 OT DIIS     0.80E-01    1.7     0.00000267     -1102.7676348355 -1.03E-05
     8 OT DIIS     0.80E-01    1.7     0.00000342     -1102.7676348369 -1.38E-09
     9 OT DIIS     0.80E-01    1.7     0.00000629     -1102.7676349139 -7.70E-08
    10 OT DIIS     0.80E-01    1.7     0.00001110     -1102.7676349106  3.27E-09

  Leaving inner SCF loop after reaching 10 steps.

  Electronic density on regular grids:    -512.0000000044       -0.0000000044
  Core density on regular grids:           511.9999999998       -0.0000000002
  Total charge density on r-space grids:    -0.0000000045
  Total charge density g-space grids:       -0.0000000045

  Overlap energy of the core charge distribution:             0.00000091569564
  Self energy of the core charge distribution:            -2838.67351367283345
  Core Hamiltonian energy:                                  824.05924188765073
  Hartree energy:                                          1182.15847561754390
  Exchange-correlation energy:                             -270.31183965867336

  Total energy:                                           -1102.76763491061638

  outer SCF iter = 11 RMS gradient = 0.11E-04 energy = -1102.7676349106
  outer SCF loop FAILED to converge after 11 iterations or 110 steps

 *******************************************************************************
 *   ___                                                                       *
 *  /   \                                                                      *
 * [ABORT]                                                                     *
 *  \___/     SCF run NOT converged. To continue the calculation regardless,   *
 *    |             please set the keyword IGNORE_CONVERGENCE_FAILURE.         *
 *  O/|                                                                        *
 * /| |                                                                        *
 * / \                                                         qs_scf.F:685    *
 *******************************************************************************

 ===== Routine Calling Stack =====

            5 scf_env_do_scf
            4 qs_energies
            3 qs_forces
            2 qs_mol_dyn_low
            1 CP2K

Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP 1

Summary: Running w64PBE.inp failed.
Status: FAILED
 ---> Removed intermediate container 08a9b8539300
 ---> e669e54a31c2
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in bfeb1c56d97c
 ---> Removed intermediate container bfeb1c56d97c
 ---> c5f49c29cedd
Step 46/46 : ENTRYPOINT []
 ---> Running in e6df56f256ec
 ---> Removed intermediate container e6df56f256ec
 ---> be161be73914
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built be161be73914
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.

#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done

EndDate: 2026-06-01 07:38:43+00:00
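In the w64PBE output above, each OT row prints the total energy after that inner step, and the Change column appears to be the difference from the previous step's energy. The minimal Python sketch below assumes that interpretation and recomputes the changes for the final outer iteration from the printed energies. Because the log rounds energies to 10 decimal places and changes to three significant digits, only approximate agreement is expected. The sketch also flags steps where the energy went up, which highlights the 1.03E-05 jump at step 6, right after the OT SD step 5, before the run aborts with "outer SCF loop FAILED to converge after 11 iterations or 110 steps". The names `STEPS`, `PRINTED_CHANGE`, `energy_changes`, and `agrees` are illustrative, not part of CP2K.

```python
"""Recompute the OT "Change" column for the last outer SCF iteration of w64PBE.

Assumption: Change(step) = E(step) - E(step - 1). Energies are copied from the
log, which rounds them to 10 decimal places; printed changes are rounded to
3 significant digits, so only approximate agreement is expected.
"""

# (step, update method, total energy) from the final OT block of w64PBE.
STEPS = [
    (1, "OT DIIS", -1102.7676350035),
    (2, "OT DIIS", -1102.7676345828),
    (3, "OT DIIS", -1102.7676346788),
    (4, "OT DIIS", -1102.7676347360),
    (5, "OT SD", -1102.7676348313),
    (6, "OT DIIS", -1102.7676245153),
    (7, "OT DIIS", -1102.7676348355),
    (8, "OT DIIS", -1102.7676348369),
    (9, "OT DIIS", -1102.7676349139),
    (10, "OT DIIS", -1102.7676349106),
]

# Change values as printed in the log for steps 2..10.
PRINTED_CHANGE = {
    2: 4.21e-07, 3: -9.60e-08, 4: -5.72e-08, 5: -9.54e-08, 6: 1.03e-05,
    7: -1.03e-05, 8: -1.38e-09, 9: -7.70e-08, 10: 3.27e-09,
}


def energy_changes(steps):
    """Return {step: E(step) - E(previous step)} for every step after the first."""
    return {cur[0]: cur[2] - prev[2] for prev, cur in zip(steps, steps[1:])}


def agrees(computed, printed):
    """True if computed matches printed within the log's rounding."""
    # Half a unit in the 3rd significant digit of the printed change, plus up
    # to 1e-10 from rounding each of the two energies to 10 decimals.
    return abs(computed - printed) <= 0.005 * abs(printed) + 1e-10


changes = energy_changes(STEPS)
methods = {step: method for step, method, _ in STEPS}
for step, delta in changes.items():
    status = "ok" if agrees(delta, PRINTED_CHANGE[step]) else "MISMATCH"
    note = "  <-- energy increased" if delta > 0 else ""
    print(f"step {step:2d} {methods[step]:7s} computed {delta: .2E} "
          f"printed {PRINTED_CHANGE[step]: .2E} [{status}]{note}")
```

Step 1 is left out because its Change refers to the final energy of the previous outer iteration, whose unrounded value is not fully shown.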