StartDate: 2026-06-02 07:12:28+00:00
CpuId: 12x Intel Xeon W 2000 / D-2100 (Skylake / Cascade Lake) {Skylake}, 14nm
GpuId: 1x Tesla V100-SXM2-16GB
CommitSHA: e9e4980e053667e8b3e70104a56ee523a2c688f7
CommitTime: 2026-06-01 22:30:40 +0200
CommitAuthor: Ole Schütt
CommitSubject: Adjust tolerance of simple_non-ortho_grid_dgemm.inp (#5336)

#################### Building Image cp2k-perf-cuda-volta ####################
Dockerfile: /tools/docker/Dockerfile.test_performance_cuda_V100
Build-Path: /
Build-Args: GIT_COMMIT_SHA=e9e4980e053667e8b3e70104a56ee523a2c688f7 SPACK_CACHE=gs://cp2k-spack-cache
Build-Cache: Yes
Populating docker build cache... done.
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            BuildKit is currently disabled; enable it by removing the DOCKER_BUILDKIT=0
            environment-variable.

Sending build context to Docker daemon 420.3MB
Step 1/46 : FROM nvidia/cuda:12.9.1-devel-ubuntu24.04
12.9.1-devel-ubuntu24.04: Pulling from nvidia/cuda
32f112e3802c: Pulling fs layer
644e9b203583: Pulling fs layer
02559cd4bc8d: Pulling fs layer
2cd52cbb1ebe: Pulling fs layer
6e8af4fd0a07: Pulling fs layer
15a17189b2df: Pulling fs layer
02cb0e091e33: Pulling fs layer
9c3d619183d2: Pulling fs layer
7f7602a82106: Pulling fs layer
5a2aba542b08: Pulling fs layer
6cb9b761b877: Pulling fs layer
2cd52cbb1ebe: Waiting
02cb0e091e33: Waiting
9c3d619183d2: Waiting
7f7602a82106: Waiting
5a2aba542b08: Waiting
6cb9b761b877: Waiting
6e8af4fd0a07: Waiting
15a17189b2df: Waiting
644e9b203583: Verifying Checksum
644e9b203583: Download complete
32f112e3802c: Verifying Checksum
32f112e3802c: Download complete
2cd52cbb1ebe: Verifying Checksum
2cd52cbb1ebe: Download complete
6e8af4fd0a07: Verifying Checksum
6e8af4fd0a07: Download complete
02cb0e091e33: Verifying Checksum
02cb0e091e33: Download complete
9c3d619183d2: Verifying Checksum
9c3d619183d2: Download complete
02559cd4bc8d: Verifying Checksum
02559cd4bc8d: Download complete
7f7602a82106: Verifying Checksum
7f7602a82106: Download complete
6cb9b761b877: Verifying Checksum
6cb9b761b877: Download complete
32f112e3802c: Pull complete
644e9b203583: Pull complete
02559cd4bc8d: Pull complete
2cd52cbb1ebe: Pull complete
6e8af4fd0a07: Pull complete
15a17189b2df: Verifying Checksum
15a17189b2df: Download complete
5a2aba542b08: Verifying Checksum
5a2aba542b08: Download complete
15a17189b2df: Pull complete
02cb0e091e33: Pull complete
9c3d619183d2: Pull complete
7f7602a82106: Pull complete
5a2aba542b08: Pull complete
6cb9b761b877: Pull complete
Digest: sha256:020bc241a628776338f4d4053fed4c38f6f7f3d7eb5919fecb8de313bb8ba47c
Status: Downloaded newer image for nvidia/cuda:12.9.1-devel-ubuntu24.04
 ---> eecafe98c3e1
Step 2/46 : ENV CUDA_PATH /usr/local/cuda
 ---> Using cache
 ---> 780681fb1fee
Step 3/46 : ENV LD_LIBRARY_PATH /usr/local/cuda/lib64
 ---> Using cache
 ---> ba98a15dc225
Step 4/46 : ENV CUDA_CACHE_DISABLE 1
 ---> Using cache
 ---> 3932740340f7
Step 5/46 : RUN apt-get update -qq && apt-get install -qq --no-install-recommends gfortran && rm -rf /var/lib/apt/lists/*
 ---> Using cache
 ---> a06eb14abc29
Step 6/46 : WORKDIR /opt/cp2k-toolchain
 ---> Using cache
 ---> 082681bac850
Step 7/46 : COPY ./tools/toolchain/install_requirements*.sh ./
 ---> Using cache
 ---> d8bfc1674c90
Step 8/46 : RUN ./install_requirements.sh ubuntu
 ---> Using cache
 ---> de928c312410
Step 9/46 : RUN mkdir scripts
 ---> Using cache
 ---> 4aed4b85b643
Step 10/46 : COPY ./tools/toolchain/scripts/VERSION ./tools/toolchain/scripts/parse_if.py ./tools/toolchain/scripts/tool_kit.sh ./tools/toolchain/scripts/common_vars.sh ./tools/toolchain/scripts/signal_trap.sh ./tools/toolchain/scripts/get_openblas_arch.sh ./scripts/
 ---> Using cache
 ---> ce9efe84db60
Step 11/46 : COPY ./tools/toolchain/install_cp2k_toolchain.sh .
 ---> Using cache
 ---> dfc1a5ca7e3f
Step 12/46 : RUN ./install_cp2k_toolchain.sh --with-mpich=install --mpi-mode=mpich --enable-cuda=yes --gpu-ver=V100 --dry-run
 ---> Using cache
 ---> 1bc3916e19c7
Step 13/46 : COPY ./tools/toolchain/scripts/stage0/ ./scripts/stage0/
 ---> Using cache
 ---> bbd97369be82
Step 14/46 : RUN ./scripts/stage0/install_stage0.sh && rm -rf ./build
 ---> Using cache
 ---> fbbd58fb6405
Step 15/46 : COPY ./tools/toolchain/scripts/stage1/ ./scripts/stage1/
 ---> Using cache
 ---> 9707298b4465
Step 16/46 : RUN ./scripts/stage1/install_stage1.sh && rm -rf ./build
 ---> Using cache
 ---> 10af8edef201
Step 17/46 : COPY ./tools/toolchain/scripts/stage2/ ./scripts/stage2/
 ---> Using cache
 ---> cde1e5c7df26
Step 18/46 : RUN ./scripts/stage2/install_stage2.sh && rm -rf ./build
 ---> Using cache
 ---> e634e183ddda
Step 19/46 : COPY ./tools/toolchain/scripts/stage3/ ./scripts/stage3/
 ---> Using cache
 ---> 90e1d29eaee5
Step 20/46 : RUN ./scripts/stage3/install_stage3.sh && rm -rf ./build
 ---> Using cache
 ---> 456e432c42cd
Step 21/46 : COPY ./tools/toolchain/scripts/stage4/ ./scripts/stage4/
 ---> Using cache
 ---> 25314ed00994
Step 22/46 : RUN ./scripts/stage4/install_stage4.sh && rm -rf ./build
 ---> Using cache
 ---> 2f32d5fcf1ca
Step 23/46 : COPY ./tools/toolchain/scripts/stage5/ ./scripts/stage5/
 ---> Using cache
 ---> f6eb71d2ea73
Step 24/46 : RUN ./scripts/stage5/install_stage5.sh && rm -rf ./build
 ---> Using cache
 ---> 89a999028ecd
Step 25/46 : COPY ./tools/toolchain/scripts/stage6/ ./scripts/stage6/
 ---> Using cache
 ---> 4fee466a0efd
Step 26/46 : RUN ./scripts/stage6/install_stage6.sh && rm -rf ./build
 ---> Using cache
 ---> 4a225437d875
Step 27/46 : COPY ./tools/toolchain/scripts/stage7/ ./scripts/stage7/
 ---> Using cache
 ---> b3bdd93e7b5e
Step 28/46 : RUN ./scripts/stage7/install_stage7.sh && rm -rf ./build
 ---> Using cache
 ---> fc993b0523c8
Step 29/46 : COPY ./tools/toolchain/scripts/stage8/ ./scripts/stage8/
 ---> Using cache
 ---> b243c28b2b5f
Step 30/46 : RUN ./scripts/stage8/install_stage8.sh && rm -rf ./build
 ---> Using cache
 ---> ac272cd10306
Step 31/46 : COPY ./tools/toolchain/scripts/stage9/ ./scripts/stage9/
 ---> Using cache
 ---> 3ae08df2098f
Step 32/46 : RUN ./scripts/stage9/install_stage9.sh && rm -rf ./build
 ---> Using cache
 ---> 8632987b9f69
Step 33/46 : WORKDIR /opt/cp2k
 ---> Using cache
 ---> b27caf79383d
Step 34/46 : COPY ./src ./src
 ---> 8e52da7f1307
Step 35/46 : COPY ./data ./data
 ---> 1acd04a6ef59
Step 36/46 : COPY ./tools/build_utils ./tools/build_utils
 ---> 16df6bbc60b0
Step 37/46 : COPY ./cmake ./cmake
 ---> bc82b3a58795
Step 38/46 : COPY ./CMakeLists.txt .
 ---> 97dd01fa141a
Step 39/46 : COPY ./tools/docker/scripts/build_cp2k.sh .
 ---> 00be878ec917
Step 40/46 : RUN ./build_cp2k.sh toolchain_cuda_V100 psmp
 ---> Running in 156dc7d3c3be

==================== Building CP2K ====================
-- The Fortran compiler identification is GNU 13.3.0
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/bin/gfortran - skipped
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/gcc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found PkgConfig: /usr/bin/pkg-config (found version "1.8.1")
-- Found Python: /usr/bin/python3.12 (found version "3.12.3") found components: Interpreter
-- Found MPI_C: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so (found version "5.0")
-- Found MPI_CXX: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so (found version "5.0")
-- Found MPI_Fortran: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpifort.so (found version "5.0")
-- Found MPI: TRUE (found version "5.0") found components: C CXX Fortran
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI: TRUE (found version "5.0") found components: CXX C Fortran
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX C Fortran
-- Could NOT find MKL (missing: CP2K_MKL_INCLUDE_DIRS)
-- Checking for module 'openblas'
--   Found openblas, version 0.3.33
-- Found OpenBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/include
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found Lapack: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Checking for module 'libxsmm-shared'
--   Found libxsmm-shared, version 1.17.0
-- Checking for module 'libxsmmf-shared'
--   Found libxsmmf-shared, version 1.17.0
-- Checking for module 'libxsmmext-shared'
--   Found libxsmmext-shared, version 1.17.0
-- Checking for module 'libxsmmnoblas-shared'
--   Found libxsmmnoblas-shared, version 1.17.0
-- Found LibXSMM: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
-- Using LIBXSMM for Small Matrix Multiplication
-- Checking for module 'scalapack'
--   Package 'mpi', required by 'scalapack', not found
Package 'lapack', required by 'scalapack', not found
Package 'blas', required by 'scalapack', not found
-- Found SCALAPACK: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
-- CP2K_WITH_GPU is deprecated in favor of CMAKE_HIP_ARCHITECTURES or CMAKE_CUDA_ARCHITECTURES
-- The CUDA compiler identification is NVIDIA 12.9.86 with host compiler GNU 13.3.0
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.9.86")
-----------------------------------------------------------
-                           CUDA                          -
-----------------------------------------------------------
-- GPU architecture number: 70
-- GPU profiling enabled: OFF
-- CUDA compiler and libraries found
------------------------------------------------------------
-                          OPENMP                          -
------------------------------------------------------------
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: Fortran C CXX
------------------------------------------------------------
-                           DBCSR                          -
------------------------------------------------------------
-- Found MPI: TRUE (found version "5.0")
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Checking for module 'libxsmmf'
--   Found libxsmmf, version 1.17.0
-- Checking for module 'libxsmmext'
--   Found libxsmmext, version 1.17.0
------------------------------------------------------------
-                    Other dependencies                    -
------------------------------------------------------------
-- Checking for one of the modules 'elpa_openmp'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
-- Found HDF5: hdf5-shared;hdf5_fortran-shared (found version "2.1.1") found components: C Fortran
-- Found MPI: TRUE (found version "5.0") found components: CXX
-- Found OPENBLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found Blas: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Checking for one of the modules 'fftw3'
-- Checking for one of the modules 'fftw3f'
-- Checking for one of the modules 'fftw3l'
-- Checking for one of the modules 'fftw3q'
-- Found Fftw: /opt/cp2k-toolchain/install/fftw-3.3.11/include
-- Checking for module 'libint2'
--   Package 'libint2', required by 'virtual:world', not found
-- Found Libint2: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - found
-- Found GSL: /opt/cp2k-toolchain/install/gsl-2.8/include (found version "2.8")
-- Checking for one of the modules 'libxc>=3.0.0'
-- Found LibXC: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a (Required is at least version "3.0.0")
-- Found LibSPG: /opt/cp2k-toolchain/install/spglib-2.7.0/lib/libsymspg.a
-- Found HDF5: hdf5-shared (found version "2.1.1") found components: C
-- Found FFTW: /opt/cp2k-toolchain/install/fftw-3.3.11/include
-- Looking for Fortran sgemm
-- Looking for Fortran sgemm - not found
-- Found BLAS: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP_CUDA: -fopenmp (found version "4.5")
-- Found OpenMP_Fortran: -fopenmp (found version "4.5")
-- Looking for Fortran cheev
-- Looking for Fortran cheev - found
-- Found LAPACK: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so;-lm;-ldl
-- Checking for one of the modules 'elpa;elpa_openmp;elpa-openmp-2019.05.001;elpa_openmp-2019.11.001;elpa_openmp-2020.05.001;elpa-2019.05.001;elpa-2019.11.001;elpa-2020.05.001'
-- Found Elpa: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so
-- Checking for module 'libvdwxc>=0.5.0'
--   Found libvdwxc, version 0.5.0
-- Checking for module 'fftw3'
--   Found fftw3, version 3.3.11
-- Found LibVDWXC: vdwxc;fftw3 (Required is at least version "0.5.0")
-- Setting build type to 'Release' as none was specified.
-- Performing Test f2008-norm2
-- Performing Test f2008-norm2 - Success
-- Performing Test f2008-block_construct
-- Performing Test f2008-block_construct - Success
-- Performing Test f2008-contiguous
-- Performing Test f2008-contiguous - Success
-- Performing Test f95-reshape-order-allocatable
-- Performing Test f95-reshape-order-allocatable - Success
-- FYPP preprocessor found.
--------------------------------------------------------------------
-                                                                  -
-                 Summary of enabled dependencies                  -
-                                                                  -
--------------------------------------------------------------------
  - BLAS
    - vendor: OpenBLAS
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
  - LAPACK
    - include directories: /opt/cp2k-toolchain/install/openblas-0.3.33/include
    - libraries: /opt/cp2k-toolchain/install/openblas-0.3.33/lib/libopenblas.so
  - MPI
    - include directories: /opt/cp2k-toolchain/install/mpich-5.0.1/include
    - libraries: /opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpicxx.so;/opt/cp2k-toolchain/install/mpich-5.0.1/lib/libmpi.so
    - MPI_F08: ON
  - ScaLAPACK
    - vendor: auto
    - include directories:
    - libraries: /opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a
  - Hardware Acceleration:
    - CUDA:
      - GPU architecture number: 70
      - GPU profiling enabled:
      - GPU accelerated modules
        - ELPA module: ON
        - GRID module: ON
        - DBM module: ON
        - PW module: ON
  - LibXC
    - version: 7.0.0
    - include directories: /opt/cp2k-toolchain/install/libxc-7.0.0/include/
    - libraries: /opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxcf03.a;/opt/cp2k-toolchain/install/libxc-7.0.0/lib/libxc.a
  - HDF5
    - version: 2.1.1
    - include directories: /opt/cp2k-toolchain/install/hdf5-2.1.1/include
    - libraries: hdf5-shared
  - FFTW3
    - include directories: /opt/cp2k-toolchain/install/fftw-3.3.11/include
    - libraries: /opt/cp2k-toolchain/install/fftw-3.3.11/lib/libfftw3.a
  - LIBXSMM
    - include directories: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/include
    - libraries: /opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmext.so;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmmf.so;:libxsmmext.a;:libxsmm.a;/usr/lib/x86_64-linux-gnu/libpthread.a;/usr/lib/x86_64-linux-gnu/librt.a;/usr/lib/x86_64-linux-gnu/libdl.a;/usr/lib/x86_64-linux-gnu/libm.so;/usr/lib/x86_64-linux-gnu/libc.so;/opt/cp2k-toolchain/install/libxsmm-e0c4a2389afba36c453233ad7de07bd92c715bec/lib/libxsmm.so
  - SpLA
    - include directories: /opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include;/opt/cp2k-toolchain/install/SpLA-1.6.1-cuda/include/spla
    - libraries: $;$;$;$;MPI::MPI_CXX;MPI::MPI_C;MPI::MPI_Fortran
    - SpLA GEMM offloading
  - SIRIUS
    - include directories:
    - libraries:
  - COSMA
    - include directories: /opt/cp2k-toolchain/install/COSMA-2.8.4/include
    - libraries: MPI::MPI_CXX;costa::costa;$;$;$<$:cosma::BLAS::blas>;$;$<$:Tiled-MM::Tiled-MM>;$<$:Tiled-MM::Tiled-MM>;$<$:semiprof::semiprof>;$<$:cosma::scalapack::scalapack>
  - Libint2
    - include directories: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/include
    - libraries: /opt/cp2k-toolchain/install/libint-v2.13.1-cp2k-lmax-5/lib/libint2.a
  - ELPA
    - include directories: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/include/elpa_openmp-2026.02.001
    - libraries: /opt/cp2k-toolchain/install/elpa-2026.02.001/nvidia/lib/libelpa_openmp.so;cudart;cublasLt;cublas;/opt/cp2k-toolchain/install/scalapack-2.2.3/lib/libscalapack.a;:libopenblas.a
--------------------------------------------------------------------
-                                                                  -
-         List of dependencies not included in this build          -
-                                                                  -
--------------------------------------------------------------------
  - DFTD4
  - DeePMD
  - PEXSI
  - ACE (libpace)
  - TBLITE
  - Spglib
  - LibSMEAGOL
  - MiMiC
  - openPMD
  - DLA-Future
  - PLUMED
  - LibFCI
  - GauXC
  - Libvori
  - LibTorch
  - TREXIO
  - GreenX

After building and installing CP2K the regtests can be run with the following command:
/opt/cp2k/tests/do_regtest.py /opt/cp2k/bin psmp

-- Configuring done (14.9s)
-- Generating done (0.5s)
-- Build files have been written to: /opt/cp2k/build
Compiling CP2K ... done
 ---> Removed intermediate container 156dc7d3c3be
 ---> c713d298c48f
Step 41/46 : COPY ./benchmarks ./benchmarks
 ---> 0b8a2341dfec
Step 42/46 : COPY ./tools/regtesting ./tools/regtesting
 ---> f2fd43269ca8
Step 43/46 : COPY ./tools/docker/scripts/test_performance.sh ./tools/docker/scripts/plot_performance.py ./
 ---> 96c2f3d05a9b
Step 44/46 : RUN ./test_performance.sh "toolchain_cuda_V100" 2>&1 | tee report.log
 ---> Running in 2b507f5b6f5d

============== CP2K Binary Flags =============
cp2kflags: omp libint fftw3 libxc elpa parallel scalapack mpi_f08 cosma xsmm dbcsr_acc sirius offload_cuda spla_gemm_offloading libvdwxc hdf5

========== Checking Benchmark Inputs =========
Found 83 input files and 0 errors.

========== Running Performance Test ==========
Plot: name="total_timings_6cpu_1gpu", title="Total Timings with 6 CPU Cores and 1 GPU", ylabel="time [s]"
Running H2O-64.inp with 3 threads and 2 ranks... done.

From /workspace/artifacts/H2O-64_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.028    0.028   99.445   99.446
qs_mol_dyn_low                       1  2.0    0.005    0.005   99.021   99.025
qs_forces                           11  3.9    0.002    0.002   98.972   98.974
qs_energies                         11  4.9    0.001    0.001   88.194   88.196
scf_env_do_scf                      11  5.9    0.001    0.001   73.140   73.142
scf_env_do_scf_inner_loop          110  6.5    0.006    0.008   62.451   62.452
velocity_verlet                     10  3.0    0.001    0.002   61.929   61.945
rebuild_ks_matrix                  121  8.3    0.001    0.001   26.986   26.988
qs_ks_build_kohn_sham_matrix       121  9.3    0.020    0.020   26.985   26.987
dbcsr_multiply_generic            2309 12.5    0.147    0.149   25.966   26.041
qs_ks_update_qs_env                121  7.6    0.001    0.001   25.095   25.098
qs_scf_new_mos                     110  7.5    0.001    0.001   21.126   21.126
qs_scf_loop_do_ot                  110  8.5    0.001    0.001   21.125   21.125
qs_rho_update_rho_low              121  7.7    0.001    0.001   21.056   21.076
calculate_rho_elec                 121  8.7    0.879    0.884   21.055   21.075
ot_scf_mini                        110  9.5    0.003    0.003   19.136   19.141
fft_wrap_pw1pw2                   1221 11.7    0.023    0.024   16.972   17.008
fft_wrap_pw1pw2_140                495 12.2    0.003    0.003   14.613   14.622
sum_up_and_integrate               121 10.3    0.002    0.002   13.939   14.012
integrate_v_rspace                 121 11.3    0.355    0.358   13.844   13.917
multiply_cannon                   2309 13.5    0.339    0.341   12.964   12.968
multiply_cannon_loop              2309 14.5    0.258    0.260   11.838   11.840
make_m2s                          4618 13.5    0.044    0.044   11.337   11.348
density_rs2pw                      121  9.7    0.007    0.007   11.087   11.229
make_images                       4618 14.5    1.217    1.230   11.161   11.172
ot_mini                            110 10.5    0.001    0.001   11.098   11.100
init_scf_loop                       11  6.9    0.000    0.000   10.601   10.602
grid_collocate_task_list           121  9.7    9.057    9.169    9.057    9.169
pw_gpu_r3dc1d_3d_ps                616 13.1    2.403    2.415    8.695    8.712
pw_gpu_c1dr3d_3d_ps                605 14.2    2.295    2.309    8.247    8.300
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.647    7.755
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    7.595    7.595
prepare_preconditioner              11  7.9    0.000    0.000    7.294    7.301
make_preconditioner                 11  8.9    0.000    0.000    7.294    7.301
grid_integrate_task_list           121 12.3    6.910    6.985    6.910    6.985
init_scf_run                        11  5.9    0.000    0.000    6.805    6.805
scf_env_initial_rho_setup           11  6.9    0.000    0.001    6.804    6.804
hybrid_alltoall_any               4771 16.4    4.908    4.937    6.717    6.730
qs_ot_get_derivative               110 11.5    0.002    0.002    6.684    6.688
make_images_data                  4618 15.5    0.055    0.055    6.608    6.626
potential_pw2rs                    121 12.3    0.037    0.038    6.579    6.579
make_full_inverse_cholesky          11  9.9    0.000    0.000    6.102    6.363
multiply_cannon_multrec           4618 15.5    2.143    2.151    6.242    6.293
ot_diis_step                       110 11.5    0.005    0.006    4.389    4.390
mp_alltoall_z22v                  1221 15.7    4.303    4.324    4.303    4.324
wfi_extrapolate                     11  7.9    0.001    0.001    3.945    3.945
build_core_ppl_forces               11  5.9    3.845    3.937    3.845    3.937
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.814    3.872
apply_preconditioner_dbcsr         121 12.6    0.000    0.000    3.834    3.835
apply_single                       121 13.6    0.001    0.001    3.834    3.834
mp_waitall_1                     65147 16.9    3.824    3.828    3.824    3.828
dbcsr_mm_accdrv_process           9724 16.2    0.900    1.309    3.710    3.754
dbcsr_complete_redistribute        329 12.2    1.390    1.391    3.422    3.689
qs_env_update_s_mstruct             11  6.9    0.000    0.000    3.376    3.424
calculate_dm_sparse                121  9.5    0.001    0.001    3.347    3.353
qs_ot_get_p                        121 10.4    0.001    0.001    3.291    3.292
multiply_cannon_sync_h2d          4618 15.5    3.133    3.167    3.133    3.167
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.018    3.019
transfer_rs2pw                     495 10.6    0.008    0.008    2.720    2.918
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.879    2.881
pw_poisson_solve                   121 10.3    0.003    0.003    2.705    2.707
yz_to_x                            616 14.1    0.468    0.471    2.667    2.684
jit_kernel_multiply                 12 15.5    2.205    2.656    2.205    2.656
copy_dbcsr_to_fm                   153 11.3    0.004    0.004    2.621    2.629
x_to_yz                            605 15.2    0.507    0.512    2.611    2.615
qs_create_task_list                 11  7.9    0.000    0.000    2.500    2.607
generate_qs_task_list               11  8.9    1.134    1.152    2.499    2.607
transfer_rs2pw_140                 132 11.5    1.591    1.601    2.271    2.476
calculate_first_density_matrix       1  7.0    0.000    0.000    2.404    2.404
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.327    2.329
qs_ot_get_derivative_taylor         61 13.0    0.003    0.003    2.254    2.257
pw_gpu_fg                          616 14.1    2.194    2.213    2.194    2.213
cp_fm_cholesky_invert               11 10.9    2.203    2.203    2.203    2.203
qs_ot_p2m_diag                      50 11.0    0.085    0.086    2.096    2.098
dbcsr_special_finalize            6927 15.5    0.039    0.039    2.083    2.092
copy_fm_to_dbcsr                   176 11.2    0.001    0.001    1.777    2.037
transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.006    2.016
build_core_ppl                      11  7.9    1.969    2.009    1.969    2.009
qs_vxc_create                      121 10.3    0.003    0.003    1.969    1.995
xc_vxc_pw_create                   121 11.3    0.661    0.665    1.965    1.991
-------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64", label="H2O-64", y=99.445, yerr=0.0
Plot: name="H2O-64_timings_6cpu_1gpu", title="Timings of H2O-64 with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="rest", label="rest", y=70.422, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=9.057, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=6.91, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.908, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.303, yerr=0.0
PlotPoint: plot="H2O-64_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.845, yerr=0.0
Running H2O-64_nonortho.inp with 3 threads and 2 ranks... done.
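
The "rest" PlotPoint for H2O-64 appears to equal the CP2K total time minus the average SELF TIME of the five routines that get their own PlotPoint, which are also the five largest self times in the table: 99.445 - (9.057 + 6.910 + 4.908 + 4.303 + 3.845) = 70.422. The Python sketch below reproduces that arithmetic from an excerpt of the timing table. It is an inference from the logged numbers, not the actual logic of test_performance.sh or plot_performance.py, and the names `TimingRow`, `parse_rows`, `plot_breakdown` and `REPORT_EXCERPT` are hypothetical.

```python
from typing import NamedTuple


class TimingRow(NamedTuple):
    """One row of a CP2K timing report (hypothetical helper type)."""
    name: str
    calls: int
    asd: float
    self_avg: float
    self_max: float
    total_avg: float
    total_max: float


# Excerpt of the H2O-64 timing report: the CP2K row plus the seven routines
# with the largest average SELF TIME (deliberately not the full table).
REPORT_EXCERPT = """\
CP2K 1 1.0 0.028 0.028 99.445 99.446
grid_collocate_task_list 121 9.7 9.057 9.169 9.057 9.169
grid_integrate_task_list 121 12.3 6.910 6.985 6.910 6.985
hybrid_alltoall_any 4771 16.4 4.908 4.937 6.717 6.730
mp_alltoall_z22v 1221 15.7 4.303 4.324 4.303 4.324
build_core_ppl_forces 11 5.9 3.845 3.937 3.845 3.937
mp_waitall_1 65147 16.9 3.824 3.828 3.824 3.828
multiply_cannon_sync_h2d 4618 15.5 3.133 3.167 3.133 3.167
"""


def parse_rows(text: str) -> list[TimingRow]:
    """Split whitespace-separated report rows into TimingRow tuples."""
    rows = []
    for line in text.splitlines():
        name, calls, *numbers = line.split()
        rows.append(TimingRow(name, int(calls), *map(float, numbers)))
    return rows


def plot_breakdown(rows: list[TimingRow], top_n: int = 5):
    """Return the top_n routines by average self time and the remaining 'rest'.

    Assumption: rest = CP2K total time (average) minus the summed self times.
    """
    total = next(r.total_avg for r in rows if r.name == "CP2K")
    routines = [r for r in rows if r.name != "CP2K"]
    top = sorted(routines, key=lambda r: r.self_avg, reverse=True)[:top_n]
    rest = round(total - sum(r.self_avg for r in top), 3)
    return [(r.name, r.self_avg) for r in top], rest


if __name__ == "__main__":
    top, rest = plot_breakdown(parse_rows(REPORT_EXCERPT))
    for name, seconds in top:
        print(f"{name}: {seconds}")
    print(f"rest: {rest}")  # rest: 70.422
```

If this reading is right, the same arithmetic also matches the H2O-64_nonortho report: 96.838 - (7.537 + 7.357 + 4.713 + 4.075 + 3.888) = 69.268, its logged "rest" value.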

From /workspace/artifacts/H2O-64_nonortho_6cpu_1gpu.out:
-------------------------------------------------------------------------------
-                                                                             -
-                                T I M I N G                                  -
-                                                                             -
-------------------------------------------------------------------------------
SUBROUTINE                       CALLS  ASD         SELF TIME        TOTAL TIME
                               MAXIMUM       AVERAGE  MAXIMUM  AVERAGE  MAXIMUM
CP2K                                 1  1.0    0.026    0.026   96.838   96.838
qs_mol_dyn_low                       1  2.0    0.004    0.004   96.396   96.399
qs_forces                           11  3.9    0.002    0.002   96.350   96.350
qs_energies                         11  4.9    0.001    0.001   85.359   85.360
scf_env_do_scf                      11  5.9    0.001    0.001   69.535   69.536
velocity_verlet                     10  3.0    0.001    0.002   62.807   62.824
scf_env_do_scf_inner_loop          103  6.6    0.006    0.008   58.456   58.456
rebuild_ks_matrix                  114  8.3    0.001    0.001   26.653   26.655
qs_ks_build_kohn_sham_matrix       114  9.3    0.018    0.018   26.653   26.654
dbcsr_multiply_generic            2114 12.5    0.136    0.136   24.843   24.878
qs_ks_update_qs_env                114  7.6    0.001    0.001   24.566   24.568
qs_scf_new_mos                     103  7.6    0.001    0.001   19.941   19.949
qs_scf_loop_do_ot                  103  8.6    0.001    0.001   19.940   19.948
qs_rho_update_rho_low              114  7.7    0.001    0.001   18.726   18.748
calculate_rho_elec                 114  8.7    0.833    0.837   18.725   18.747
ot_scf_mini                        103  9.6    0.003    0.003   18.059   18.060
fft_wrap_pw1pw2                   1151 11.7    0.022    0.022   16.168   16.212
sum_up_and_integrate               114 10.3    0.002    0.002   14.234   14.274
integrate_v_rspace                 114 11.3    0.337    0.338   14.145   14.184
fft_wrap_pw1pw2_140                467 12.2    0.003    0.003   13.940   13.966
multiply_cannon                   2114 13.5    0.314    0.319   12.301   12.423
multiply_cannon_loop              2114 14.5    0.240    0.241   11.188   11.203
make_m2s                          4228 13.5    0.041    0.042   10.993   11.115
init_scf_loop                       11  6.9    0.000    0.000   10.992   10.992
make_images                       4228 14.5    1.198    1.268   10.827   10.947
density_rs2pw                      114  9.7    0.007    0.007   10.504   10.607
ot_mini                            103 10.6    0.001    0.001   10.568   10.570
qs_energies_init_hamiltonians       11  5.9    0.000    0.000    8.571    8.572
pw_gpu_r3dc1d_3d_ps                581 13.2    2.308    2.342    8.321    8.328
pw_gpu_c1dr3d_3d_ps                570 14.2    2.169    2.182    7.819    7.868
build_core_hamiltonian_matrix_      11  4.9    0.001    0.001    7.699    7.841
prepare_preconditioner              11  7.9    0.000    0.000    7.641    7.646
make_preconditioner                 11  8.9    0.000    0.000    7.641    7.646
grid_integrate_task_list           114 12.3    7.537    7.577    7.537    7.577
grid_collocate_task_list           114  9.7    7.357    7.426    7.357    7.426
make_full_inverse_cholesky          11  9.9    0.000    0.000    6.404    6.665
hybrid_alltoall_any               4375 16.4    4.713    4.809    6.610    6.631
init_scf_run                        11  5.9    0.000    0.000    6.594    6.594
scf_env_initial_rho_setup           11  6.9    0.000    0.001    6.594    6.594
make_images_data                  4228 15.5    0.050    0.050    6.407    6.423
qs_ot_get_derivative               103 11.6    0.002    0.002    6.305    6.307
potential_pw2rs                    114 12.3    0.036    0.036    6.270    6.270
multiply_cannon_multrec           4228 15.5    2.006    2.011    5.990    5.991
qs_env_update_s_mstruct             11  6.9    0.000    0.000    4.292    4.429
ot_diis_step                       103 11.6    0.005    0.005    4.239    4.239
mp_alltoall_z22v                  1151 15.7    4.075    4.105    4.075    4.105
build_core_ppl_forces               11  5.9    3.888    4.008    3.888    4.008
build_core_hamiltonian_matrix       11  6.9    0.001    0.001    3.851    3.902
dbcsr_complete_redistribute        317 12.2    1.392    1.399    3.586    3.853
wfi_extrapolate                     11  7.9    0.001    0.001    3.807    3.807
mp_waitall_1                     59659 16.9    3.654    3.786    3.654    3.786
apply_preconditioner_dbcsr         114 12.6    0.000    0.000    3.774    3.776
apply_single                       114 13.6    0.001    0.001    3.774    3.776
dbcsr_mm_accdrv_process           9040 16.2    0.777    0.820    3.609    3.616
qs_create_task_list                 11  7.9    0.000    0.000    3.425    3.509
generate_qs_task_list               11  8.9    1.413    1.434    3.425    3.509
calculate_dm_sparse                114  9.5    0.001    0.001    3.187    3.196
qs_ks_update_qs_env_forces          11  4.9    0.000    0.000    3.144    3.145
qs_ot_get_p                        114 10.4    0.001    0.001    2.978    2.979
multiply_cannon_sync_h2d          4228 15.5    2.919    2.926    2.919    2.926
copy_dbcsr_to_fm                   147 11.2    0.004    0.004    2.895    2.915
cp_dbcsr_sm_fm_multiply             37  9.5    0.001    0.001    2.855    2.856
transfer_rs2pw                     467 10.7    0.008    0.008    2.542    2.687
pw_poisson_solve                   114 10.3    0.003    0.003    2.572    2.581
yz_to_x                            581 14.2    0.446    0.449    2.545    2.574
x_to_yz                            570 15.2    0.479    0.480    2.455    2.456
calculate_first_density_matrix       1  7.0    0.000    0.000    2.356    2.356
transfer_dbcsr_to_fm                11 10.9    0.001    0.001    2.291    2.312
cp_dbcsr_sm_fm_multiply_core        37 10.5    0.000    0.000    2.310    2.312
jit_kernel_multiply                 11 15.9    2.268    2.303    2.268    2.303
transfer_rs2pw_140                 125 11.5    1.513    1.523    2.120    2.270
cp_fm_cholesky_invert               11 10.9    2.246    2.246    2.246    2.246
qs_ot_get_derivative_taylor         60 13.0    0.002    0.002    2.136    2.138
pw_gpu_fg                          581 14.2    2.113    2.128    2.113    2.128
copy_fm_to_dbcsr                   170 11.1    0.001    0.002    1.811    2.076
build_core_ppl                      11  7.9    1.987    2.026    1.987    2.026
dbcsr_special_finalize            6342 15.5    0.036    0.037    1.978    1.992
-------------------------------------------------------------------------------

PlotPoint: plot="total_timings_6cpu_1gpu", name="H2O-64_nonortho", label="H2O-64_nonortho", y=96.838, yerr=0.0
Plot: name="H2O-64_nonortho_timings_6cpu_1gpu", title="Timings of H2O-64_nonortho with 6 CPU Cores and 1 GPU", ylabel="time [s]"
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="rest", label="rest", y=69.268, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_integrate_task_list", label="grid_integrate_task_list", y=7.537, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="grid_collocate_task_list", label="grid_collocate_task_list", y=7.357, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="hybrid_alltoall_any", label="hybrid_alltoall_any", y=4.713, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="mp_alltoall_z22v", label="mp_alltoall_z22v", y=4.075, yerr=0.0
PlotPoint: plot="H2O-64_nonortho_timings_6cpu_1gpu", name="build_core_ppl_forces", label="build_core_ppl_forces", y=3.888, yerr=0.0
Running w64PBE.inp with 3 threads and 2 ranks... failed.

  ----------------------------------- OT ---------------------------------------

  Step     Update method      Time    Convergence         Total energy    Change
  ------------------------------------------------------------------------------
     1 OT DIIS     0.80E-01    3.9     0.00000098     -1102.7676349830 -1.20E-08
     2 OT DIIS     0.80E-01    1.7     0.00000718     -1102.7676349949 -1.18E-08
     3 OT DIIS     0.80E-01    1.7     0.00001033     -1102.7676349924  2.43E-09
     4 OT DIIS     0.80E-01    1.7     0.00000365     -1102.7676349925 -2.75E-11
     5 OT DIIS     0.80E-01    1.7     0.00000056     -1102.7676349927 -2.37E-10
     6 OT DIIS     0.80E-01    1.7     0.00002747     -1102.7676349989 -6.21E-09
     7 OT DIIS     0.80E-01    1.7     0.00000447     -1102.7676349989  4.55E-13
     8 OT DIIS     0.80E-01    1.7     0.00000248     -1102.7676349989 -1.48E-11
     9 OT DIIS     0.80E-01    1.7     0.00004150     -1102.7676349996 -6.73E-10
    10 OT DIIS     0.80E-01    1.7     0.00002680     -1102.7676350020 -2.36E-09

  Leaving inner SCF loop after reaching 10 steps.

  Electronic density on regular grids:       -512.0000000044        -0.0000000044
  Core density on regular grids:              511.9999999998        -0.0000000002
  Total charge density on r-space grids:       -0.0000000045
  Total charge density g-space grids:          -0.0000000045

  Overlap energy of the core charge distribution:                0.00000091569564
  Self energy of the core charge distribution:               -2838.67351367283345
  Core Hamiltonian energy:                                     824.05924745106995
  Hartree energy:                                             1182.15846871225222
  Exchange-correlation energy:                                -270.31183840813935

  Total energy:                                              -1102.76763500195511

  outer SCF iter = 10 RMS gradient = 0.27E-04 energy = -1102.7676350020

  ----------------------------------- OT ---------------------------------------
  Minimizer      : DIIS                : direct inversion
                                         in the iterative subspace
                                         using 7 DIIS vectors
                                         safer DIIS on
  Preconditioner : FULL_SINGLE_INVERSE : inversion of
                                         H + eS - 2*(Sc)(c^T*H*c+const)(Sc)^T
  Precond_solver : DEFAULT
  stepsize       : 0.08000000             energy_gap     : 0.08000000
  eps_taylor     : 0.10000E-15            max_taylor     : 4
  ----------------------------------- OT ---------------------------------------

  Step     Update method      Time    Convergence         Total energy    Change
  ------------------------------------------------------------------------------
     1 OT DIIS     0.80E-01    3.9     0.00000573     -1102.7676350020 -2.09E-11
     2 OT DIIS     0.80E-01    1.7     0.00000346     -1102.7676347891  2.13E-07
     3 OT DIIS     0.80E-01    1.7     0.00000488     -1102.7676348588 -6.98E-08
     4 OT DIIS     0.80E-01    1.7     0.00000387     -1102.7676348717 -1.29E-08
     5 OT DIIS     0.80E-01    1.7     0.00000151     -1102.7676348862 -1.45E-08
     6 OT DIIS     0.80E-01    1.7     0.00003857     -1102.7676349252 -3.90E-08
     7 OT DIIS     0.80E-01    1.7     0.00001614     -1102.7676349248  3.87E-10
     8 OT DIIS     0.80E-01    1.7     0.00000866     -1102.7676349248 -2.50E-12
     9 OT DIIS     0.80E-01    1.7     0.00000106     -1102.7676349494 -2.46E-08
    10 OT DIIS     0.80E-01    1.7     0.00000041     -1102.7676350000 -5.07E-08

  Leaving inner SCF loop after reaching 10 steps.

  Electronic density on regular grids:       -512.0000000044        -0.0000000044
  Core density on regular grids:              511.9999999998        -0.0000000002
  Total charge density on r-space grids:       -0.0000000045
  Total charge density g-space grids:          -0.0000000045

  Overlap energy of the core charge distribution:                0.00000091569564
  Self energy of the core charge distribution:               -2838.67351367283345
  Core Hamiltonian energy:                                     824.05924182274407
  Hartree energy:                                             1182.15847376487363
  Exchange-correlation energy:                                -270.31183783051267

  Total energy:                                              -1102.76763500003290

  outer SCF iter = 11 RMS gradient = 0.41E-06 energy = -1102.7676350000
  outer SCF loop FAILED to converge after 11 iterations or 110 steps

*******************************************************************************
*   ___                                                                       *
*  /   \                                                                      *
* [ABORT]                                                                     *
*  \___/     SCF run NOT converged. To continue the calculation regardless,   *
*    |             please set the keyword IGNORE_CONVERGENCE_FAILURE.         *
*  O/|                                                                        *
* /| |                                                                        *
* / \                                                            qs_scf.F:685 *
*******************************************************************************

===== Routine Calling Stack =====

  5 scf_env_do_scf
  4 qs_energies
  3 qs_forces
  2 qs_mol_dyn_low
  1 CP2K

Abort(1) on node 0 (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
STOP 1

Summary: Running w64PBE.inp failed.
Status: FAILED
 ---> Removed intermediate container 2b507f5b6f5d
 ---> 07d9bfe61423
Step 45/46 : CMD cat $(find ./report.log -mmin +10) | sed '/^Summary:/ s/$/ (cached)/'
 ---> Running in 7a63051a991c
 ---> Removed intermediate container 7a63051a991c
 ---> c748fda7e10f
Step 46/46 : ENTRYPOINT []
 ---> Running in 7fee8497c1b9
 ---> Removed intermediate container 7fee8497c1b9
 ---> 9ae9e82aaf4a
[Warning] One or more build-args [GIT_COMMIT_SHA SPACK_CACHE] were not consumed
Successfully built 9ae9e82aaf4a
Successfully tagged us-central1-docker.pkg.dev/cp2k-org-project/cp2kci/img_cp2k-perf-cuda-volta:master
Pushing new image... done.

#################### Running Image cp2k-perf-cuda-volta ####################
Uploading artifacts... done

EndDate: 2026-06-02 07:38:27+00:00