
Conversation

@casparvl casparvl commented Dec 24, 2025

Add some initial changes to the hooks to make sure to install with --module-only if this is CUDA-12.6 based but targets CC100 or CC120.

This still needs to be completed. Also, I could potentially make it more clever and match anything that's >=CC100.

Edit 12-01: This PR now does a few things.

  1. Make the way in which unsupported modules are handled more generic. With that, if we ever need to handle a new case of unsupported modules, all we need to do is add the new case to is_unsupported_module
  2. Use that generic approach to check for compatibility between CUDA Compute Capability and CUDA toolkit version - and treat it as an unsupported module if it's not compatible
  3. Replace --cuda-compute-capabilities=9.0a with --cuda-compute-capabilities=9.0 for cuDNN-9.5.0.50 and similarly strip the suffix for 10.0f and 12.0f for cuDNN-9.10.1.4.
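Item 2 could be implemented roughly along these lines; the table contents, names, and helper below are illustrative assumptions, not the actual data or code in eb_hooks.py:

```python
# Hypothetical sketch of the CC-vs-toolkit compatibility check.
# The table maps a CUDA toolkit version to the (suffix-free) compute
# capabilities it can target; entries here are examples only.
SUPPORTED_CCS = {
    '12.6.0': ['50', '60', '70', '80', '90'],
    '13.0.0': ['75', '80', '90', '100', '120'],
}

def ccs_supported_by_toolkit(cuda_version, requested_ccs):
    """Return True only if all requested CCs are supported by this toolkit."""
    supported = SUPPORTED_CCS.get(cuda_version, [])
    return all(cc in supported for cc in requested_ccs)

print(ccs_supported_by_toolkit('12.6.0', ['90']))   # True
print(ccs_supported_by_toolkit('12.6.0', ['100']))  # False
```

If the check fails, the module would then be treated as unsupported, exactly as the Zen4+foss-2022b case already is.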

To achieve this, I first made the current mechanism that was there for handling the Zen4+foss-2022b incompatibility more generic.

  • Replace the specific hooks with more generic ones: parse_hook_zen4_module_only (which took care of adding the LmodError via a modluafooter), pre_prepare_hook_ignore_zen4_gcccore1220_error (which set an env var to suppress the LmodError when building other software on top, so that modules depending on the unsupported module can still be created), and post_prepare_hook_ignore_zen4_gcccore1220_error (which unset that env var)
  • Create a NamedTuple to hold two pieces of information:
    • The name of the environment variable to suppress the associated LmodError
    • The text for the LmodError
  • Set that NamedTuple as an attribute
  • Make other hooks use that attribute to set the relevant environment variable & error message
  • Move setting the modluafooter to the pre_module_hook, since some information needed to determine whether a module is unsupported (such as the requested CUDA compute capability) may not be available as early as the parse_hook
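A minimal sketch of such a NamedTuple, with assumed field names; attribute access avoids the index mix-ups you risk with a plain tuple:

```python
from typing import NamedTuple

# Field and class names are illustrative, not necessarily those in eb_hooks.py.
class UnsupportedModule(NamedTuple):
    envvar: str         # name of the env var that suppresses the LmodError
    error_message: str  # text of the LmodError embedded in the modulefile

unsup = UnsupportedModule(
    envvar='EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220',
    error_message='EasyConfigs using toolchains based on GCCcore-12.2.0 are '
                  'not supported for the Zen4 architecture.',
)
print(unsup.envvar)  # EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220
```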

Then, I

  • implemented logic to check the compatibility between the CUDA toolkit version and the requested CUDA Compute Capabilities
  • implemented a message & env-var for the CUDA case in is_unsupported_module.

I've left some reviewer-comments in the changed files to make it easier for anyone reviewing this to see why things were changed.

I then ran tests to a) validate that it still works for zen4+foss-2022b and b) verify that it does what it should for the CUDA CC vs toolkit version (in)compatibility. Results are summarized below.

zen4+foss-2022b test

Environment:

module load EESSI/2023.06
module load EESSI-extend/2023.06-easybuild

Build log:

# Use eb_hooks from feature branch:
eb --hooks eb_hooks.py h5py-3.8.0-foss-2022b.eb --rebuild

== Temporary log file in case of crash /scratch-local/casparl.18163544/eb-37mpwjnj/easybuild-cuu9nk5g.log
...
== processing EasyBuild easyconfig
/cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/software/EasyBuild/5.2.0/easybuild/easyconfigs/h/h5py/h5py-3.8.0-foss-2022b.eb
== building and installing h5py/3.8.0-foss-2022b...
  >> installation prefix: /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/software/h5py/3.8.0-foss-2022b
== fetching files and verifying checksums...
== Running pre-fetch hook...

WARNING: EasyConfigs using toolchains based on GCCcore-12.2.0 are not supported on Zen4 architectures. Building with
'--module-only --force' and injecting an LmodError into the modulefile.

== Updated build option 'module-only' to 'True'
== Updated build option 'force' to 'True'
...
== Setting EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220 to allow loading dependencies that otherwise throw an LmodError
  >> loading toolchain module: foss/2022b
  >> loading modules for build dependencies:
  >>  * pkgconfig/1.5.5-GCCcore-12.2.0-python
  >> loading modules for (runtime) dependencies:
  >>  * Python/3.10.8-GCCcore-12.2.0
  >>  * SciPy-bundle/2023.02-gfbf-2022b
  >>  * mpi4py/3.1.4-gompi-2022b
  >>  * HDF5/1.14.0-gompi-2022b
  >> defining build environment for foss/2022b toolchain
== Running post-prepare hook...
== Resetting rpath_override_dirs to original value: None
== Unsetting EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220
...
== Running pre-module hook...
== Setting EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220 in initial environment
  >> generating module file @
/home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/h5py/3.8.0-foss-2022b.lua
== Running post-module hook...
== Restored original build option 'module_only' to False
== Restored original build option 'force' to False
== Removing EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220 in initial environment
== ... (took 1 secs)
...
== Results of the build can be found in the log file(s)
/home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/software/h5py/3.8.0-foss-2022b/easybuild/easybuild-h5py-3.8.0-20260108.171940.log.bz2
== Running post-easyblock hook...

== Build succeeded for 1 out of 1 (total: 5 secs)
== Summary:
   * [SUCCESS] h5py/3.8.0-foss-2022b
== Temporary log file(s) /scratch-local/casparl.18163544/eb-37mpwjnj/easybuild-cuu9nk5g.log* have been removed.
== Temporary directory /scratch-local/casparl.18163544/eb-37mpwjnj has been removed.

Test loading module:

module load h5py/3.8.0-foss-2022b

Lmod has detected the following error:  EasyConfigs using toolchains based on GCCcore-12.2.0 are not supported for
the Zen4 architecture.
See
https://www.eessi.io/docs/known_issues/eessi-2023.06/#gcc-1220-and-foss-2022b-based-modules-cannot-be-loaded-on-zen4-architecture
While processing the following module(s):
    Module fullname        Module Filename
    ---------------        ---------------
    GCCcore/12.2.0         /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/GCCcore/12.2.0.lua
    GCC/12.2.0             /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/GCC/12.2.0.lua
    foss/2022b             /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/foss/2022b.lua
    h5py/3.8.0-foss-2022b  /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/modules/all/h5py/3.8.0-foss-2022b.lua

This is indeed the LmodError we expected.

zen4+foss-2023b test

Environment:

module load EESSI/2023.06
module load EESSI-extend/2023.06-easybuild

Build log:

# Use eb_hooks from feature branch:
eb --hooks eb_hooks.py h5py-3.11.0-foss-2023b.eb --rebuild

...
== ... (took < 1 sec)
  >> running shell command:
        bzip2 /home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/software/h5py/3.11.0-foss-2023b/easybuild/easybuild-h5py-3.11.0-20260108.174439.log
        [started at: 2026-01-08 17:44:39]
        [working dir: /gpfs/home4/casparl/EESSI/software-layer-scripts]
        [output and state saved to /scratch-local/casparl.18163544/eb-_pcfey5c/run-shell-cmd-output/bzip2-ld1r6gr3]
  >> command completed: exit 0, ran in < 1s
== COMPLETED: Installation ended successfully (took 2 mins 53 secs)

Test loading module:

$ module load h5py/3.11.0-foss-2023b
$ echo $EBROOTH5PY
/home/casparl/eessi/versions/2023.06/software/linux/x86_64/amd/zen4/software/h5py/3.11.0-foss-2023b

This is what we expect - this installation should be unaltered.

CUDA Compute Capability 10.0 with CUDA Toolkit 12.6.0 (incompatible)

Environment:

module load EESSI/2025.06
module load EESSI-extend/2025.06-easybuild

Build log:

# Use eb_hooks from feature branch:
eb --sourcepath=/home/casparl/.local/easybuild/sources --hooks eb_hooks.py --accept-eula-for=CUDA --cuda-compute-capabilities=10.0a CUDA-12.6.0.eb --rebuild

...
== Running pre-fetch hook...

WARNING: Requested a CUDA Compute Capability (['10.0a']) that is not supported by the CUDA toolkit version (12.6.0) used by this software. Switching to '--module-only --force' and injecting an LmodError into the modulefile.

== Updated build option 'module-only' to 'True'
...
== Setting EESSI_IGNORE_CUDA_12_6_0_CC_10_0 to allow loading dependencies that otherwise throw an LmodError
== Running post-prepare hook...
== Unsetting EESSI_IGNORE_CUDA_12_6_0_CC_10_0
...
== Running pre-module hook...
== Setting EESSI_IGNORE_CUDA_12_6_0_CC_10_0 in initial environment
  >> generating module file @ /home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/modules/all/CUDA/12.6.0.lua
== Running post-module hook...
== Restored original build option 'module_only' to False
== Restored original build option 'force' to False
== Removing EESSI_IGNORE_CUDA_12_6_0_CC_10_0 in initial environment
...
== COMPLETED: Installation ended successfully (took 18 secs)
== Results of the build can be found in the log file(s) /home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/software/CUDA/12.6.0/easybuild/easybuild-CUDA-12.6.0-20260108.174859.log.bz2
== Running post-easyblock hook...

== Build succeeded for 1 out of 1 (total: 20 secs)
== Summary:
   * [SUCCESS] CUDA/12.6.0

Test loading module:

$ module load CUDA/12.6.0
Lmod has detected the following error:  EasyConfigs using CUDA 12.6.0 or older are not supported for (all) requested Compute Capabilities: ['10.0a'].

While processing the following module(s):
    Module fullname  Module Filename
    ---------------  ---------------
    CUDA/12.6.0      /home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/modules/all/CUDA/12.6.0.lua

This is what we expect.

CUDA Compute Capability 9.0 with CUDA Toolkit 12.6.0 (compatible)

Environment:

module load EESSI/2025.06
module load EESSI-extend/2025.06-easybuild

Build log:

# Use eb_hooks from feature branch:
eb --sourcepath=/home/casparl/.local/easybuild/sources --hooks eb_hooks.py --accept-eula-for=CUDA --cuda-compute-capabilities=9.0 CUDA-12.6.0.eb --rebuild

...
== COMPLETED: Installation ended successfully (took 2 mins 52 secs)
== Results of the build can be found in the log file(s) /home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/software/CUDA/12.6.0/easybuild/easybuild-CUDA-12.6.0-20260108.175337.log.bz2
== Running post-easyblock hook...

== Build succeeded for 1 out of 1 (total: 2 mins 54 secs)
== Summary:
   * [SUCCESS] CUDA/12.6.0

Test loading module:

$ module load CUDA/12.6.0
$ echo $EBROOTCUDA
/home/casparl/eessi/versions/2025.06/software/linux/x86_64/amd/zen2/software/CUDA/12.6.0

This is, again, what we expect, since CC 9.0 is supported by CUDA 12.6.0.

…module-only if this is CUDA-12.6 based but targets CC100 or CC120
@casparvl casparvl marked this pull request as draft December 24, 2025 16:40

casparvl commented Jan 5, 2026

I think we can make this a little more powerful, by defining a lookup-table that, for a given CUDA Compute Capability, returns the CUDA version in which it was first supported, and the CUDA version in which it was last supported (or "99.9.9" or something, if it is still supported). Then, we do a semantic version comparison to figure out if we are in that range. If not, we add an informative error message to the module, and generate with --module-only.
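The lookup-table idea above can be sketched as follows; the version numbers and names here are illustrative assumptions, not real support data:

```python
# For each compute capability: (first CUDA version supporting it,
# last CUDA version supporting it), with '99.9.9' meaning "still supported".
CC_SUPPORT_RANGE = {
    '90':  ('11.8.0', '99.9.9'),
    '100': ('12.8.0', '99.9.9'),
    '120': ('12.8.0', '99.9.9'),
}

def version_tuple(version):
    # Simple semantic-version comparison key: '12.6.0' -> (12, 6, 0)
    return tuple(int(part) for part in version.split('.'))

def cc_supported(cc, cuda_version):
    first, last = CC_SUPPORT_RANGE[cc]
    return version_tuple(first) <= version_tuple(cuda_version) <= version_tuple(last)

print(cc_supported('100', '12.6.0'))  # False: first supported in 12.8.0
print(cc_supported('90', '12.6.0'))   # True
```

Note that a plain string comparison would get '12.10.0' vs '12.9.0' wrong, hence the tuple-of-ints comparison.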

Caspar van Leeuwen and others added 6 commits January 7, 2026 18:11
…ted configurations more generic. Then, also apply this to unsupported combinations of CUDA toolkit versions and requested CUDA compute capabilities. TODO: actually implement a function that checks this compatibility
…da_version actually returns 'None' if CUDA was not in the deps
…ed by the generic X_prepare_hook_unsupported_modules
…nvironment variables don't contain invalid characters like commas and periods. Add some warning messages if installing a module that's unsupported.
@casparvl casparvl changed the title Use module-only for Cuda 12.6 and CC100 or CC120 Use module-only when a CUDA Compute Capability is requested that is incompatible with the CUDA toolkit version used Jan 8, 2026
@casparvl casparvl marked this pull request as ready for review January 8, 2026 17:02
# Supported compute capabilities by CUDA toolkit version
# Obtained by installing all CUDAs from 12.0.0 to 13.1.0, then using:

# #!/bin/bash
@casparvl casparvl Jan 8, 2026

I think it's worth leaving this here as a breadcrumb to future contributors, since we'll have to update this list occasionally and doing it manually is silly - especially if you want to add compatibility for a range of toolkit versions

# Clean cuda_cc of any suffixes like the 'a' in '9.0a'
# The regex expects one or more digits, a dot, one or more digits, and then optionally some letters
# It strips the suffix by returning only the first capture group (the digits and dot)
cuda_cc = re.sub(r'^(\d+\.\d+)[a-zA-Z]*$', r'\1', cuda_cc)
@casparvl casparvl Jan 8, 2026

The lookup table contains CCs in the format 90, 100, etc.: no periods and no suffixes. The CUDA compute capabilities passed to EasyBuild always contain periods and can contain suffixes. So, to compare, we need to strip the suffix from EB's CUDA CC and remove the period.
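The normalization described here amounts to this small sketch; the helper name is an assumption:

```python
import re

def normalize_cc(cuda_cc):
    """Turn an EasyBuild-style CC like '9.0a' into a lookup-table key like '90'."""
    # Strip a trailing letter suffix: '9.0a' -> '9.0'
    cuda_cc = re.sub(r'^(\d+\.\d+)[a-zA-Z]*$', r'\1', cuda_cc)
    # Drop the period: '9.0' -> '90'
    return cuda_cc.replace('.', '')

print(normalize_cc('9.0a'))   # 90
print(normalize_cc('12.0f'))  # 120
```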

# Always trigger this one, regardless of ec.name
cpu_target = get_eessi_envvar('EESSI_SOFTWARE_SUBDIR')
if cpu_target == CPU_TARGET_ZEN4:
    parse_hook_zen4_module_only(ec, eprefix)
@casparvl

This is now handled in the pre_module_hook_unsupported_modules.

print_msg(msg % (new_parallel, curr_parallel, session_parallel, self.name, cpu_target), log=self.log)


def pre_prepare_hook_unsupported_modules(self, *args, **kwargs):
@casparvl

Replaces the specific pre_prepare_hook_ignore_zen4_gcccore1220_error we had before.

os.environ[unsup_mod.envvar] = "1"


def post_prepare_hook_unsupported_modules(self, *args, **kwargs):
@casparvl

Replaces the post_prepare_hook_ignore_zen4_gcccore1220_error hook we had before
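The generic pre/post prepare pair might look like this minimal sketch, assuming the env var name is passed in directly (the actual hooks read it from an UnsupportedModule attribute on the EasyBlock):

```python
import os

def pre_prepare_hook_unsupported_modules(envvar):
    # Suppress the LmodError so dependency modules can still be loaded
    os.environ[envvar] = "1"

def post_prepare_hook_unsupported_modules(envvar):
    # Restore normal behaviour once the build environment is prepared
    os.environ.pop(envvar, None)

pre_prepare_hook_unsupported_modules('EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220')
print(os.environ['EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220'])  # 1
post_prepare_hook_unsupported_modules('EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220')
print('EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220' in os.environ)  # False
```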

if cpu_target == CPU_TARGET_ZEN4:
    pre_prepare_hook_ignore_zen4_gcccore1220_error(self, *args, **kwargs)
# Always trigger this, regardless of ec.name
pre_prepare_hook_unsupported_modules(self, *args, **kwargs)
@casparvl casparvl Jan 8, 2026

Run the new hook instead of the old one. All the logic to check if something is an unsupported module is now contained within is_unsupported_module, so no more use for checking the cpu_target.

if cpu_target == CPU_TARGET_ZEN4:
    post_prepare_hook_ignore_zen4_gcccore1220_error(self, *args, **kwargs)
# Always trigger this, regardless of ec.name
post_prepare_hook_unsupported_modules(self, *args, **kwargs)
@casparvl

Run the new hook instead of the old one. All the logic to check if something is an unsupported module is now contained within is_unsupported_module, so no more use for checking the cpu_target.

print_msg("Changed toolchainopts for %s: %s", ec.name, ec['toolchainopts'])


def parse_hook_zen4_module_only(ec, eprefix):
@casparvl casparvl Jan 8, 2026

Adding the LmodError to the modluafooter is now done in the generic pre_module_hook_unsupported_modules hook



def is_unsupported_module(ec):
class UnsupportedModule(NamedTuple):
@casparvl

Add a named tuple so that we can have access to the environment variable name and error message through clearly named attributes. That's less sensitive to messing up compared to a regular tuple, where you'd have to remember what is stored in the first and what is stored in the second element of the tuple.


if cpu_target == CPU_TARGET_ZEN4 and is_gcccore_1220_based(ecname=ec.name, ecversion=ec.version, tcname=ec.toolchain.name, tcversion=ec.toolchain.version):
    return EESSI_IGNORE_ZEN4_GCC1220_ENVVAR
# If this function was already called by an earlier hook, evaluation of whether this is an unsupported module was
@casparvl

At this point in time, the is_unsupported_module function is called 6 or 7 times. Since it may become quite lengthy with lots of logic if we keep adding cases for modules that are unsupported, we want an early return for optimization in case this has already been evaluated before. We can easily do that by checking if either the EESSI_SUPPORTED_MODULE_ATTR or EESSI_UNSUPPORTED_MODULE_ATTR have been set.

If neither has been set, this is the first time we are evaluating this function and we should go through the full logic.
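The caching described here could look like this sketch; the attribute constants and the FakeEasyBlock stand-in are assumptions for illustration:

```python
# Record the verdict as an attribute on the first call, reuse it afterwards.
EESSI_SUPPORTED_MODULE_ATTR = 'eessi_supported_module'
EESSI_UNSUPPORTED_MODULE_ATTR = 'eessi_unsupported_module'

class FakeEasyBlock:
    """Stand-in for an EasyBlock instance, just to carry attributes."""

def is_unsupported_module(self):
    if hasattr(self, EESSI_SUPPORTED_MODULE_ATTR):
        return False
    elif hasattr(self, EESSI_UNSUPPORTED_MODULE_ATTR):
        return True
    # ... the full (potentially lengthy) evaluation runs only once ...
    setattr(self, EESSI_SUPPORTED_MODULE_ATTR, True)
    return False

eb = FakeEasyBlock()
print(is_unsupported_module(eb))  # False (full evaluation)
print(is_unsupported_module(eb))  # False (early return via cached attribute)
```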

elif hasattr(self, EESSI_UNSUPPORTED_MODULE_ATTR):
    return True

# Foss-2022b is not supported on Zen4
@casparvl

Next time we have unsupported modules, this function is the only one that needs changing: we simply add a case to it. A case typically has:

  • Logic (if statements) to determine whether this is an unsupported module
  • A warning message printed to stdout, to make clear we are doing something out of the ordinary in this installation
  • The LmodError message that should be embedded in the modulefile
  • The environment variable name that can be used to suppress the LmodError
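The four ingredients above can be sketched as one hypothetical case; all names, strings, and the CPU target value here are illustrative assumptions, not taken from eb_hooks.py:

```python
def check_zen4_gcccore1220_case(cpu_target, gcccore_1220_based):
    # 1. Logic to determine whether this is an unsupported module
    if cpu_target == 'x86_64/amd/zen4' and gcccore_1220_based:
        # 2. Warn on stdout that this installation is out of the ordinary
        print("WARNING: GCCcore-12.2.0 based toolchains are not supported on "
              "Zen4; building with '--module-only --force'.")
        # 3. The LmodError message to embed in the modulefile
        errmsg = ("EasyConfigs using toolchains based on GCCcore-12.2.0 are "
                  "not supported for the Zen4 architecture.")
        # 4. The env var that can be used to suppress the LmodError
        envvar = 'EESSI_IGNORE_LMOD_ERROR_ZEN4_GCC1220'
        return envvar, errmsg
    return None
```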

ignore_lmoderror_envvar = is_unsupported_module(self)
if ignore_lmoderror_envvar:
if is_unsupported_module(self):
    unsup_mod = getattr(self, EESSI_UNSUPPORTED_MODULE_ATTR)
@casparvl

Get the UnsupportedModule tuple, so we can use it to set the environment variable that suppresses the LmodError.

# Modules for dependencies are loaded in the prepare step. Thus, that's where we need this variable to be set
# so that the modules can be successfully loaded without printing the error (so that we can create a module
# _with_ the warning for the current software being installed)
def pre_prepare_hook_ignore_zen4_gcccore1220_error(self, *args, **kwargs):
@casparvl

Replaced by generic pre_prepare_hook_unsupported_modules

os.environ[EESSI_IGNORE_ZEN4_GCC1220_ENVVAR] = "1"


def post_prepare_hook_ignore_zen4_gcccore1220_error(self, *args, **kwargs):
@casparvl

Replaced by generic post_prepare_hook_unsupported_modules

Caspar van Leeuwen added 3 commits January 12, 2026 13:56
…he /cvmfs mount. This makes it easier to update the hooks and immediately test those changes from a software-layer PR
Caspar van Leeuwen added 4 commits January 12, 2026 17:17
@bedroge bedroge left a comment

Just two small remarks.

eb_hooks.py Outdated
if check_builddeps:
    deps = deps + ec_dict['builddependencies'][:]

# Provide default
@bedroge

This doesn't seem relevant here? (or I'm just misunderstanding the comment)

@casparvl

Nope, that applied to the cuda_ver = None before I moved that...

eb_hooks.py Outdated
msg += "Building with '--module-only --force' and injecting an LmodError into the modulefile."
print_warning(msg)
errmsg = "EasyConfigs using toolchains based on GCCcore-12.2.0 are not supported for the Zen4 architecture.\\n"
errmsg += "See https://www.eessi.io/docs/known_issues/eessi-<EESSI_VERSION>/#gcc-1220-and-foss-2022b-based-modules-cannot-be-loaded-on-zen4-architecture"
@bedroge

Gave this a try myself, and noticed that the URL in the error message had <EESSI_VERSION>. We can use $EESSI_VERSION here, or even just hardcode it to 2023.06 (since this only applies to that version).
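The suggested fix amounts to interpolating the version into the URL; a sketch, where the fallback value is an assumption:

```python
import os

# Build the known-issues URL from $EESSI_VERSION instead of leaving a
# literal <EESSI_VERSION> placeholder in the error message.
eessi_version = os.getenv('EESSI_VERSION', '2023.06')
errmsg = ("See https://www.eessi.io/docs/known_issues/eessi-%s/"
          "#gcc-1220-and-foss-2022b-based-modules-cannot-be-loaded-on-zen4-architecture"
          % eessi_version)
print(errmsg)
```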

casparvl and others added 6 commits January 13, 2026 14:36
Co-authored-by: Caspar van Leeuwen <33718780+casparvl@users.noreply.github.com>
Co-authored-by: Caspar van Leeuwen <33718780+casparvl@users.noreply.github.com>
…e-layer-scripts into CUDA_cuDNN_hooks_202506
Co-authored-by: Bob Dröge <b.e.droge@rug.nl>
@bedroge bedroge left a comment

Tested it myself a bit as well by doing the following with EESSI-extend and the hooks file from this PR:

  • (re)building (unsupported) foss 2022b for zen4 -> generates the dummy module containing an Lmod error message
  • (re)building foss 2023a for zen4 -> works fine
  • building a CUDA version that's not in the lookup table -> prints the error message, as expected
  • doing the same with EESSI_OVERRIDE_CUDA_CC_TOOLKIT_CHECK=1 -> works fine
  • building a CUDA version that is in the table, but with an unsupported CC -> generates a dummy module file

So, looks good to me! Nice work @casparvl , also on adding that new and generic is_unsupported_module, which will definitely be useful.

@bedroge

bedroge commented Jan 13, 2026

bot: build repo:eessi.io-2023.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen2
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen2

@eessi-bot-surf

eessi-bot-surf bot commented Jan 13, 2026

New job on instance eessi-bot-surf for repository eessi.io-2023.06-software
Building on: amd-zen2
Building for: x86_64/amd/zen2
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.01/pr_146/18294924

date job status comment
Jan 13 14:53:29 UTC 2026 submitted job id 18294924 will be eligible to start in about 20 seconds
Jan 13 14:53:35 UTC 2026 received job awaits launch by Slurm scheduler
Jan 13 14:54:05 UTC 2026 running job 18294924 is running
Jan 13 14:56:43 UTC 2026 finished
😁 SUCCESS
Details
✅ job output file slurm-18294924.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2023.06-software-linux-x86_64-amd-zen2-17683160860.tar.zst, size: 0 MiB (25440 bytes)
entries: 1
modules under 2023.06/software/linux/x86_64/amd/zen2/modules/all
no module files in tarball
software under 2023.06/software/linux/x86_64/amd/zen2/software
no software packages in tarball
reprod directories under 2023.06/software/linux/x86_64/amd/zen2/reprod
no reprod directories in tarball
other under 2023.06/software/linux/x86_64/amd/zen2
2023.06/init/easybuild/eb_hooks.py
Jan 13 14:56:43 UTC 2026 test result
😁 SUCCESS
ReFrame Summary
[ OK ] (1/6) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/29Aug2024-foss-2023b-kokkos %scale=1_8_node /362173de @BotBuildTests:cpu_zen2+default
P: perf: 670.59 timesteps/s (r:0, l:None, u:None)
[ OK ] (2/6) EESSI_LAMMPS_lj %device_type=cpu %module_name=LAMMPS/2Aug2023_update2-foss-2023a-kokkos %scale=1_8_node /28a69ba4 @BotBuildTests:cpu_zen2+default
P: perf: 685.987 timesteps/s (r:0, l:None, u:None)
[ OK ] (3/6) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_8_node %device_type=cpu /071cef44 @BotBuildTests:cpu_zen2+default
P: latency: 4.59 us (r:0, l:None, u:None)
[ OK ] (4/6) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_8_node %device_type=cpu /91996b2d @BotBuildTests:cpu_zen2+default
P: latency: 4.78 us (r:0, l:None, u:None)
[ OK ] (5/6) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.2-gompi-2023b %scale=1_8_node %device_type=cpu /a2dbeca6 @BotBuildTests:cpu_zen2+default
P: latency: 8.04 us (r:0, l:None, u:None)
[ OK ] (6/6) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.1-1-gompi-2023a %scale=1_8_node %device_type=cpu /2185b3fd @BotBuildTests:cpu_zen2+default
P: latency: 7.23 us (r:0, l:None, u:None)
[ PASSED ] Ran 6/6 test case(s) from 6 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-18294924.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Jan 13 15:01:21 UTC 2026 uploaded transfer of eessi-2023.06-software-linux-x86_64-amd-zen2-17683160860.tar.zst to S3 bucket succeeded

@eessi-bot-surf

eessi-bot-surf bot commented Jan 13, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen2
Building for: x86_64/amd/zen2
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.01/pr_146/18294929

date job status comment
Jan 13 14:53:34 UTC 2026 submitted job id 18294929 will be eligible to start in about 20 seconds
Jan 13 14:53:48 UTC 2026 received job awaits launch by Slurm scheduler
Jan 13 14:54:01 UTC 2026 running job 18294929 is running
Jan 13 14:55:56 UTC 2026 finished
😁 SUCCESS
Details
✅ job output file slurm-18294929.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen2-17683160860.tar.zst, size: 0 MiB (25721 bytes)
entries: 2
modules under 2025.06/software/linux/x86_64/amd/zen2/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen2/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen2/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen2
2025.06/init/easybuild/eb_hooks.py
2025.06/scripts/gpu_support/nvidia/easystacks/eessi-2025.06-eb-5.2.0-CUDA-host-injections.yml
Jan 13 14:55:56 UTC 2026 test result
😁 SUCCESS
ReFrame Summary
[ OK ] (1/2) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_8_node %device_type=cpu /7039b5b1 @BotBuildTests:cpu_zen2+default
P: latency: 2.36 us (r:0, l:None, u:None)
[ OK ] (2/2) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_8_node %device_type=cpu /1055c4de @BotBuildTests:cpu_zen2+default
P: latency: 5.2 us (r:0, l:None, u:None)
[ PASSED ] Ran 2/2 test case(s) from 2 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-18294929.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
Jan 13 15:01:52 UTC 2026 uploaded transfer of eessi-2025.06-software-linux-x86_64-amd-zen2-17683160860.tar.zst to S3 bucket succeeded

@bedroge

bedroge commented Jan 13, 2026

Note that this PR renames 2025.06/scripts/gpu_support/nvidia/easystacks/eessi-2025.06-eb-5.1.2-CUDA-host-injections.yml to 2025.06/scripts/gpu_support/nvidia/easystacks/eessi-2025.06-eb-5.2.0-CUDA-host-injections.yml for version 2025.06. Deploying this will still result in having both files in the CVMFS repository, so I will remove the old one right after the tarball has been deployed.

@bedroge

bedroge commented Jan 13, 2026

Staging PR merged, tarballs ingested, and I've removed the old GPU host_injections easystack for 2025.06:

# cvmfs_server transaction software.eessi.io 
# rm /cvmfs/software.eessi.io/versions/2025.06/scripts/gpu_support/nvidia/easystacks/eessi-2025.06-eb-5.1.2-CUDA-host-injections.yml
# cvmfs_server publish -m "remove versions/2025.06/scripts/gpu_support/nvidia/easystacks/eessi-2025.06-eb-5.1.2-CUDA-host-injections.yml" software.eessi.io 
Using auto tag 'generic-2026-01-13T15:22:39Z'
Swissknife Sync: WARNING: cannot apply pathspec /versions/*/software/*/*/*/*/accel/*/*/reprod
Swissknife Sync: WARNING: cannot apply pathspec /versions/*/software/*/*/*/accel/*/*/reprod
Swissknife Sync: Processing changes...
Waiting for upload of files before committing...
Committing file catalogs...
Swissknife Sync: Wait for all uploads to finish
Swissknife Sync: Exporting repository manifest
Statistics stored at: /var/spool/cvmfs/software.eessi.io/stats.db
Tagging software.eessi.io
Flushing file system buffers
Signing new manifest
Remounting newly created repository revision

@bedroge bedroge merged commit 5f36708 into EESSI:main Jan 13, 2026
71 of 78 checks passed