@ben-schwen
Member

@ben-schwen ben-schwen commented Oct 28, 2025

Adds arithmetic support for GForce, as requested in #3815. It does not add support for braced blocks in j such as d[, j={x<-x; .(min(x))}, by=y].
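For readers unfamiliar with the request, here is a minimal sketch of the kind of query it concerns: arithmetic between two grouped aggregates in j. The table d and the column name range are made up for illustration. The result is the same whether or not GForce handles the expression; only the speed differs.

```r
library(data.table)

# Illustrative data (not from the PR).
d = data.table(x = c(1, 5, 2, 8), y = c("a", "a", "b", "b"))

# Arithmetic on two aggregates per group: max(x) - min(x).
res = d[, .(range = max(x) - min(x)), by = y]
print(res)
```

Passing verbose=TRUE to the same call prints data.table's optimization messages, which is one way to check whether j was optimized in the installed version.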

@codecov

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 99.62264% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 99.01%. Comparing base (a325db9) to head (383b60a).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
R/test.data.table.R 95.23% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #7401      +/-   ##
==========================================
- Coverage   99.02%   99.01%   -0.02%     
==========================================
  Files          87       87              
  Lines       16803    16893      +90     
==========================================
+ Hits        16640    16727      +87     
- Misses        163      166       +3     

@github-actions

github-actions bot commented Oct 28, 2025

  • HEAD=modular_gforce slower P<0.001 for memrecycle regression fixed in #5463
    Comparison Plot

Generated via commit 383b60a

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 2 minutes and 57 seconds
Installing different package versions 44 seconds
Running and plotting the test cases 5 minutes and 8 seconds

@ben-schwen ben-schwen marked this pull request as ready for review November 2, 2025 18:01
@ben-schwen
Member Author

I'm also not sure about moving the tests to optimize.Rraw: it feels kind of wrong, and it is no longer needed now that test() has the new levels/optimization parameter.

@ben-schwen ben-schwen mentioned this pull request Nov 2, 2025
@MichaelChirico
Member

> @MichaelChirico I'm also not 100% convinced about the new optimize.Rraw. I guess the whole idea was that we could simply run the script multiple times with different optimization levels. This need was eliminated by adding the optimize parameter to test() which somehow feels cleaner.

I see. I still like the idea of a separate script -- the more we peel out of the behemoth tests.Rraw, the better. "eventually" it would be nice to have most tests live in purpose-made test scripts, IMO.

test(2357.2, fread(paste0("file://", f)), DT)
})

# gforce should also work with Map in j #5336
Member

one last idea -- what happens when the grouping column is part of the aggregation in j?

DT[, .(sum(b) - mean(a)), by=b]

Member Author

When the grouping column is part of the aggregation, we turn off GForce since it will be in .SDall.

data.table/R/data.table.R

Lines 430 to 432 in 8129198

for (ii in seq.int(from=2L, length.out=length(jsub)-1L)) {
  if (!.gforce_ok(jsub[[ii]], SDenv$.SDall, envir)) {GForce = FALSE; break}
}

Member

Just make sure it's covered by a test 👍
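A minimal sketch of what such a check could look like, written with plain stopifnot() rather than the package's internal test() helper, whose exact arguments are not shown in this thread. The data are made up. The idea is to compare the default path with the result obtained when j-optimizations are disabled through the datatable.optimize option.

```r
library(data.table)

# Illustrative data (not from the PR).
DT = data.table(a = c(1, 2, 3, 4), b = c(1L, 1L, 2L, 2L))

# Result with the default optimization level.
res_default = DT[, .(sum(b) - mean(a)), by = b]

# Same query with j-optimizations disabled, as a reference.
old = options(datatable.optimize = 0L)
res_plain = DT[, .(sum(b) - mean(a)), by = b]
options(old)

# Both paths should agree, whether or not GForce was used.
stopifnot(isTRUE(all.equal(res_default, res_plain)))
```

This only checks that the results agree; it does not assert which code path was taken.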

Member

@MichaelChirico MichaelChirico left a comment

About halfway done reading the implementation now. Thanks for your patience with the review! I'm really excited for this to get finished :)


# Optimize expressions using GForce (C-level optimizations)
# This function replaces functions like mean() with gmean() for fast C implementations
.optimize_gforce = function(jsub, SDenv, verbose, i, byjoin, f__, ansvars, use.I, lhs, names_x, envir) {
Member

one thing that comes to mind seeing such a long signature -- using a "struct" instead of passing individual arguments, e.g.

https://stackoverflow.com/questions/31864162/what-are-the-pros-and-cons-of-using-a-struct-argument-v-s-multiple-parameters

There may be some possibility to make the code easier to understand if some arguments are grouped or combined.

Not a requirement but something to ponder.
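To make the suggestion concrete, here is a toy sketch in base R of passing one list-based "struct" instead of many separate arguments. Everything in it is hypothetical: make_ctx, toy_optimize, the optimize_ctx class and the choice of required fields are invented for illustration and are not part of data.table.

```r
# Hypothetical constructor: bundles arguments into one classed list and
# checks required fields up front, so nothing is silently dropped.
make_ctx = function(...) {
  ctx = list(...)
  required = c("verbose", "ansvars", "names_x")
  missing_fields = setdiff(required, names(ctx))
  if (length(missing_fields))
    stop("missing context fields: ", paste(missing_fields, collapse = ", "))
  structure(ctx, class = "optimize_ctx")
}

# Hypothetical helper that takes the bundle instead of a long signature.
# It returns TRUE when every variable used in jsub is a known column name.
toy_optimize = function(jsub, ctx) {
  stopifnot(inherits(ctx, "optimize_ctx"))
  if (ctx$verbose) message("checking: ", deparse(jsub))
  all(all.vars(jsub) %in% ctx$names_x)
}

ctx = make_ctx(verbose = FALSE, ansvars = c("a", "b"), names_x = c("a", "b", "g"))
ok = toy_optimize(quote(sum(a) - mean(b)), ctx)
```

The constructor's field check is one way to address the worry that arguments might get lost inside a bundle: a missing field fails loudly at construction time instead of surfacing later as NULL.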

Member Author

Good point. I think if we used structs/lists, then we should probably use them for all helpers here for consistency, no? E.g. also .optimize_sd_subset, .optimize_c_expr, .optimize_lapply, .optimize_gforce, .optimize_mean and .attempt_optimize.

For .optimize_gforce I can even see the benefit, given the long signature, but on the other hand we run into the problem that arguments might get lost in there...

Member

Yea, that's why I mentioned it, but not required -- there's no simple path there.

jvnames = c(jvnames, sdvars)
}
# Case 2e: Complex .SD usage - can't optimize
else if (any(all.vars(this) == ".SD")) {
Member

Let's just drop this branch, since it's not yet supported?

jsub = as.call(ans) # important no names here
jvnames = sdvars # but here instead
list(jsub=jsub, jvnames=jvnames, funi=funi+1L)
# It may seem inefficient to construct a potentially long expression. But, consider calling
Member

Might be worth benchmarking this (atime?)... it was written 14 years ago, and I wonder if it's still true (5176108).

4 participants