@simonvandel
Contributor

@simonvandel simonvandel commented Jan 10, 2026

Which issue does this PR close?

  • Closes #.

Rationale for this change

When every branch of a CASE expression, including the ELSE branch, returns the same result expression, the CASE can be removed and replaced with that expression.
A human is unlikely to write such an expression, but generated queries can produce one.

Example where this optimization applies:

CASE v WHEN 100 THEN 1 ELSE 1 END

What changes are included in this PR?

Extend the expression simplifier to detect this pattern.
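
As a rough illustration of the rewrite, here is a minimal sketch on a toy expression tree. The `Expr` enum and `simplify_case` function are hypothetical stand-ins written for this example and are not DataFusion's actual types or the code in this PR. The sketch compares results structurally and deliberately ignores whether the dropped operand or WHEN expressions could fail at runtime.

```rust
// Toy expression tree for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Case {
        operand: Option<Box<Expr>>,
        when_then: Vec<(Expr, Expr)>,
        else_expr: Option<Box<Expr>>,
    },
}

/// Replace a CASE whose THEN results and ELSE result are all structurally
/// equal with that single result expression.
///
/// A CASE without ELSE is left alone here, because a missing ELSE
/// implicitly yields NULL rather than the shared THEN result.
fn simplify_case(expr: Expr) -> Expr {
    match expr {
        Expr::Case {
            operand,
            when_then,
            else_expr: Some(else_expr),
        } => {
            let all_same = !when_then.is_empty()
                && when_then.iter().all(|(_, then)| *then == *else_expr);
            if all_same {
                // Every branch produces the same expression: keep only that.
                *else_expr
            } else {
                // Results differ: rebuild the CASE unchanged.
                Expr::Case {
                    operand,
                    when_then,
                    else_expr: Some(else_expr),
                }
            }
        }
        // Anything that is not a CASE with an ELSE is returned as-is.
        other => other,
    }
}
```

With this sketch, the tree for `CASE v WHEN 100 THEN 1 ELSE 1 END` collapses to the literal `1`, while a CASE whose branches differ is returned unchanged.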

Are these changes tested?

Yes, added sqllogictest (SLT) cases.
The first commit shows the plan before the optimization.

Are there any user-facing changes?

Slightly simpler plans, so fewer operations at runtime.

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Jan 10, 2026
@simonvandel
Contributor Author

CI found an error using the sqlite test suite:

External error: 1 errors in file /__w/datafusion/datafusion/datafusion-testing/data/sqlite/random/expr/slt_good_102.slt

1. query is expected to fail, but actually succeed:
[SQL] SELECT 57 * - - 12 * + + 83 + - 74 + - + CASE + ( COUNT ( * ) ) WHEN + 60 THEN + CAST ( NULL AS INTEGER ) ELSE NULL END + COUNT ( ALL - 4 + 61 / 0 ) * 29 * 28
at /__w/datafusion/datafusion/datafusion-testing/data/sqlite/random/expr/slt_good_102.slt:8867

It makes sense that we should not remove CASE expressions that could cause runtime failures (a divide-by-zero error in this case).

The question is how to detect whether an expression could cause a side effect.
I couldn't find anything like Expr::side_effect_free (or Expr::has_side_effect). Am I missing something? Does anyone have ideas on how this could be handled?
We could of course restrict this optimization to simple expressions such as literals or columns, but it would be nice to apply it more generally.

@Dandandan
Contributor

Dandandan commented Jan 11, 2026

I think we could mark certain operators as pure/safe/non-fallible (they never cause a runtime error), mark operators such as division as non-pure, and only optimize subtrees that consist solely of pure, non-failing expressions.
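
A minimal sketch of that idea, on a toy expression tree rather than DataFusion's real types: `Op`, `Expr`, `op_is_infallible`, `is_infallible`, and `can_drop_case` are all hypothetical names invented for this example, and the choice of which operators count as infallible is an assumption for illustration, not a statement about DataFusion's actual semantics.

```rust
// Toy operator set for illustration only; NOT DataFusion's `Operator`.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Eq,
    And,
    Divide,
}

// Toy expression tree for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Column(String),
    Binary(Box<Expr>, Op, Box<Expr>),
}

/// Assumed classification: returns true if the operator can never raise
/// a runtime error. Division is treated as fallible (divide by zero).
fn op_is_infallible(op: &Op) -> bool {
    match op {
        Op::Eq | Op::And => true,
        Op::Divide => false,
    }
}

/// An expression is infallible only if every operator in its subtree is.
fn is_infallible(expr: &Expr) -> bool {
    match expr {
        Expr::Literal(_) | Expr::Column(_) => true,
        Expr::Binary(left, op, right) => {
            op_is_infallible(op) && is_infallible(left) && is_infallible(right)
        }
    }
}

/// A CASE with identical results may be dropped only if everything that
/// would no longer be evaluated (the operand and the WHEN expressions)
/// is infallible.
fn can_drop_case(operand: Option<&Expr>, whens: &[Expr]) -> bool {
    operand.map_or(true, is_infallible) && whens.iter().all(is_infallible)
}
```

Under these assumptions, a CASE whose WHEN expression is `v = 100` could be dropped, while one containing `61 / 0` anywhere in its operand or WHEN expressions would be left in place so the runtime error is preserved.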
