fix: null in array_agg with DISTINCT and IGNORE #19736

davidlghellin · 2026-01-10T22:23:11Z

Which issue does this PR close?

Closes #19735 ([BUG] not ignore null in ARRAY_AGG with DISTINCT and IGNORE NULLS).

Rationale for this change

The SingleDistinctToGroupBy optimizer rewrites aggregate functions with DISTINCT into a GROUP BY operation for better performance. However, during this rewrite, it was discarding important aggregate function parameters: null_treatment, filter, and order_by.

This caused queries like ARRAY_AGG(DISTINCT x IGNORE NULLS) to include NULL values in the result because the IGNORE NULLS clause (stored as null_treatment) was being lost during optimization.
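As an illustration of the failure mode only, the sketch below models the rewrite in plain Rust. It does not use DataFusion's real types: `AggParams`, `rewrite_dropping_params`, and `rewrite_preserving_params` are hypothetical names, and `ignore_nulls` is a simplified stand-in for `null_treatment`.

```rust
// Hypothetical stand-in for the parameters carried by an aggregate call.
// These are NOT DataFusion's real types; the field names only mirror the PR text.
#[derive(Debug, Clone, PartialEq)]
struct AggParams {
    distinct: bool,
    filter: Option<String>,
    order_by: Vec<String>,
    ignore_nulls: bool, // simplified model of `null_treatment`
}

// Models the buggy rewrite: DISTINCT is switched off (the GROUP BY now handles it),
// but every other parameter is reset to its default and is therefore lost.
fn rewrite_dropping_params(_original: &AggParams) -> AggParams {
    AggParams {
        distinct: false,
        filter: None,
        order_by: vec![],
        ignore_nulls: false,
    }
}

// Models the fixed rewrite: only `distinct` changes; everything else is carried over.
fn rewrite_preserving_params(original: &AggParams) -> AggParams {
    AggParams {
        distinct: false,
        ..original.clone()
    }
}
```

With an input whose `ignore_nulls` is `true`, the first function returns `ignore_nulls: false` while the second keeps it `true`, which is the difference this PR is about.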

What changes are included in this PR?

Preserve aggregate parameters in optimizer: Modified SingleDistinctToGroupBy to extract and preserve null_treatment, filter, and order_by from the original aggregate function when creating the rewritten version.

Add regression test: Added SQL logic test to verify that ARRAY_AGG(DISTINCT x IGNORE NULLS) correctly filters out NULL values.

Files changed:

datafusion/optimizer/src/single_distinct_to_groupby.rs: Extract and pass through filter, order_by, and null_treatment parameters
datafusion/sqllogictest/test_files/aggregate.slt: Add test case for ARRAY_AGG(DISTINCT ... IGNORE NULLS)

Are these changes tested?

Yes:
New SQL logic test in aggregate.slt verifies the fix works end-to-end
Existing optimizer tests continue to pass (19 tests in single_distinct_to_groupby)
Existing aggregate tests continue to pass (20 tests in array_agg)

Are there any user-facing changes?

Bug fix - Users can now correctly use IGNORE NULLS (and FILTER / ORDER BY) with DISTINCT aggregates:

Before (broken):

SELECT ARRAY_AGG(DISTINCT x IGNORE NULLS) 
FROM (VALUES (1), (2), (NULL), (2), (1)) AS t(x);
-- Result: [2, NULL, 1]  ❌ NULL incorrectly included

After (fixed):

SELECT ARRAY_AGG(DISTINCT x IGNORE NULLS) 
FROM (VALUES (1), (2), (NULL), (2), (1)) AS t(x);
-- Result: [1, 2]  ✅ NULLs correctly filtered
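The intended semantics can be sketched in plain Rust, independent of DataFusion. The function name `array_agg_distinct` is hypothetical, SQL NULL is modeled as `None`, and the sketch keeps values in first-seen order purely for determinism; it makes no claim about the element order DataFusion produces.

```rust
use std::collections::HashSet;

// Simplified model of ARRAY_AGG(DISTINCT x [IGNORE NULLS]) over nullable integers.
// `None` stands for SQL NULL. Output order here is first-seen order, which is an
// assumption of this sketch and not a statement about DataFusion's behavior.
fn array_agg_distinct(values: &[Option<i64>], ignore_nulls: bool) -> Vec<Option<i64>> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for v in values {
        // IGNORE NULLS: skip NULL inputs before they reach the aggregate state.
        if ignore_nulls && v.is_none() {
            continue;
        }
        // DISTINCT: keep only the first occurrence of each value.
        if seen.insert(*v) {
            out.push(*v);
        }
    }
    out
}
```

Called on the five input rows from the query above with `ignore_nulls = true`, the sketch yields only the values 1 and 2; with `ignore_nulls = false`, NULL appears once in the result, which corresponds to the broken behavior.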

Copilot

Pull request overview

This PR fixes a bug where the IGNORE NULLS clause was being lost when optimizing ARRAY_AGG(DISTINCT x IGNORE NULLS) queries. The SingleDistinctToGroupBy optimizer was incorrectly discarding the null_treatment, filter, and order_by parameters when rewriting DISTINCT aggregates into GROUP BY operations.

Changes:

Modified the optimizer to preserve aggregate function parameters (null_treatment, filter, order_by) during the DISTINCT-to-GROUP-BY transformation
Added a regression test to verify ARRAY_AGG(DISTINCT x IGNORE NULLS) correctly filters NULL values

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
datafusion/optimizer/src/single_distinct_to_groupby.rs	Extracts and preserves `filter`, `order_by`, and `null_treatment` parameters when rewriting DISTINCT aggregates
datafusion/sqllogictest/test_files/aggregate.slt	Adds regression test for `ARRAY_AGG(DISTINCT ... IGNORE NULLS)` functionality


Jefffrey

Nice spot. I notice in the branch below it also doesn't carry over the properties, is this something we should also fix?

datafusion/datafusion/optimizer/src/single_distinct_to_groupby.rs

Lines 212 to 234 in 458b491

    
    } else {
        index += 1;
        let alias_str = format!("alias{index}");
        inner_aggr_exprs.push(
            Expr::AggregateFunction(AggregateFunction::new_udf(
                Arc::clone(&func),
                args,
                false,
                None,
                vec![],
                None,
            ))
            .alias(&alias_str),
        );
        Ok(Expr::AggregateFunction(AggregateFunction::new_udf(
            func,
            vec![col(&alias_str)],
            false,
            None,
            vec![],
            None,
        )))
    }
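For readers unfamiliar with this branch, the following plain-Rust sketch illustrates the general idea of a two-phase rewrite for a query shaped like `SELECT count(DISTINCT a), sum(b)`: an inner aggregation groups by the distinct column and computes the non-distinct aggregate under an alias, and an outer aggregation combines the partial results. The functions `direct` and `two_phase` are hypothetical, and the sketch ignores NULLs, GROUP BY columns, and everything else the real rule handles.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Direct evaluation of `SELECT count(DISTINCT a), sum(b)` over rows of (a, b).
fn direct(rows: &[(i64, i64)]) -> (usize, i64) {
    let distinct_a: BTreeSet<i64> = rows.iter().map(|(a, _)| *a).collect();
    let sum_b: i64 = rows.iter().map(|(_, b)| *b).sum();
    (distinct_a.len(), sum_b)
}

// Two-phase evaluation of the same query:
//   inner: SELECT a, sum(b) AS alias1 ... GROUP BY a
//   outer: SELECT count(a), sum(alias1)
fn two_phase(rows: &[(i64, i64)]) -> (usize, i64) {
    // Inner phase: one partial sum per distinct value of `a`.
    let mut inner: BTreeMap<i64, i64> = BTreeMap::new();
    for (a, b) in rows {
        *inner.entry(*a).or_insert(0) += *b;
    }
    // Outer phase: each group holds one distinct `a`, so counting groups
    // gives count(DISTINCT a); summing the partial sums gives sum(b).
    let count_a = inner.len();
    let sum_alias1: i64 = inner.values().sum();
    (count_a, sum_alias1)
}
```

Both functions return the same pair for any input, which is why re-applying `sum` (or `min`/`max`) in the outer phase is safe in this simplified model.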

Jefffrey

Is it possible to find a test case for the 2-phase rewrite cases? 🤔

Jefffrey · 2026-01-11T10:58:10Z

datafusion/optimizer/src/single_distinct_to_groupby.rs

                                    func,
                                    vec![col(&alias_str)],
                                    false,
                                    None,


There's another case here

davidlghellin · 2026-01-11

I'll look into whether we can add a minimal test case covering the 2-phase rewrite scenario.

davidlghellin · 2026-01-11T13:27:57Z

> Is it possible to find a test case for the 2-phase rewrite cases? 🤔

If I understood correctly, I’ve added a couple of tests covering those cases.

Jefffrey · 2026-01-12T01:00:56Z

Looking into it some more, it looks like the two-phase aggregation related changes aren't really needed as they don't affect anything 🤔

The introduced tests don't fail on main, and it seems it's because the only aggregates supported by the two-phase aggregation branch are min/max/sum:

datafusion/datafusion/optimizer/src/single_distinct_to_groupby.rs

Lines 85 to 94 in f9697c1

    
    if *distinct {
        for e in args {
            fields_set.insert(e);
        }
    } else if func.name() != "sum"
        && func.name().to_lowercase() != "min"
        && func.name().to_lowercase() != "max"
    {
        return Ok(false);
    }

See how we bail if we find a non-distinct function that isn't sum/min/max.

I guess it doesn't hurt to keep the fix, but it won't actually affect anything (since IGNORE NULLS doesn't affect sum/min/max), and we bail out of the rule if any aggregate has a filter or order_by:

datafusion/datafusion/optimizer/src/single_distinct_to_groupby.rs

Lines 81 to 83 in f9697c1

    
    if filter.is_some() || !order_by.is_empty() {
        return Ok(false);
    }
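Taken together, the two quoted checks can be modeled as follows. This is a simplified sketch in plain Rust: `AggCall` and `passes_quoted_checks` are hypothetical names, and the real rule performs further checks that are not reproduced here, so a `true` result only means that neither quoted bail-out fires.

```rust
// Hypothetical, simplified description of one aggregate call in a query.
struct AggCall {
    name: String,
    distinct: bool,
    has_filter: bool,
    has_order_by: bool,
}

// Returns false when either of the two quoted bail-out conditions applies.
// The real optimizer rule checks more than this; those checks are omitted.
fn passes_quoted_checks(aggs: &[AggCall]) -> bool {
    for agg in aggs {
        // Any FILTER or ORDER BY on an aggregate disables the rule.
        if agg.has_filter || agg.has_order_by {
            return false;
        }
        // Non-distinct aggregates are only accepted if they are sum/min/max.
        // The comparison mirrors the quoted snippet, including its casing.
        if !agg.distinct
            && agg.name != "sum"
            && agg.name.to_lowercase() != "min"
            && agg.name.to_lowercase() != "max"
        {
            return false;
        }
    }
    true
}
```

Under this model, a distinct `array_agg` next to a plain `sum` passes both checks, while a non-distinct `array_agg` or any aggregate with a filter does not, matching the observation that the extra parameters never reach the two-phase branch.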

Commit 9195185: fix: null in array_agg with DISTINCT and IGNORE

Copilot AI review requested due to automatic review settings · Jan 10, 2026 22:23

github-actions bot added the labels optimizer (Optimizer rules) and sqllogictest (SQL Logic Tests (.slt)) · Jan 10, 2026

Copilot started reviewing on behalf of davidlghellin · Jan 10, 2026 22:23

Copilot AI reviewed · Jan 10, 2026

Jefffrey reviewed · Jan 11, 2026

Commit dcb0595: suggestion

Jefffrey approved these changes · Jan 11, 2026

Commit cc4fbe4: suggestion test
