
@CuteChuanChuan
Contributor

Which issue does this PR close?

Rationale for this change

The CaseExpr implementation is expensive. A common usage pattern (particularly in TPC-DS benchmarks) is to protect against divide-by-zero:

CASE WHEN y > 0 THEN x / y ELSE NULL END

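This entire expression can be replaced with a simpler divide operation that returns NULL when the divisor is zero, avoiding the overhead of full CASE evaluation. Strictly speaking, that equivalence holds for the y != 0 form. For y > 0 (and 0 < y), the CASE also yields NULL when y is negative, so a zero-only check is not sufficient there.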
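A minimal std-only Rust sketch of the equivalence claimed above, assuming nullable integer columns are modeled as Vec<Option<i64>> with None standing in for NULL. The helper names case_when_divide and zero_only_divide are hypothetical and are not DataFusion functions. The sketch checks that the y != 0 form matches a zero-only divide, and shows that the y > 0 form additionally yields NULL for a negative divisor:

```rust
// Nullable SQL integers are modeled as Option<i64>; None plays the role of NULL.

/// Row-wise semantics of CASE WHEN cond(y) THEN x / y ELSE NULL END.
/// `cond` must be false for y == 0 (as in y > 0, y != 0, 0 < y).
fn case_when_divide(x: &[Option<i64>], y: &[Option<i64>], cond: fn(i64) -> bool) -> Vec<Option<i64>> {
    x.iter()
        .zip(y)
        .map(|(&xv, &yv)| match (xv, yv) {
            // A NULL y makes the WHEN condition NULL, so the ELSE NULL branch is taken.
            (_, None) => None,
            // Condition false: ELSE NULL.
            (_, Some(d)) if !cond(d) => None,
            // Condition true: x / y; a NULL x propagates to NULL.
            (Some(n), Some(d)) => Some(n / d),
            (None, Some(_)) => None,
        })
        .collect()
}

/// A divide that returns NULL only where the divisor is zero (or either input is NULL).
fn zero_only_divide(x: &[Option<i64>], y: &[Option<i64>]) -> Vec<Option<i64>> {
    x.iter()
        .zip(y)
        .map(|(&xv, &yv)| match (xv, yv) {
            (Some(n), Some(d)) if d != 0 => Some(n / d),
            _ => None,
        })
        .collect()
}

fn main() {
    let x: Vec<Option<i64>> = vec![Some(10), Some(10), Some(10), None];
    let y: Vec<Option<i64>> = vec![Some(2), Some(0), Some(-5), Some(1)];

    let not_zero = case_when_divide(&x, &y, |d| d != 0);
    let positive = case_when_divide(&x, &y, |d| d > 0);
    let safe = zero_only_divide(&x, &y);

    // y != 0: identical to the zero-only divide.
    assert_eq!(not_zero, safe);
    assert_eq!(safe, vec![Some(5), None, Some(-2), None]);
    // y > 0: the negative divisor (-5) also yields NULL, so the results differ.
    assert_eq!(positive, vec![Some(5), None, None, None]);
    assert_ne!(positive, safe);
}
```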

What changes are included in this PR?

  1. New EvalMethod::DivideByZeroProtection variant - A specialization for the divide-by-zero protection pattern
  2. Pattern detection - Detects patterns like:
    • CASE WHEN y > 0 THEN x / y ELSE NULL END
    • CASE WHEN y != 0 THEN x / y ELSE NULL END
    • CASE WHEN 0 < y THEN x / y ELSE NULL END
  3. Critical validation - Ensures the divisor in the division matches the operand being checked (addresses feedback from PR #12049)
  4. Safe division implementation - Uses Arrow kernels to perform a division that returns NULL on zero:
    • eq to create a mask of zero divisors
    • zip to replace zeros with ones (avoiding a division error)
    • div to perform the division
    • nullif to set NULL where the divisor was zero
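The four kernel steps above can be modeled without Arrow to show why the zip step is needed: integer division by zero is an error, so zeros are swapped for ones before dividing and then nulled out afterwards. This is a std-only sketch; eq_zero, zip_replace, div, nullif, and safe_divide are hypothetical stand-ins that mirror the roles of the named kernels and are not arrow-rs signatures. Note that a mask built from eq alone corresponds to the y != 0 condition; handling the y > 0 forms would presumably require deriving the mask from the condition itself (for example, y <= 0).

```rust
// Std-only stand-ins that mirror the roles of the Arrow kernels (eq, zip, div, nullif).
// These are NOT arrow-rs signatures; NULLs are modeled as None.

/// eq: true where the divisor is zero (a NULL divisor stays NULL).
fn eq_zero(divisor: &[Option<i64>]) -> Vec<Option<bool>> {
    divisor.iter().map(|d| d.map(|v| v == 0)).collect()
}

/// zip: take `truthy` where the mask is true, otherwise keep the original value.
/// A NULL mask entry is treated as false.
fn zip_replace(mask: &[Option<bool>], truthy: i64, falsy: &[Option<i64>]) -> Vec<Option<i64>> {
    mask.iter()
        .zip(falsy)
        .map(|(m, f)| if *m == Some(true) { Some(truthy) } else { *f })
        .collect()
}

/// div: element-wise integer division that returns an error (rather than
/// panicking) on a zero divisor or overflow. NULL inputs give NULL outputs.
fn div(lhs: &[Option<i64>], rhs: &[Option<i64>]) -> Result<Vec<Option<i64>>, String> {
    lhs.iter()
        .zip(rhs)
        .map(|(a, b)| match (a, b) {
            (Some(a), Some(b)) => a
                .checked_div(*b)
                .map(Some)
                .ok_or_else(|| format!("cannot divide {a} by {b}")),
            _ => Ok(None),
        })
        .collect()
}

/// nullif: NULL wherever the mask is true.
fn nullif(values: &[Option<i64>], mask: &[Option<bool>]) -> Vec<Option<i64>> {
    values.iter()
        .zip(mask)
        .map(|(v, m)| if *m == Some(true) { None } else { *v })
        .collect()
}

/// The four steps chained: NULL where y == 0, x / y elsewhere.
fn safe_divide(x: &[Option<i64>], y: &[Option<i64>]) -> Result<Vec<Option<i64>>, String> {
    let zero_mask = eq_zero(y); // 1. eq
    let safe_y = zip_replace(&zero_mask, 1, y); // 2. zip: zeros -> ones
    let quotient = div(x, &safe_y)?; // 3. div never sees a zero divisor now
    Ok(nullif(&quotient, &zero_mask)) // 4. nullif: NULL where y was zero
}

fn main() {
    let x: Vec<Option<i64>> = vec![Some(10), Some(7), None, Some(9)];
    let y: Vec<Option<i64>> = vec![Some(2), Some(0), Some(3), None];
    let out = safe_divide(&x, &y).expect("no zero divisor reaches div");
    assert_eq!(out, vec![Some(5), None, None, None]);
    println!("{out:?}");
}
```

Computing 7 / 1 for the zero row and then discarding it wastes a little work, but it keeps every step a branch-free columnar kernel.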

Are these changes tested?

Yes, added two new tests:

  • test_divide_by_zero_protection_specialization - Verifies that the pattern is detected and that the results are correct
  • test_divide_by_zero_protection_specialization_not_applied - Verifies that the optimization is NOT applied when the divisor doesn't match the checked operand (key feedback from PR #12049)

Are there any user-facing changes?

No. This is an internal optimization that produces the same results but with better performance.

@github-actions bot added the physical-expr label (Changes to the physical-expr crates) on Jan 25, 2026
@CuteChuanChuan force-pushed the raymond/11570-optimize-case-when branch 2 times, most recently from 267d8df to 334785f on January 25, 2026 11:33
@Dandandan
Contributor

run benchmark tpcds

@alamb-ghbot

🤖 ./gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing raymond/11570-optimize-case-when (334785f) to e5e7636 diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed


Comparing HEAD and raymond_11570-optimize-case-when
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ raymond_11570-optimize-case-when ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │    73.53 ms │                         72.15 ms │     no change │
│ QQuery 2  │   211.96 ms │                        212.04 ms │     no change │
│ QQuery 3  │   163.11 ms │                        158.57 ms │     no change │
│ QQuery 4  │  1859.54 ms │                       1859.56 ms │     no change │
│ QQuery 5  │   289.80 ms │                        292.57 ms │     no change │
│ QQuery 6  │  1460.86 ms │                       1451.05 ms │     no change │
│ QQuery 7  │   523.20 ms │                        519.77 ms │     no change │
│ QQuery 8  │   174.05 ms │                        174.25 ms │     no change │
│ QQuery 9  │   304.03 ms │                        317.25 ms │     no change │
│ QQuery 10 │   177.88 ms │                        180.29 ms │     no change │
│ QQuery 11 │  1253.95 ms │                       1250.79 ms │     no change │
│ QQuery 12 │    68.28 ms │                         68.87 ms │     no change │
│ QQuery 13 │   556.79 ms │                        557.18 ms │     no change │
│ QQuery 14 │  1886.69 ms │                       1880.89 ms │     no change │
│ QQuery 15 │    31.68 ms │                         31.36 ms │     no change │
│ QQuery 16 │    67.90 ms │                         66.24 ms │     no change │
│ QQuery 17 │   374.36 ms │                        370.23 ms │     no change │
│ QQuery 18 │   199.18 ms │                        200.05 ms │     no change │
│ QQuery 19 │   233.55 ms │                        229.54 ms │     no change │
│ QQuery 20 │    26.44 ms │                         25.91 ms │     no change │
│ QQuery 21 │    39.73 ms │                         38.82 ms │     no change │
│ QQuery 22 │   734.66 ms │                        706.03 ms │     no change │
│ QQuery 23 │  1769.26 ms │                       1780.40 ms │     no change │
│ QQuery 24 │   695.92 ms │                        698.61 ms │     no change │
│ QQuery 25 │   536.61 ms │                        536.98 ms │     no change │
│ QQuery 26 │   128.16 ms │                        130.89 ms │     no change │
│ QQuery 27 │   516.70 ms │                        515.08 ms │     no change │
│ QQuery 28 │   308.95 ms │                        314.87 ms │     no change │
│ QQuery 29 │   462.08 ms │                        452.27 ms │     no change │
│ QQuery 30 │    76.33 ms │                         75.76 ms │     no change │
│ QQuery 31 │   330.66 ms │                        307.32 ms │ +1.08x faster │
│ QQuery 32 │    86.10 ms │                         84.90 ms │     no change │
│ QQuery 33 │   208.08 ms │                        208.13 ms │     no change │
│ QQuery 34 │   160.78 ms │                        158.85 ms │     no change │
│ QQuery 35 │   177.42 ms │                        173.31 ms │     no change │
│ QQuery 36 │   288.57 ms │                        283.71 ms │     no change │
│ QQuery 37 │   268.25 ms │                        263.37 ms │     no change │
│ QQuery 38 │   154.18 ms │                        150.76 ms │     no change │
│ QQuery 39 │   206.35 ms │                        207.82 ms │     no change │
│ QQuery 40 │   182.77 ms │                        182.53 ms │     no change │
│ QQuery 41 │    33.59 ms │                         33.35 ms │     no change │
│ QQuery 42 │   146.85 ms │                        143.80 ms │     no change │
│ QQuery 43 │   129.40 ms │                        125.70 ms │     no change │
│ QQuery 44 │    28.27 ms │                         28.68 ms │     no change │
│ QQuery 45 │    90.93 ms │                         93.64 ms │     no change │
│ QQuery 46 │   326.32 ms │                        320.31 ms │     no change │
│ QQuery 47 │  1002.56 ms │                        992.63 ms │     no change │
│ QQuery 48 │   420.49 ms │                        417.63 ms │     no change │
│ QQuery 49 │   386.66 ms │                        382.37 ms │     no change │
│ QQuery 50 │   351.08 ms │                        329.46 ms │ +1.07x faster │
│ QQuery 51 │   306.92 ms │                        298.73 ms │     no change │
│ QQuery 52 │   146.18 ms │                        145.57 ms │     no change │
│ QQuery 53 │   155.53 ms │                        151.14 ms │     no change │
│ QQuery 54 │   226.95 ms │                        228.22 ms │     no change │
│ QQuery 55 │   146.10 ms │                        144.10 ms │     no change │
│ QQuery 56 │   208.29 ms │                        205.82 ms │     no change │
│ QQuery 57 │   296.35 ms │                        294.90 ms │     no change │
│ QQuery 58 │   503.21 ms │                        515.55 ms │     no change │
│ QQuery 59 │   296.25 ms │                        285.87 ms │     no change │
│ QQuery 60 │   216.64 ms │                        211.77 ms │     no change │
│ QQuery 61 │   249.26 ms │                        245.76 ms │     no change │
│ QQuery 62 │  1264.67 ms │                       1265.81 ms │     no change │
│ QQuery 63 │   156.24 ms │                        154.40 ms │     no change │
│ QQuery 64 │  1215.91 ms │                       1216.99 ms │     no change │
│ QQuery 65 │   354.02 ms │                        345.83 ms │     no change │
│ QQuery 66 │   396.15 ms │                        391.08 ms │     no change │
│ QQuery 67 │   550.29 ms │                        529.99 ms │     no change │
│ QQuery 68 │   374.32 ms │                        371.29 ms │     no change │
│ QQuery 69 │   179.16 ms │                        169.64 ms │ +1.06x faster │
│ QQuery 70 │   499.78 ms │                        485.62 ms │     no change │
│ QQuery 71 │   188.78 ms │                        184.06 ms │     no change │
│ QQuery 72 │  2118.42 ms │                       2108.28 ms │     no change │
│ QQuery 73 │   158.28 ms │                        152.54 ms │     no change │
│ QQuery 74 │   803.43 ms │                        789.61 ms │     no change │
│ QQuery 75 │   419.38 ms │                        405.28 ms │     no change │
│ QQuery 76 │   188.68 ms │                        192.89 ms │     no change │
│ QQuery 77 │   290.51 ms │                        301.41 ms │     no change │
│ QQuery 78 │   956.92 ms │                        947.49 ms │     no change │
│ QQuery 79 │   329.07 ms │                        325.97 ms │     no change │
│ QQuery 80 │   529.17 ms │                        530.97 ms │     no change │
│ QQuery 81 │    55.19 ms │                         54.42 ms │     no change │
│ QQuery 82 │   285.80 ms │                        283.52 ms │     no change │
│ QQuery 83 │    82.97 ms │                         83.56 ms │     no change │
│ QQuery 84 │    76.51 ms │                         74.25 ms │     no change │
│ QQuery 85 │   228.50 ms │                        239.17 ms │     no change │
│ QQuery 86 │    58.58 ms │                         58.22 ms │     no change │
│ QQuery 87 │   161.57 ms │                        148.84 ms │ +1.09x faster │
│ QQuery 88 │   269.38 ms │                        261.30 ms │     no change │
│ QQuery 89 │   172.37 ms │                        170.60 ms │     no change │
│ QQuery 90 │    47.68 ms │                         45.38 ms │     no change │
│ QQuery 91 │   104.84 ms │                        100.60 ms │     no change │
│ QQuery 92 │    86.48 ms │                         84.88 ms │     no change │
│ QQuery 93 │   283.73 ms │                        280.09 ms │     no change │
│ QQuery 94 │    92.47 ms │                         92.04 ms │     no change │
│ QQuery 95 │   252.91 ms │                        246.37 ms │     no change │
│ QQuery 96 │   115.00 ms │                        113.58 ms │     no change │
│ QQuery 97 │   191.15 ms │                        189.40 ms │     no change │
│ QQuery 98 │   220.21 ms │                        220.04 ms │     no change │
│ QQuery 99 │ 13973.38 ms │                      13911.88 ms │     no change │
└───────────┴─────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 50668.05ms │
│ Total Time (raymond_11570-optimize-case-when)   │ 50311.35ms │
│ Average Time (HEAD)                             │   511.80ms │
│ Average Time (raymond_11570-optimize-case-when) │   508.20ms │
│ Queries Faster                                  │          4 │
│ Queries Slower                                  │          0 │
│ Queries with No Change                          │         95 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

@CuteChuanChuan
Contributor Author

Hi @andygrove, could you PTAL when you have a chance? Thanks!

@alamb
Contributor

alamb commented Jan 26, 2026

run benchmark tpcds

@alamb
Contributor

alamb commented Jan 26, 2026

Also FYI @pepijnve

@alamb-ghbot

🤖 ./gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing raymond/11570-optimize-case-when (334785f) to e5e7636 diff using: tpcds
Results will be posted here when complete

@pepijnve
Contributor

pepijnve commented Jan 26, 2026

It might be useful to add a microbenchmark to https://github.com/apache/datafusion/blob/main/datafusion/physical-expr/benches/case_when.rs so we can compare before/after. I don't think any of the existing ones will cover this particular pattern.

The TPC-DS queries that contain this pattern (based on grep case *.sql | grep /) are q4, q11, q31, q39, q47, q57, q63, q74, and q89. Looking at the TPC-DS benchmark results, it's not clear-cut whether the extra code provides a meaningful improvement over the ExpressionOrExpression implementation. Hopefully a microbenchmark can make that clearer.

@alamb-ghbot

🤖: Benchmark completed


Comparing HEAD and raymond_11570-optimize-case-when
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ raymond_11570-optimize-case-when ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │    74.22 ms │                         73.09 ms │     no change │
│ QQuery 2  │   217.44 ms │                        216.89 ms │     no change │
│ QQuery 3  │   163.87 ms │                        162.40 ms │     no change │
│ QQuery 4  │  2026.32 ms │                       1960.99 ms │     no change │
│ QQuery 5  │   289.74 ms │                        285.49 ms │     no change │
│ QQuery 6  │  1464.31 ms │                       1486.77 ms │     no change │
│ QQuery 7  │   526.02 ms │                        523.15 ms │     no change │
│ QQuery 8  │   174.99 ms │                        179.57 ms │     no change │
│ QQuery 9  │   298.70 ms │                        297.78 ms │     no change │
│ QQuery 10 │   186.01 ms │                        181.85 ms │     no change │
│ QQuery 11 │  1315.39 ms │                       1318.46 ms │     no change │
│ QQuery 12 │    71.85 ms │                         71.62 ms │     no change │
│ QQuery 13 │   563.56 ms │                        564.29 ms │     no change │
│ QQuery 14 │  1875.27 ms │                       1897.99 ms │     no change │
│ QQuery 15 │    31.57 ms │                         31.44 ms │     no change │
│ QQuery 16 │    66.83 ms │                         66.62 ms │     no change │
│ QQuery 17 │   378.66 ms │                        372.09 ms │     no change │
│ QQuery 18 │   196.21 ms │                        202.44 ms │     no change │
│ QQuery 19 │   236.26 ms │                        236.03 ms │     no change │
│ QQuery 20 │    26.90 ms │                         27.83 ms │     no change │
│ QQuery 21 │    38.51 ms │                         38.91 ms │     no change │
│ QQuery 22 │   734.60 ms │                        721.89 ms │     no change │
│ QQuery 23 │  1770.88 ms │                       1806.40 ms │     no change │
│ QQuery 24 │   707.80 ms │                        719.96 ms │     no change │
│ QQuery 25 │   537.13 ms │                        556.43 ms │     no change │
│ QQuery 26 │   129.58 ms │                        136.20 ms │  1.05x slower │
│ QQuery 27 │   514.56 ms │                        535.05 ms │     no change │
│ QQuery 28 │   314.50 ms │                        323.78 ms │     no change │
│ QQuery 29 │   462.49 ms │                        473.25 ms │     no change │
│ QQuery 30 │    75.88 ms │                         79.38 ms │     no change │
│ QQuery 31 │   325.91 ms │                        328.01 ms │     no change │
│ QQuery 32 │    87.79 ms │                         88.77 ms │     no change │
│ QQuery 33 │   211.68 ms │                        212.82 ms │     no change │
│ QQuery 34 │   163.45 ms │                        160.95 ms │     no change │
│ QQuery 35 │   182.50 ms │                        177.59 ms │     no change │
│ QQuery 36 │   287.85 ms │                        304.90 ms │  1.06x slower │
│ QQuery 37 │   267.86 ms │                        258.72 ms │     no change │
│ QQuery 38 │   152.79 ms │                        161.68 ms │  1.06x slower │
│ QQuery 39 │   207.26 ms │                        224.05 ms │  1.08x slower │
│ QQuery 40 │   177.26 ms │                        185.25 ms │     no change │
│ QQuery 41 │    34.06 ms │                         35.76 ms │     no change │
│ QQuery 42 │   145.49 ms │                        146.99 ms │     no change │
│ QQuery 43 │   128.99 ms │                        127.93 ms │     no change │
│ QQuery 44 │    30.55 ms │                         31.27 ms │     no change │
│ QQuery 45 │    91.00 ms │                         92.33 ms │     no change │
│ QQuery 46 │   322.88 ms │                        341.06 ms │  1.06x slower │
│ QQuery 47 │  1044.84 ms │                       1202.10 ms │  1.15x slower │
│ QQuery 48 │   416.25 ms │                        450.28 ms │  1.08x slower │
│ QQuery 49 │   379.54 ms │                        402.49 ms │  1.06x slower │
│ QQuery 50 │   335.35 ms │                        363.33 ms │  1.08x slower │
│ QQuery 51 │   305.38 ms │                        310.19 ms │     no change │
│ QQuery 52 │   147.91 ms │                        148.05 ms │     no change │
│ QQuery 53 │   156.77 ms │                        155.96 ms │     no change │
│ QQuery 54 │   245.74 ms │                        235.85 ms │     no change │
│ QQuery 55 │   146.14 ms │                        150.66 ms │     no change │
│ QQuery 56 │   211.09 ms │                        223.74 ms │  1.06x slower │
│ QQuery 57 │   300.47 ms │                        328.80 ms │  1.09x slower │
│ QQuery 58 │   501.81 ms │                        542.59 ms │  1.08x slower │
│ QQuery 59 │   288.81 ms │                        317.96 ms │  1.10x slower │
│ QQuery 60 │   216.72 ms │                        231.92 ms │  1.07x slower │
│ QQuery 61 │   252.94 ms │                        266.19 ms │  1.05x slower │
│ QQuery 62 │  1322.10 ms │                       1357.68 ms │     no change │
│ QQuery 63 │   162.08 ms │                        161.27 ms │     no change │
│ QQuery 64 │  1251.30 ms │                       1299.58 ms │     no change │
│ QQuery 65 │   371.23 ms │                        405.98 ms │  1.09x slower │
│ QQuery 66 │   415.97 ms │                        420.40 ms │     no change │
│ QQuery 67 │   556.46 ms │                        573.10 ms │     no change │
│ QQuery 68 │   386.47 ms │                        395.70 ms │     no change │
│ QQuery 69 │   179.11 ms │                        174.83 ms │     no change │
│ QQuery 70 │   509.44 ms │                        507.92 ms │     no change │
│ QQuery 71 │   189.63 ms │                        189.14 ms │     no change │
│ QQuery 72 │  2110.07 ms │                       2179.77 ms │     no change │
│ QQuery 73 │   163.03 ms │                        172.29 ms │  1.06x slower │
│ QQuery 74 │   913.76 ms │                        866.29 ms │ +1.05x faster │
│ QQuery 75 │   442.47 ms │                        414.73 ms │ +1.07x faster │
│ QQuery 76 │   199.37 ms │                        194.69 ms │     no change │
│ QQuery 77 │   308.23 ms │                        292.21 ms │ +1.05x faster │
│ QQuery 78 │   970.51 ms │                        981.48 ms │     no change │
│ QQuery 79 │   352.72 ms │                        340.08 ms │     no change │
│ QQuery 80 │   556.21 ms │                        538.21 ms │     no change │
│ QQuery 81 │    58.77 ms │                         54.13 ms │ +1.09x faster │
│ QQuery 82 │   303.43 ms │                        287.00 ms │ +1.06x faster │
│ QQuery 83 │    89.38 ms │                         84.59 ms │ +1.06x faster │
│ QQuery 84 │    71.06 ms │                         72.36 ms │     no change │
│ QQuery 85 │   245.30 ms │                        244.27 ms │     no change │
│ QQuery 86 │    62.83 ms │                         59.27 ms │ +1.06x faster │
│ QQuery 87 │   169.87 ms │                        156.40 ms │ +1.09x faster │
│ QQuery 88 │   283.60 ms │                        273.98 ms │     no change │
│ QQuery 89 │   184.75 ms │                        175.03 ms │ +1.06x faster │
│ QQuery 90 │    48.74 ms │                         49.63 ms │     no change │
│ QQuery 91 │   107.45 ms │                        102.83 ms │     no change │
│ QQuery 92 │    88.32 ms │                         89.57 ms │     no change │
│ QQuery 93 │   294.04 ms │                        284.99 ms │     no change │
│ QQuery 94 │    97.37 ms │                         93.80 ms │     no change │
│ QQuery 95 │   259.53 ms │                        261.05 ms │     no change │
│ QQuery 96 │   117.48 ms │                        117.55 ms │     no change │
│ QQuery 97 │   199.62 ms │                        194.28 ms │     no change │
│ QQuery 98 │   221.47 ms │                        226.23 ms │     no change │
│ QQuery 99 │ 14026.91 ms │                      14042.42 ms │     no change │
└───────────┴─────────────┴──────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                               │ 51527.23ms │
│ Total Time (raymond_11570-optimize-case-when)   │ 52090.93ms │
│ Average Time (HEAD)                             │   520.48ms │
│ Average Time (raymond_11570-optimize-case-when) │   526.17ms │
│ Queries Faster                                  │          9 │
│ Queries Slower                                  │         17 │
│ Queries with No Change                          │         73 │
│ Queries with Failure                            │          0 │
└─────────────────────────────────────────────────┴────────────┘

- Add a specialization for the common pattern: CASE WHEN y > 0 THEN x / y ELSE NULL END
- Add EvalMethod::DivideByZeroProtection variant
- Add pattern detection in find_best_eval_method()
- Implement safe_divide using Arrow kernels
- Handle CastExpr wrapping on divisor
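The pattern detection and the divisor validation described in this PR, including the CastExpr handling above, can be illustrated on a toy expression tree. This is a hypothetical sketch: the Expr and Op enums, unwrap_cast, and match_divide_by_zero_protection are illustrative stand-ins, not DataFusion's PhysicalExpr, CaseExpr, CastExpr, or find_best_eval_method. It only shows the shape of the check: accept y > 0, y != 0, or 0 < y as the condition, require ELSE NULL, and reject the rewrite unless the divisor (ignoring casts) is the operand that was checked.

```rust
// Hypothetical, simplified expression tree; NOT DataFusion's PhysicalExpr API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Null,
    Cast(Box<Expr>),
    Binary(Box<Expr>, Op, Box<Expr>),
    Case { when: Box<Expr>, then: Box<Expr>, otherwise: Box<Expr> },
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Op { Gt, Lt, NotEq, Div }

/// Strip any Cast wrappers so CAST(y) compares equal to y.
fn unwrap_cast(e: &Expr) -> &Expr {
    match e {
        Expr::Cast(inner) => unwrap_cast(inner),
        other => other,
    }
}

fn is_zero(e: &Expr) -> bool {
    matches!(unwrap_cast(e), Expr::Literal(0))
}

/// Returns the checked operand y for `y > 0`, `y != 0`, or `0 < y`.
fn checked_operand(cond: &Expr) -> Option<&Expr> {
    match cond {
        Expr::Binary(l, Op::Gt | Op::NotEq, r) if is_zero(r) => Some(&**l),
        Expr::Binary(l, Op::Lt, r) if is_zero(l) => Some(&**r),
        _ => None,
    }
}

/// Returns (dividend, divisor) if `expr` is CASE WHEN <check on y> THEN x / y ELSE NULL END.
fn match_divide_by_zero_protection(expr: &Expr) -> Option<(&Expr, &Expr)> {
    let Expr::Case { when, then, otherwise } = expr else { return None };
    if **otherwise != Expr::Null {
        return None;
    }
    let checked = checked_operand(when)?;
    let Expr::Binary(dividend, Op::Div, divisor) = &**then else { return None };
    // Critical validation: the divisor must be the same operand that was checked.
    if unwrap_cast(divisor) != unwrap_cast(checked) {
        return None;
    }
    Some((&**dividend, &**divisor))
}

// Small builders for the examples.
fn col(name: &str) -> Box<Expr> { Box::new(Expr::Column(name.to_string())) }
fn lit(v: i64) -> Box<Expr> { Box::new(Expr::Literal(v)) }
fn bin(l: Box<Expr>, op: Op, r: Box<Expr>) -> Box<Expr> { Box::new(Expr::Binary(l, op, r)) }
fn case(when: Box<Expr>, then: Box<Expr>) -> Expr {
    Expr::Case { when, then, otherwise: Box::new(Expr::Null) }
}

fn main() {
    // CASE WHEN y > 0 THEN x / CAST(y) ELSE NULL END: matches after unwrapping the cast.
    let gt = case(bin(col("y"), Op::Gt, lit(0)), bin(col("x"), Op::Div, Box::new(Expr::Cast(col("y")))));
    // CASE WHEN 0 < y THEN x / y ELSE NULL END: flipped comparison.
    let lt = case(bin(lit(0), Op::Lt, col("y")), bin(col("x"), Op::Div, col("y")));
    // CASE WHEN y != 0 THEN x / z ELSE NULL END: divisor is not the checked operand.
    let mismatch = case(bin(col("y"), Op::NotEq, lit(0)), bin(col("x"), Op::Div, col("z")));

    assert!(match_divide_by_zero_protection(&gt).is_some());
    assert!(match_divide_by_zero_protection(&lt).is_some());
    assert!(match_divide_by_zero_protection(&mismatch).is_none());
    println!("pattern checks passed");
}
```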
@CuteChuanChuan force-pushed the raymond/11570-optimize-case-when branch from 334785f to 6479884 on January 28, 2026 14:00
@CuteChuanChuan
Contributor Author

CuteChuanChuan commented Jan 28, 2026

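Microbenchmark results (lower is better; Speedup is DivideByZeroProtection relative to ExpressionOrExpression):

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Scenario  ┃ DivideByZeroProtection ┃ ExpressionOrExpression ┃     Speedup ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ 0% zeros  │                 8.5 µs │                75.9 µs │ 8.9x faster │
│ 10% zeros │                18.7 µs │                71.0 µs │ 3.8x faster │
│ 50% zeros │                55.5 µs │                52.4 µs │       ~same │
│ 90% zeros │                56.1 µs │                22.1 µs │ 2.5x slower │
└───────────┴────────────────────────┴────────────────────────┴─────────────┘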
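Reproducing the 0/10/50/90% zero scenarios requires a divisor column with a controlled fraction of zeros. A hypothetical helper might look like the sketch below; divisor_column is an illustrative name, and the 8192-row batch in main is an assumption, not necessarily what the benchmark used. Zeros are spread evenly rather than clustered, and the non-zero values are positive so that the y > 0 and y != 0 forms select the same rows.

```rust
/// Hypothetical helper: a divisor column of `len` rows in which exactly
/// floor(len * zero_pct / 100) rows are zero, spread evenly across the column.
fn divisor_column(len: usize, zero_pct: usize) -> Vec<Option<i64>> {
    assert!(zero_pct <= 100, "zero_pct is a percentage");
    (0..len)
        .map(|i| {
            // The running zero count floor(i * pct / 100) ticks up at most once per row,
            // so marking the rows where it ticks spreads the zeros evenly.
            let is_zero = (i + 1) * zero_pct / 100 > i * zero_pct / 100;
            if is_zero {
                Some(0)
            } else {
                // Positive non-zero values, so y > 0 and y != 0 select the same rows.
                Some((i % 97) as i64 + 1)
            }
        })
        .collect()
}

fn main() {
    // 8192 rows is an assumed batch size.
    for pct in [0, 10, 50, 90] {
        let col = divisor_column(8192, pct);
        let zeros = col.iter().filter(|v| **v == Some(0)).count();
        println!("{pct:>2}% zeros -> {zeros} of {} rows are zero", col.len());
    }
}
```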

@CuteChuanChuan
Contributor Author

CuteChuanChuan commented Jan 29, 2026

Hi @pepijnve, thanks for pointing out the need for a microbenchmark. I compared DivideByZeroProtection with ExpressionOrExpression and got the results above. However, I'm not sure I'm doing it correctly; could you PTAL? Thanks!

Review comment on the new benchmark function:

fn benchmark_divide_by_zero_protection(c: &mut Criterion, batch_size: usize) {
Contributor


Could you please pull this additional benchmark code into its own PR? We could then use our benchmarking scripts to compare the performance of this PR with main.

Thank you 🙏


Labels

physical-expr Changes to the physical-expr crates


Development

Successfully merging this pull request may close these issues.

Potential optimization for CASE WHEN for protecting against divide by zero

5 participants