pest to chumsky migration #185

gerau · 2025-12-18T11:12:21Z

No description provided.

apoelstra · 2025-12-18T13:21:06Z

cc @canndrew may want to keep an eye on progress here

gerau · 2026-01-12T13:16:25Z

There is now a working parser built on the chumsky crate. It replicates the pest parser's behavior in building a correct parse tree, so it should produce the same Simplicity program. This implementation also fixes #79.

Error reporting is currently broken: the logic of `parse::ParseFromStr` needs to be replaced so that it returns multiple errors or handles recoverable errors differently, and error recovery is proving to be more work than I estimated.

The code will be refactored because some parts are only half-finished (such as adding Spanned for certain names) and there are better ways to use parser combinators. However, I want to show this progress before implementing error recovery.

gerau · 2026-01-12T13:16:48Z

cc @canndrew

uncomputable · 2026-01-12T15:19:49Z

src/lib.rs

    }

    #[test]
-    #[ignore]


1b1e751 It's nice to see that chumsky seems to be faster than pest here.

canndrew · 2026-01-16T08:10:06Z

src/error.rs

+                    })
+                    .map_or(0, |ts| u32::from(ts) as usize);
+
+                let start_col = file[line_start_byte..self.span.start].chars().count();


Do we want to count columns as the number of Unicode code points (which is what `.chars().count()` gives)? There's no good way to define "number of columns" in general for non-ASCII text, but LSP defines it as the number of UTF-16 code units, and that's the closest thing to a standard that I'm aware of.

Actually I just checked and LSP now allows you to choose between utf{8,16,32} at your leisure. But it's moot anyway since this is just deciding how long an underline to print and that's going to depend on the terminal.
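To make the difference between these column definitions concrete, here is a minimal standard-library sketch. `Columns` and `columns_at` are hypothetical names for illustration only, not code from this PR; the sketch assumes `offset` falls on a char boundary.

```rust
/// Column of a byte offset within its line, measured three ways.
#[derive(Debug, PartialEq, Eq)]
pub struct Columns {
    /// UTF-8 code units (bytes).
    pub utf8: usize,
    /// UTF-16 code units (the LSP default).
    pub utf16: usize,
    /// Unicode scalar values, i.e. what `.chars().count()` returns.
    pub utf32: usize,
}

/// Measure the text between the start of the line and `offset`.
/// Panics if `offset` is not on a char boundary of `file`.
pub fn columns_at(file: &str, offset: usize) -> Columns {
    // The line starts one past the previous '\n', or at 0 on the first line.
    let line_start = file[..offset].rfind('\n').map_or(0, |i| i + 1);
    let prefix = &file[line_start..offset];
    Columns {
        utf8: prefix.len(),
        utf16: prefix.encode_utf16().count(),
        utf32: prefix.chars().count(),
    }
}
```

For the prefix `let é😀` the three counts are 10, 7 and 6, so the choice only matters once a line contains non-ASCII text. None of the three is guaranteed to match the width a terminal actually renders.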

We should consider switching to ariadne for error pretty-printing, as it's the "sister-crate" for chumsky.

canndrew · 2026-01-16T08:19:09Z

It's weird that the lexer is treating all our built-in macro/function/etc names as being keywords. I realize that's how the compiler currently works, so it's okay to land this PR as-is to keep the changes small. But obviously we'd want to eventually treat these as just being identifiers.
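A rough sketch of what "just identifiers" could look like, using only the standard library. Everything here is hypothetical: the `Token` and `Builtin` types, the choice of `fn` and `let` as the structural keywords, and the built-in names `assert` and `unwrap` are illustrative assumptions, not the lexer in this PR.

```rust
/// Tokens produced by the lexer. Only structural keywords get a variant.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Token {
    Fn,
    Let,
    /// Every other word, including the names of built-ins.
    Ident(String),
}

/// Classify a single word during lexing.
pub fn classify(word: &str) -> Token {
    match word {
        "fn" => Token::Fn,
        "let" => Token::Let,
        other => Token::Ident(other.to_string()),
    }
}

/// Built-ins known to a later stage (illustrative names only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Builtin {
    Assert,
    Unwrap,
}

/// Resolve an identifier to a built-in after lexing, during name resolution.
pub fn resolve_builtin(name: &str) -> Option<Builtin> {
    match name {
        "assert" => Some(Builtin::Assert),
        "unwrap" => Some(Builtin::Unwrap),
        _ => None,
    }
}
```

With this split the lexer stays small, and adding a built-in only touches the resolution table rather than the token set.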

This commit introduces multiple changes, because it is a full rewrite of the parsing and error code.

Changes in `error.rs`:
- Change `Span` to use byte offsets in place of the old `Position`
- Add the `line-index` crate to calculate the line and column of a byte offset
- Change the `RichError` implementation to use the new `Span` structure
- Implement the `chumsky` error traits, so it can be used in error reporting of parsers
- Add an `expected..found` error

Changes in `parse.rs`:
- Fully rewrite the `pest` parsers as `chumsky` parsers
- Change the `ParseFromStr` trait to use this change
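The byte-offset `Span` idea can be sketched without any dependencies. The PR uses the `line-index` crate for this lookup; the version below is a standard-library stand-in, and `LineTable`, `line_col` and the field names on `Span` are hypothetical names chosen for illustration.

```rust
/// A half-open range of byte offsets into the source text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
}

/// Byte offsets at which each line begins, built once per file.
pub struct LineTable {
    line_starts: Vec<usize>,
}

impl LineTable {
    pub fn new(src: &str) -> Self {
        // Line 0 starts at offset 0; every '\n' starts a new line after it.
        let mut line_starts = vec![0];
        for (i, b) in src.bytes().enumerate() {
            if b == b'\n' {
                line_starts.push(i + 1);
            }
        }
        LineTable { line_starts }
    }

    /// Zero-based (line, byte column) for a byte offset.
    pub fn line_col(&self, offset: usize) -> (usize, usize) {
        // Count the line starts that are <= offset using binary search.
        // The count is at least 1 because the first start is 0.
        let line = self.line_starts.partition_point(|&s| s <= offset) - 1;
        (line, offset - self.line_starts[line])
    }
}
```

Storing only byte offsets keeps `Span` cheap to copy and compare, and defers the line and column computation to the moment an error is actually printed.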

gerau mentioned this pull request Dec 26, 2025

Refactor parsing and analysis for better tooling support #191

Open

gerau force-pushed the simc/chumsky-migration branch from 6db55db to 1b1e751 Compare January 12, 2026 13:01

gerau force-pushed the simc/chumsky-migration branch from 1b1e751 to 1e7c61b Compare January 14, 2026 15:10

gerau force-pushed the simc/chumsky-migration branch from 3592b31 to c03241c Compare January 16, 2026 16:38

gerau added 5 commits January 20, 2026 17:42

add lexer

e6db1b5

The lexer parses incoming code into tokens, which makes it simpler to process using `chumsky`.

add ErrorCollector

f12e380

add multiple error handling

297f78b

This adds `ParseFromStrWithErrors`, which takes an `ErrorCollector` and returns an `Option` of the AST. It also changes `TemplateProgram` to use the new trait with the collector.

remove #[ignore] above fuzz_slow_unit_1()

bd5c30f

it's not slow anymore
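For readers following the "add multiple error handling" commit, here is a minimal sketch of the collector pattern it describes: the parser records every error it meets and returns `None` if any were found. Only the names `ParseFromStrWithErrors` and `ErrorCollector` come from the commit message. The method names, the `ParseError` type and the toy `IntList` grammar are assumptions made up for this example.

```rust
/// One recorded error (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseError {
    pub offset: usize,
    pub message: String,
}

/// Accumulates errors so parsing can continue past the first one.
#[derive(Debug, Default)]
pub struct ErrorCollector {
    errors: Vec<ParseError>,
}

impl ErrorCollector {
    pub fn push(&mut self, offset: usize, message: impl Into<String>) {
        self.errors.push(ParseError { offset, message: message.into() });
    }
    pub fn errors(&self) -> &[ParseError] {
        &self.errors
    }
    pub fn is_empty(&self) -> bool {
        self.errors.is_empty()
    }
}

/// Assumed trait shape: errors go to the collector, the AST is optional.
pub trait ParseFromStrWithErrors: Sized {
    fn parse_from_str_with_errors(s: &str, errors: &mut ErrorCollector) -> Option<Self>;
}

/// Toy AST: a comma-separated list of unsigned integers.
#[derive(Debug, PartialEq, Eq)]
pub struct IntList(pub Vec<u64>);

impl ParseFromStrWithErrors for IntList {
    fn parse_from_str_with_errors(s: &str, errors: &mut ErrorCollector) -> Option<Self> {
        let mut items = Vec::new();
        let mut ok = true;
        let mut offset = 0; // byte offset of the current piece
        for piece in s.split(',') {
            match piece.trim().parse::<u64>() {
                Ok(n) => items.push(n),
                Err(_) => {
                    // Record the error at the first non-space byte, then keep going.
                    let start = offset + (piece.len() - piece.trim_start().len());
                    errors.push(start, format!("expected integer, found {:?}", piece.trim()));
                    ok = false;
                }
            }
            offset += piece.len() + 1; // the piece plus its trailing comma
        }
        if ok { Some(IntList(items)) } else { None }
    }
}
```

Parsing `"1, x, 3, y"` with this sketch returns `None` and leaves two errors in the collector, at byte offsets 3 and 9, instead of stopping at the first bad item.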

gerau force-pushed the simc/chumsky-migration branch from c03241c to bd5c30f Compare January 20, 2026 16:18
