SIMD-accelerated CSV parser for Go - Drop-in replacement for encoding/csv with AVX-512 optimization.
Experimental: Requires Go 1.26+ with
GOEXPERIMENT=simd. The SIMD API is unstable and may change. Not recommended for production use.
import csv "github.com/nnnkkk7/go-simdcsv"
reader := csv.NewReader(strings.NewReader("name,age\nAlice,30\nBob,25"))
records, err := reader.ReadAll()
if err != nil {
log.Fatal(err)
}
// records: [[name age] [Alice 30] [Bob 25]]Same API as encoding/csv - just change the import.
| Feature | Description |
|---|---|
| API Compatible | Drop-in replacement for encoding/csv.Reader and Writer |
| Full CSV Support | Quoted fields, escaped quotes (""), multiline fields, CRLF handling |
| Auto Fallback | Gracefully falls back to scalar on non-AVX-512 CPUs |
| Direct Byte API | ParseBytes() and ParseBytesStreaming() for []byte input |
go get github.com/nnnkkk7/go-simdcsv// All at once
records, err := csv.NewReader(r).ReadAll()
// Record by record
reader := csv.NewReader(r)
for {
record, err := reader.Read()
if err == io.EOF { break }
if err != nil { return err }
// process record
}
// Direct byte parsing (fastest)
records, err := csv.ParseBytes(data, ',')
// Streaming callback
csv.ParseBytesStreaming(data, ',', func(record []string) error {
return processRecord(record)
})writer := csv.NewWriter(w)
writer.WriteAll(records)All standard encoding/csv options are supported:
reader := csv.NewReader(r)
reader.Comma = ';' // Field delimiter (default: ',')
reader.Comment = '#' // Comment character
reader.LazyQuotes = true // Allow bare quotes
reader.TrimLeadingSpace = true // Trim leading whitespace
reader.ReuseRecord = true // Reuse slice for performance
reader.FieldsPerRecord = 3 // Expected fields (0 = auto-detect, -1 = variable)Extended options:
reader := csv.NewReaderWithOptions(r, csv.ReaderOptions{
SkipBOM: true, // Skip UTF-8 BOM if present
})Benchmarks on AMD EPYC 9R14 with AVX-512 (Go 1.26, GOEXPERIMENT=simd).
| Content | Size | encoding/csv | go-simdcsv | Comparison |
|---|---|---|---|---|
| Unquoted fields | 10K rows | 228 MB/s | 309 MB/s | +35% faster |
| Unquoted fields | 100K rows | 214 MB/s | 288 MB/s | +35% faster |
| All quoted fields | 10K rows | 765 MB/s | 537 MB/s | -30% slower |
Note: SIMD acceleration benefits unquoted CSV data at scale. Quoted fields currently have overhead from validation passes.
┌─────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Raw Input │ ──▶ │ scanBuffer() │ ──▶ │ parseBuffer()│ ──▶ │buildRecords()│ ──▶ [][]string
│ []byte │ │ (bitmasks) │ │ (positions) │ │ (strings) │
└─────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
| Stage | Function | Description |
|---|---|---|
| Scan | scanBuffer() |
SIMD scanning in 64-byte chunks. Detects ", ,, \n, \r positions as bitmasks. |
| Parse | parseBuffer() |
Iterates bitmasks to find field boundaries. Outputs fieldInfo and rowInfo. |
| Build | buildRecords() |
Extracts strings from positions. Applies "" → " unescaping. |
| Requirement | Details |
|---|---|
| Go | 1.26+ with GOEXPERIMENT=simd |
| Architecture | AMD64 (x86-64) only |
| SIMD Acceleration | AVX-512 (F, BW, VL) required |
The library still works but falls back to scalar implementation (no speedup). This includes:
- Most CI environments (GitHub Actions
ubuntu-latest, etc.) - Intel CPUs before Skylake-X
- AMD CPUs before Zen 4
- Apple Silicon (ARM64)
# Build
GOEXPERIMENT=simd go build ./...
# Test
GOEXPERIMENT=simd go test -v ./...
# Benchmark
GOEXPERIMENT=simd go test -bench=. -benchmem- Experimental API:
simd/archsimdmay have breaking changes in future Go releases - Memory: Reads entire input into memory (streaming I/O planned)
- Quoted fields: Currently slower than
encoding/csvfor heavily quoted content
Contributions are welcome! Please open issues or pull requests on GitHub.
MIT License - see LICENSE file for details.