Releases: medialab/xan
v0.54.1
v0.54.0
The SIMD update.
Breaking
- Bumping MSRV to 1.83.0.
- Dropping xan plot -Y/--add-series. It is now possible to select multiple columns as <y> in xan plot <x> <y> instead.
- Dropping the -C/--force-colors flag in flatten, heatmap, hist, plot and view in favor of the more standardized and flexible --color=(auto|never|always) flag.
- xan join will now automatically drop joined columns from one of the files when it is obviously safe to do so.
- xan behead & xan rename no longer normalize the output, so as to be as fast as possible.
- The new SIMD CSV parser might not handle irregular CSV cases the same way rust-csv did. In any case, xan input will continue to use rust-csv.
- xan slice -B/--byte-offset & xan slice -A/--accumulate are now mutually exclusive.
- xan input has been overhauled.
- Dropping xan count --sample-size.
- Overhauling xan fixlengths to accept streams by shifting the default from a double-pass read to buffering the whole stream into memory.
- xan plot --x-scale log & --y-scale log are now natural log. Use log10 for the base-10 log as before.
- Dropping the xan reverse -m/--in-memory flag. Behavior is now automatically detected.
- Dropping the xan shuffle -m/--in-memory flag. Loading the file into memory is now the default. The xan shuffle -e/--external flag has been added if you want the old default behavior.
- xan bins now outputs <empty> values instead of <nulls>.
- Overhauling xan bins. The default is now to find nice boundaries for the bins. Use -e/--exact to revert to the old behavior. The default number of bins is now 10, and the Freedman-Diaconis rule is no longer used by default. A -H/--heuristic flag has been added if you want to automatically select a suitable number of bins.
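To make the two binning strategies mentioned above concrete, here is a minimal Python sketch. It is not xan's implementation: freedman_diaconis_bins applies the Freedman-Diaconis rule (bin width 2 × IQR / n^(1/3)), and nice_boundaries rounds the bin step to 1, 2, 5 or 10 times a power of ten, a common "nice numbers" heuristic. Both function names are hypothetical, and xan may round or place boundaries differently.

```python
import math
import statistics


def freedman_diaconis_bins(values):
    """Bin count suggested by the Freedman-Diaconis rule.

    Bin width h = 2 * IQR / n ** (1/3); count = ceil((max - min) / h).
    """
    n = len(values)
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    width = 2 * (q3 - q1) / n ** (1 / 3)
    span = max(values) - min(values)
    if width == 0 or span == 0:
        return 1
    return math.ceil(span / width)


def nice_step(raw_step):
    """Round a positive step to 1, 2, 5 or 10 times a power of ten."""
    exponent = math.floor(math.log10(raw_step))
    fraction = raw_step / 10 ** exponent
    if fraction < 1.5:
        nice = 1
    elif fraction < 3:
        nice = 2
    elif fraction < 7:
        nice = 5
    else:
        nice = 10
    return nice * 10 ** exponent


def nice_boundaries(lo, hi, target_bins=10):
    """Boundaries on a 'nice' step, widened outward so they cover [lo, hi]."""
    if hi <= lo:
        return [lo, hi]
    step = nice_step((hi - lo) / target_bins)
    start = math.floor(lo / step) * step
    end = math.ceil(hi / step) * step
    count = round((end - start) / step)
    return [start + i * step for i in range(count + 1)]


print(nice_boundaries(3, 47))  # [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
```

Because the step is rounded to a nice value, the resulting number of bins can differ slightly from target_bins; the trade-off is boundaries that are easier to read than exact equal-width cuts.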
Features
- Adding xan flatten -F/--flatter.
- xan pivot can now target multiple columns.
- Adding the xan grep command for fast but coarse filtering.
- Adding xan search -f/--flag.
- Adding xan map -F/--filter.
- xan search -B/--breakdown now consolidates the results when multiple patterns have the same name.
- Adding xan flatten --row-separator.
- Adding xan flatten --csv.
- Adding xan headers --color.
- Adding the xan join <columns> <input1> <input2> arity as a convenience when joined column names are the same in both inputs.
- Adding xan join -D/--drop-key=(none|both|left|right).
- Adding xan fuzzy-join -D/--drop-key=(none|both|left|right).
- Adding xan plot -A/--aggregate.
- Adding support for plural selection clauses in both xan select -e & xan map, e.g. xan map 'full_name.split(" ") as (first_name, last_name)'.
- Adding xan search -P/--add-pattern.
- Adding xan groupby -M/--along-matrix.
- Adding xan groupby -T/--total.
- Adding support for .ndjson & .jsonl files. Those are considered as headless TSV files with null byte quoting so you can easily use them with xan commands.
- Adding out-of-the-box support for .vcf, .sam, .bed, .gtf & .gff2 files.
- Adding a xan cat cols alias to xan cat columns.
- Adding zstd support.
- Adding earliest & latest moonblade functions.
- Adding xan dedup -f/--flag.
- Adding the -k short flag for xan dedup --keep-duplicates, and the -C short flag for xan dedup --choose.
- Adding xan fixlengths -H/--trust-header.
- Adding xan separate.
- Adding full log scale support to xan plot.
- Adding xan hist --scale.
- xan window is now able to run total aggregations.
- Adding thousands_sep, comma and significance kwargs to the numfmt moonblade function.
Fixes
- Fixing xan dedup --check bug where the first record was ignored.
- Fixing xan hist -D when the same date is found multiple times.
- Fixing xan from -f xls datetime conversion.
- Fixing xan flatten & xan view when column names contain line breaks.
- Fixing invalid argument parsing errors being printed to stdout instead of stderr.
- Fixing xan progress SIGINT corrupting output.
- Fixing xan enum -A/--accumulate.
- Fixing xan from -f tar when the tarball archive is not gzipped.
- Fixing min & max moonblade functions when passing a list of numbers.
- Fixing xan flatten -H edge cases.
- Fixing commands requiring seekable streams erroneously accepting unindexed compressed files.
- Fixing xan plot --count --y-scale log.
Performance
- Wildly improving performance of most xan commands by leveraging a novel SIMD CSV parser/writer.
- Improving performance of xan from -f txt & xan from -f npy.
- Improving memory footprint of hash-based commands (e.g. frequency, groupby, dedup etc.).
- Improving performance of xan progress, xan range, xan enum, xan behead, xan rename.
Quality of Life
- xan parallel cat now flushes more consistently.
- Better highlighting of problematic strings in xan flatten, xan view & xan headers.
- xan parallel will now generally stop as soon as an error is detected in a subprocess and cleanly report errors.
- Better argv parsing error UX in general.
- The -p flag will now use at most 16 threads, to avoid issues on servers with many CPUs where hogging resources is a problem and where using too many threads at once could hurt performance. The -t flag remains available to tweak the number of threads.
- xan hist will now dim bars having a 0 count so you can easily distinguish them from non-empty bars.
v0.54.0-rc.4
Bump 0.54.0-rc.4
v0.54.0-rc.3
Bump 0.54.0-rc.3
v0.54.0-rc.2
Bump 0.54.0-rc.2
v0.54.0-rc.1
Bump 0.54.0-rc.1
v0.53.0
Breaking
- xan partition now normalizes filenames to lowercase to correctly deal with case-insensitive filesystems.
- xan partition also gets a related -C/--case-sensitive flag.
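The reason for lowercasing is that on a case-insensitive filesystem, Paris.csv and paris.csv name the same file, so two partitions would silently collide. A minimal sketch of the idea, assuming rows are plain dicts and that each partition is written to a hypothetical <key>.csv file (xan's actual filename sanitization does more than this):

```python
from collections import defaultdict


def partition_rows(rows, key_column, case_sensitive=False):
    """Group rows into hypothetical '<key>.csv' buckets.

    With case_sensitive=False, keys are lowercased first, so "Paris" and
    "paris" share one bucket instead of producing two filenames that a
    case-insensitive filesystem would treat as the same file.
    """
    buckets = defaultdict(list)
    for row in rows:
        key = row[key_column]
        if not case_sensitive:
            key = key.lower()
        buckets[key + ".csv"].append(row)
    return dict(buckets)


rows = [{"city": "Paris"}, {"city": "paris"}, {"city": "Lyon"}]
print(sorted(partition_rows(rows, "city")))  # ['lyon.csv', 'paris.csv']
print(sorted(partition_rows(rows, "city", case_sensitive=True)))  # ['Lyon.csv', 'Paris.csv', 'paris.csv']
```

Here the case_sensitive parameter loosely mirrors the role of the -C/--case-sensitive flag: opting out of normalization is only safe when the target filesystem distinguishes case.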
Features
- Adding all and any moonblade higher-order functions.
- Allowing the moonblade printf function to be called with lists.
- Adding the -f/--evaluate-file flag to map, filter, flatmap & transform commands.
- Adding xan map -O/--overwrite.
Fixes
- Fixing xan top -T/--ties edge case.
- Fixing broken pipe panics for some commands.
- Dropping remnant dbg! macro when reading files in reverse.
Performance
- Using jemallocator for musl builds.
Quality of Life
- Better moonblade printf function error messages.
v0.53.0-rc.1
Bump 0.53.0-rc.1
v0.52.0
Breaking
- xan search --count will not emit rows with 0 matches anymore unless --left is used.
Features
- xan transform is now able to work on a selection of columns, rather than on a single column.
- Adding the xan unpivot command.
- Adding the xan pivot command.
- Adding xan join --semi & xan join --anti flags.
- Adding xan slice --raw.
- Adding a default expression argument to lead & lag window functions.
- Adding shlex_split, cmd and shell moonblade functions.
- Adding aarch64-apple-darwin and aarch64-unknown-linux-gnu to CI builds.
- Adding the to_fixed moonblade function.
- Adding an optional decimal places argument to ratio & percentage aggregation functions.
- Adding frac & dense_rank aggregation functions to xan window.
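Several of the additions above follow standard relational and window-function semantics, which can be sketched in a few lines of Python. This is an illustration under stated assumptions, not xan's code: rows are modelled as plain dicts, and the function names, the offset parameter of lead/lag, and the ascending order used by dense_rank are all hypothetical choices.

```python
def semi_join(left, right, key):
    """Keep left rows whose key appears in right (no right columns added)."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


def anti_join(left, right, key):
    """Keep left rows whose key does NOT appear in right."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]


def lag(values, offset=1, default=None):
    """Value `offset` rows before the current one, or `default` when out of range."""
    return [values[i - offset] if i - offset >= 0 else default for i in range(len(values))]


def lead(values, offset=1, default=None):
    """Value `offset` rows after the current one, or `default` when out of range."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


def dense_rank(values):
    """Ascending rank where ties share a rank and no gaps are left (1, 2, 2, 3...)."""
    ranks = {v: r for r, v in enumerate(sorted(set(values)), start=1)}
    return [ranks[v] for v in values]


left = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
right = [{"id": "2"}, {"id": "3"}, {"id": "3"}]
print([r["id"] for r in semi_join(left, right, "id")])  # ['2', '3']
print([r["id"] for r in anti_join(left, right, "id")])  # ['1']
print(lag([10, 20, 30], default=0))   # [0, 10, 20]
print(lead([10, 20, 30], default=0))  # [20, 30, 0]
print(dense_rank([30, 10, 30, 20]))   # [3, 1, 3, 2]
```

Unlike an inner join, a semi join emits each left row at most once and adds no columns from the right side, which is why the duplicated "3" in right does not duplicate any output row. The default argument of lead and lag simply replaces the missing value at the edges of the window.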
Fixes
- Loosening the xan partition sanitizer to allow hyphens, dashes and points.
- Fixing xan parallel --progress display.
- Fixing logic error in xan search -B when used without --left.
- Fixing xan parallel cat when working on file chunks with -P or -H.
- Fixing moonblade list/string slicing with some combinations of negative indices.
- Fixing moonblade split function not using regex patterns properly.
- Fixing moonblade parsing wrt regex patterns and comments (using a regex pattern containing # was not possible).
- Fixing lead window aggregation function when working on any column that is not the first one.
- Fixing xan view -S/--significance being overzealous, especially wrt integers.
Performance
- Improving performance of xan parallel when working on file chunks.
Quality of Life
- xan headers now reports more useful information when files have diverging headers.
- Better error messages for read_json and parse_json moonblade functions.
- xan view -p will not engage the pager when input errored or is empty.
- xan select -e & -f become boolean flags instead of error-inducing invocation variants.
v0.52.0-rc.4
Bump 0.52.0-rc.4