This repository was archived by the owner on Oct 29, 2024. It is now read-only.

datautils

The best toolbox for processing textual data.

Introduction

The Data Utilities are a collection of handy text manipulation tools, intended to make a data wrangler’s life on the command line easier.

Much of the functionality can be reproduced with standard command-line tools (awk, sed, cut, sort, uniq, …), but doing so often becomes tedious. Zealots of the Unix philosophy will probably skip these tools and build a set of sophisticated aliases instead.
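
As a rough illustration of how quickly the standard-tool route grows, here is a sketch of line counting with a minimum count, assuming a POSIX shell with sort, uniq, and awk. The helper name count_lines is hypothetical and not part of datautils, and the output format only mimics the count examples in the Usage section.

```shell
# count_lines MIN (hypothetical helper): read lines on stdin and print
# "count<TAB>line" for every distinct line seen at least MIN times,
# ordered by line content.
count_lines() {
  LC_ALL=C sort |
    uniq -c |
    awk -v min="$1" '$1 + 0 >= min + 0 {
      n = $1                      # remember the count before rewriting $0
      sub(/^ *[0-9]+ /, "")       # strip the padded count that uniq -c adds
      printf "%s\t%s\n", n, $0
    }'
}

printf 'a\na\na\nb\nb\nc\n' | count_lines 2
# prints:
# 3	a
# 2	b
```

Three chained programs and a regular expression are needed just to get tab-separated output and a threshold, which is the kind of tedium the paragraph above refers to.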

On the other hand, some of the tools fix actual problems. They use UTF-8 by default, so one does not have to deal with the quirks of sort and uniq with respect to non-ASCII input.

Installation

go get -v github.com/sfischer13/datautils/...

Tools

These tools are part of the collection:

  • count
  • norm
  • rows
  • text
  • trim

Usage

count

$ echo "a\na\na\nb\nb\nc"
a
a
a
b
b
c
$ echo "a\na\na\nb\nb\nc" | count --keys
3	a
2	b
1	c
$ echo "a\na\na\nb\nb\nc" | count --counts
1	c
2	b
3	a
$ echo "a\na\na\nb\nb\nc" | count --flip
a	3
b	2
c	1
$ echo "a\na\na\nb\nb\nc" | count --threshold 2
3	a
2	b

norm

$ echo "¹²³" | norm --nfc
¹²³
$ echo "¹²³" | norm --nfkc
123

rows

$ echo "a\nb\nc\nd\ne" | rows --rows 2:4
b
c
d
$ echo "a\nb\nc\nd\ne" | rows --rows 1,5
a
e
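
For comparison, the same two selections can be sketched with sed, assuming a POSIX sed. Judging only from the examples above, rows treats 2:4 as a range and 1,5 as a list of line numbers, whereas sed uses a comma for ranges, so the second selection needs two separate addresses.

```shell
# Lines 2 through 4 (a contiguous range; sed writes ranges with a comma).
printf 'a\nb\nc\nd\ne\n' | sed -n '2,4p'

# Lines 1 and 5 (two individual line numbers, one address each).
printf 'a\nb\nc\nd\ne\n' | sed -n -e '1p' -e '5p'
```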

text

$ echo abca | text chars
a
b
c
a
$ echo "This is a test." | text words
This
is
a
test.
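
A rough standard-tool equivalent, assuming a grep that supports -o (GNU and BSD grep do; it is not required by POSIX) and assuming that text words splits on spaces only, as the example above suggests:

```shell
# One character per line. In a UTF-8 locale, "." matches a whole
# multi-byte character rather than a single byte.
printf '%s\n' abca | grep -o .

# One word per line: turn spaces into newlines and squeeze repeats.
printf '%s\n' 'This is a test.' | tr -s ' ' '\n'
```

Unlike the text tool, the grep variant depends on the current locale for correct handling of non-ASCII input.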

trim

$ echo "   abc" | trim --left
abc
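
The left trim can be approximated with sed, assuming a POSIX sed and that trim --left removes all leading whitespace, which the example above suggests but does not confirm:

```shell
# Remove leading whitespace (spaces, tabs) from every line.
printf '   abc\n' | sed 's/^[[:space:]]*//'
```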

Credits

This project is authored and maintained by Stefan Fischer.
The source code is available under the MIT License.
See LICENSE for further details.