Recutils – Tools and libraries to access plain text databases called Recfiles (gnu.org)
106 points by jdemler on Sept 21, 2017 | 46 comments



At CurrySoftware we use recfiles combined with git for all business processes (incoming and outgoing invoices, customers, etc.). It allows us to automate everything we want with simple bash scripts. But we remain flexible because we can perform non-automated tasks manually.


Your processes made me think about an email-based interface instead of a bash script. This might make it easy to interact with the database bot without knowing bash or Python.


We plan to use a Telegram interface for many things (status checks, new invoices, etc.). It's easier and faster than e-mail and available everywhere!


Telegram (Messenger)?


Yes. Communicating with telegram from bash is simple. Check out https://www.curry-software.com/en/blog/telegram_unit_fail/ for example.
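For anyone who would rather not do this from bash, here is a minimal Python sketch of the same idea. It assumes the Telegram Bot API's `sendMessage` method; the token and chat id are placeholders, the function names are made up for illustration, and this is not the script from the linked blog post.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.telegram.org"


def build_send_message_request(token, chat_id, text):
    """Build (but do not send) a POST request for the Bot API sendMessage method."""
    url = f"{API_BASE}/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=data, method="POST")


def send_message(token, chat_id, text, timeout=10):
    """Send the message and return the decoded JSON reply from Telegram."""
    req = build_send_message_request(token, chat_id, text)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder credentials: replace with a real bot token and chat id before sending.
    req = build_send_message_request("123456:PLACEHOLDER", 42, "invoice 17 created")
    print(req.full_url)
```

Splitting "build" from "send" keeps the request easy to inspect or log before anything goes over the network.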


I'm wondering if it is possible also with Signal (signal.org)


Harder, since they don't have an open API and don't want people using non-standard clients. The clients are open-source though, so probably you can do it.


I have a growing affinity for non-database-databases for personal and low/sparse-traffic projects. Lots less hassle.

Here's one I maintain designed for use with AWS Lambda which uses S3 as a Pythonic data-store: https://github.com/Miserlou/NoDB


I'm always fascinated that people get stuck differentiating between using a database for indexing and as a canonical data store.

Like the majority of media apps, it's either impossible or a huge pain to get them to index without managing.


Apparently you can get output in CSV (natively) and JSON (Python one-liner) without too much hassle (http://swick.2flub.org/recutils_JSON_output.html), which suddenly makes it even more interesting.
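To give a feel for how little is involved, here is a rough Python sketch of turning recfile text into JSON. It is not the one-liner from the linked page and it handles only a simplified subset of the format: blank-line-separated records, `Name: value` fields, `#` comment lines and `+` continuation lines. Record descriptors (`%rec:` and friends) are treated as ordinary records, and the sample data is made up.

```python
import json


def fields_to_dict(fields):
    """Repeated field names collect into a list; single ones stay as strings."""
    out = {}
    for name, value in fields:
        if name in out:
            if not isinstance(out[name], list):
                out[name] = [out[name]]
            out[name].append(value)
        else:
            out[name] = value
    return out


def parse_recfile(text):
    """Parse a simplified recfile into a list of dicts (one per record)."""
    records = []
    current = []  # [name, value] pairs of the record being read
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # comment line
        if not line.strip():
            if current:  # a blank line closes the current record
                records.append(current)
                current = []
            continue
        if line.startswith("+") and current:
            # Continuation line: extend the previous field's value.
            cont = line[1:]
            if cont.startswith(" "):
                cont = cont[1:]
            current[-1][1] += "\n" + cont
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a field line: {line!r}")
        current.append([name.strip(), value.strip()])
    if current:
        records.append(current)
    return [fields_to_dict(r) for r in records]


# Hypothetical sample data, for illustration only.
SAMPLE = """\
# two book records
Title: GNU Emacs Manual
Author: Richard M. Stallman
Location: home

Title: The Colour of Magic
Author: Terry Pratchett
Note: first line
+ second line
"""

if __name__ == "__main__":
    print(json.dumps(parse_recfile(SAMPLE), indent=2))
```

For real data the recutils tools themselves are the safer route, since they implement the full format.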


The link to the newer presentation video on the page is broken but this is probably the same one: https://fscons.org/videos/2011/gnu-recutils-changed-title-an...


Thanks! I was a bit disappointed that only the older video link was working.


I really like the idea of plain text data files so am really interested in this.

YAML serves this purpose too but I'm not a huge fan of indenting so recfiles look great. Anyone compared and contrasted?

Also, are there other resources on this? It would be nice to have Java/C++/Python libraries (as well as converters to Parquet, Arrow, etc.).


I have done some prototyping on a similar idea, but I think with a more idiomatic approach. The idea is mostly adding relational structure (schema) to CSV, and enabling a cleaner lexical syntax (get rid of the line noise).

Might some day dust it off and try to bring it to a more serious level (performance, tooling etc).

http://jstimpfle.de/projects/python-wsl/main.html


From the docs:

YAML 1 is an example of a hierarchical data storage format which is much more readable than XML. The problem with YAML is that it was designed as a “data serialization language” and thus to map the data constructs usually found in programming languages. That makes it too complex for the simple task of storing plain lists of items.

I don't see how this is true. The provided sample with books is almost identical in YAML.

The main benefit over YAML looks like more control over individual fields, but again, a YAML-based DB app could do that too.


I noticed this is GPLv3, which means if you use this library, your whole application will have to be open source. However, IANAL.


"Using" doesn't require it to be open source. Only if you distribute the resulting binaries, which is basically the "SaaS loophole".

If it were AGPL, then what you said would be more accurate.


They have a page explaining their reasoning: https://www.gnu.org/licenses/why-not-lgpl.en.html


The only reasoning they really need is "we wrote the GPL and we're gonna goddamn use it".


I was surprised that this wasn't LGPL, which seems more suited. Granted it's GNU. So, they're going to do it their way.


This reminds me a bit of using CGI.pm's "Save" function. I built a pretty decent invoicing app using that and the searches for data in documents saved in that format are pretty fast.

I won't pretend to know the ins-and-outs of that but was told on a Perl mail list that the server created a "B-Tree" index when an initial search was made and used that afterwards.


I do something similar with toml files for simple stuff. Python for piping around, but maybe this is more convenient on the commandline.


Did this come out of Amazon? I remember a similar set of tools for passing along "recs" through pipes.


I guess these are quite slow (because no indexing) once you have a serious number of records? That in itself isn't a problem as long as you understand the scope of the project. I wonder why they didn't use (a well-defined subset of) CSV as the format however.


CSV is neither human-readable nor -writable.

And I don't think the performance issue exists. Computers are fast nowadays. Parsing recfiles is straightforward. Also you could easily archive historic/old/probably irrelevant records.


This is why I was very careful to say "well-defined subset". I wrote a full CSV library[1], and so I'm well aware of how deceptively difficult CSV is to deal with. However with a well-defined subset (and perhaps not using "," as a separator as well) it should be editable for at least simple changes.

[1] https://github.com/Chris00/ocaml-csv


CSV locks you to the same fields per data entry, which makes it a little less flexible. Plus one of the appeals for me is readability of the raw data; CSV gets long and thin very quickly. Granted each will best suit a certain type of data.
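A small Python sketch of what that means in practice: records with different fields have to be padded out to the union of all field names before they fit into CSV. The records and the function name here are made up for illustration.

```python
import csv
import io


def records_to_csv(records):
    """Write dict records with differing keys as CSV, padding missing fields."""
    # Union of field names, in first-seen order.
    fieldnames = []
    for rec in records:
        for name in rec:
            if name not in fieldnames:
                fieldnames.append(name)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical records: the second one has a Phone but no Email.
RECORDS = [
    {"Name": "Ada", "Email": "ada@example.org"},
    {"Name": "Bob", "Phone": "555-0100"},
]

if __name__ == "__main__":
    print(records_to_csv(RECORDS), end="")
    # Name,Email,Phone
    # Ada,ada@example.org,
    # Bob,,555-0100
```

Every optional field becomes a column that is empty in most rows, which is where the "long and thin" feeling comes from.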


No built-in indexing, but no one forbids you from indexing text files if you need it.
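As a rough illustration, here is a Python sketch of an external index: one pass over the file records the byte offset at which each record starts, keyed by the value of one field. It assumes a simplified layout (blank-line-separated records, the key field on a single `Name: value` line), all names are hypothetical, and nothing here is part of recutils.

```python
import io


def build_offset_index(stream, key_field):
    """Map each value of key_field to the byte offsets where its records start."""
    index = {}
    prefix = (key_field + ":").encode("utf-8")
    record_start = None
    offset = 0
    for raw in stream:  # a binary stream yields lines including the newline
        if not raw.strip():
            record_start = None  # a blank line ends the current record
        else:
            if record_start is None:
                record_start = offset
            if raw.startswith(prefix):
                value = raw[len(prefix):].strip().decode("utf-8")
                index.setdefault(value, []).append(record_start)
        offset += len(raw)
    return index


def read_record_at(stream, offset):
    """Seek to a record start and return its lines up to the next blank line."""
    stream.seek(offset)
    lines = []
    for raw in stream:
        if not raw.strip():
            break
        lines.append(raw.decode("utf-8").rstrip("\n"))
    return lines


# Hypothetical data; in practice this would be open(path, "rb").
DATA = b"Id: 1\nName: Ada\n\nId: 2\nName: Bob\n"

if __name__ == "__main__":
    stream = io.BytesIO(DATA)
    index = build_offset_index(stream, "Id")
    print(read_record_at(stream, index["2"][0]))
```

The index has to be rebuilt (or invalidated) whenever the file is edited, which is the usual price of keeping it outside the data file.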


But, any indexing system you create won't work with the rec* tools. For example, `recsel` will not be any faster on large files.

Not sure if they have indexing on the roadmap, but it does make sense to me for people that have adopted it and are starting to get bigger databases.

Of course, you could argue that when the files get too big, it's time to switch to a different solution.

It seems that's kind of a natural tension in projects. Do you grow the scope to accommodate existing users with growing use cases? Or, do you draw the line in the sand and have people move on to a different solution?


If you want something lighter to use text files as tables you could try TextQL: https://github.com/dinedal/textql.


Erm... am I the only one that's a little thrown off by the "mascot" in this project?


About the logo

Why is the logo depicting a pair of copulating turtles?

Ask ams@gnu.org.

What is the name of the turtles?

They are called Fred and George. And yes, they are both male.




No I wasn't expecting turtles humping as soon as I landed on the page either.


I'm not fond of the joke right there but alas.

ps: I hope they have org-mode interop.


It says so in the feature list so I guess they probably do.


dammit I failed at simple search, it's indeed listed


Apparently it made it to the FAQ.


It's two turtles on top of each other, what's the problem?


A question in the [FAQ](https://www.gnu.org/software/recutils/faq.html#whyturtles):

> Why is the logo depicting a pair of copulating turtles?


This reminds me that kame.net has a turtle as a logo, but as an incentive to upgrade, when accessed over IPv6, the turtle is animated. So just be grateful the recutils developers didn't have IPv6 when they were looking for inspiration.


Gay animals are a part of nature. There is nothing to be offended about.


I'm not offended at all, just wasn't expecting it. If anything it made me chuckle, it was so out of place, heh.


Probably.


Yeah, this needs to be marked NSFW. ;)

But, yes, I too am a little thrown off by it.



