
An INI critique of TOML

madmurphy edited this page Oct 7, 2021 · 3 revisions

Be conservative in what you do, be liberal in what you accept from others.

Postel's law

Comparing TOML and INI is not straightforward. The first is a single standard, the second is a federation of dialects. All INI dialects, however, are well-defined (every INI file is parsed by some application, and by studying that parser's source code it is possible to deduce its rules), and, if one looks closely, the number of INI dialects actually used in the wild is not infinite. With an inclusive approach in mind, libconfini tries to acknowledge, catalog and extend many of them, so that what once was an informal standard becomes a flexible standard engraved by years of common habits. When referring to “the INI format”, this document implicitly refers to the fluid format that libconfini is able to parse, and it treats such flexibility as one of the language's intrinsic features.

Although it claims to be a human-friendly language, TOML constitutes a step back towards something more robotic and primitive when compared to INI files and libconfini's approach. Some of TOML's problems are shared with JSON, which is a problematic format outside the ECMAScript realm. Others are problems that TOML has created on its own. This document addresses TOML rather than, for example, JSON or YAML, for two reasons. The first, obvious reason is that TOML's syntax is so similar to INI that it is useful to draw a line between INI and other formats' problematic design, and to explain why INI is something else. The other reason is that, while other formats like YAML modestly define JSON as their “official subset” (JSON was not born as a configuration format, but rather as a serialization format), TOML claims to be a “minimal configuration file format” – despite being another, definitely not minimal, JSON preprocessor, with dates.

TOML's syntax is documented at https://toml.io/en/v1.0.0 (at the time of writing its current revision is r793.8296d6b). The following paragraphs explain why you might want to avoid TOML for your next configuration file for a C or a C++ application.

Writing a harsh critique of someone else's efforts is never a pleasant task. But it might be a necessary one when those efforts go in the wrong direction. It is possible that at some point TOML's designers will fix the issues listed below. If they do, they will finally end up re-inventing INI files.

1. Data types

A TOML document syntactically defines data types. This means that writing 89 and writing "89" are two different things (the first is a number, the second is a string). And this also means that a compliant TOML parser must respond to type changes in a TOML document. Today you write "89", your application receives a string and everything goes well, but tomorrow you write 89 and your application must receive a number…

…and crash.

Of course no application would let anyone do that. Any judicious application using TOML for its configuration will either be tolerant towards improper data types or be obstinate and refuse type changes. In the first scenario you will have a non-compliant TOML parser (basically an INI parser); in the second scenario you will have a parser slightly more stupid than an INI parser and still non-compliant (it will not allow a TOML document to define a data type). The question then is: why give the configuration file the power to speak about data types when, at the end of the day, it does not have this power (the application does)? The whole thing sounds like “You decide. No, wait, I decide.”

In INI files everything is a castable string. This means that an application always receives a string, and that string is always able to produce a boolean, a number, a simple string, or an array of castable strings, without generating errors. But it is the application, not the configuration file, that decides what to pick up and how to react to it. If you want to tell the human about it, write it in a comment, but don't give the configuration file the illusion of a power it does not possess.
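A minimal C sketch of the idea follows. The helpers `value_as_bool` and `value_as_long` are hypothetical and are not libconfini's API; what matters is that the application chooses the type and supplies a fallback, so no value can make it fail.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Case-insensitive comparison of two ASCII words. */
static int same_word(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Hypothetical helper: read a raw INI value as a boolean. Text that is
   not a recognized word yields the fallback chosen by the application. */
int value_as_bool(const char *value, int fallback)
{
    static const char *const truthy[] = { "true", "yes", "on", "1" };
    static const char *const falsy[] = { "false", "no", "off", "0" };
    size_t i;

    if (value == NULL)
        return fallback;
    for (i = 0; i < sizeof truthy / sizeof *truthy; i++) {
        if (same_word(value, truthy[i]))
            return 1;
        if (same_word(value, falsy[i]))
            return 0;
    }
    return fallback;
}

/* Hypothetical helper: read a raw INI value as a long integer. Text that
   is not entirely a decimal number yields the fallback. */
long value_as_long(const char *value, long fallback)
{
    char *end;
    long n;

    if (value == NULL)
        return fallback;
    errno = 0;
    n = strtol(value, &end, 10);
    if (end == value || *end != '\0' || errno == ERANGE)
        return fallback;
    return n;
}
```

For example, assuming the parser has already removed any quotes, `value_as_long("89", 0)` returns 89 whether or not the file quoted the value, and `value_as_long("Europe", -1)` simply returns -1.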

2. Quotes in values

There is also a deeper issue with data types. Imagine the following configuration file:

[server]
continent = Europe

The value above is not required to be a string, it is required to be a continent name. Writing

[server]
continent = 1009

is not worse than writing

[server]
continent = "Vacuum cleaner"

Nothing is gained by forbidding a value from using a syntax that happens to be reserved for numbers, when the value must not be a lot of other things either. And it does not make much sense to create a “string data type” when a “continent name data type” would be required – once again: the continent key above does not expect a string, it expects a continent name (which is no more a string than the ASCII characters used to express numbers are).

Instead, without any valid reason, TOML's syntax forces humans to wrap anything that is not a number, a boolean or a date in quotes. This incorrectly presents Europe as a string (it is an enumeration label, to be exact – in configuration files most values tend to be enumeration labels of some sort), even though humans do not need quotes to understand that a sequence of characters like poet is not a boolean, a number, or a date. As for machines, that would not be a hard task either.

[shakespeare]
birth = 1564
death = 1616

# invalid in TOML
job = poet
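Back to the continent key: a “continent name data type” lives in the application, not in the file. The following C sketch is only an illustration; the enumeration and the `parse_continent` name are assumptions. Note that `1009` and `Vacuum cleaner` are rejected in exactly the same way.

```c
#include <string.h>

/* Hypothetical enumeration expected by the `continent` key. */
enum continent {
    CONTINENT_INVALID = -1,
    AFRICA, ANTARCTICA, ASIA, EUROPE, NORTH_AMERICA, OCEANIA, SOUTH_AMERICA
};

/* Labels in the same order as the enumeration values above. */
static const char *const continent_labels[] = {
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America"
};

/* Map a raw value to an enumeration label. Numbers and arbitrary strings
   fail in the same way: neither is a continent. */
enum continent parse_continent(const char *value)
{
    size_t i;

    for (i = 0; i < sizeof continent_labels / sizeof *continent_labels; i++)
        if (strcmp(value, continent_labels[i]) == 0)
            return (enum continent) i;
    return CONTINENT_INVALID;
}
```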

It is probably not a coincidence that one of the first things that the Hjson project did in order to create a dialect of JSON “easy for humans to read and write” was to remove the necessity of using quotes for declaring strings.

TOML's creator claims that unquoted strings are inherently ambiguous. We can try to imagine the following scenario,

version = "252"

where the quotes seem to suggest that "252.1" (i.e. a string) would also be a valid value for the version key. But do they? What about this?

version = "252_1%2!3?4-5/6=7"

Would that be a valid version string? What other information does quoting 252 give except that there could also be “something else” than a simple number?

Without a comment that explains exactly how to format the version key there is just no way to make it unambiguous, quotes or not.

# INI

# Please use MAJOR(.MINOR(.REVISION)) here
version = 252.1.0
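The comment can then be enforced by the application. A minimal C sketch, assuming a hypothetical `struct version` with three numeric parts, where omitted MINOR and REVISION fields default to zero; the same check applies whether or not the value was quoted.

```c
#include <ctype.h>

/* Hypothetical representation of MAJOR(.MINOR(.REVISION)). */
struct version {
    unsigned part[3];   /* MAJOR, MINOR, REVISION */
};

/* Returns 1 if `text` follows the documented format, 0 otherwise.
   Overflow of very long digit runs is not checked in this sketch. */
int parse_version(const char *text, struct version *out)
{
    int i = 0;

    out->part[0] = out->part[1] = out->part[2] = 0;
    for (;;) {
        if (!isdigit((unsigned char) *text))
            return 0;                   /* each field must start with a digit */
        while (isdigit((unsigned char) *text))
            out->part[i] = out->part[i] * 10 + (unsigned) (*text++ - '0');
        if (*text == '\0')
            return 1;                   /* MINOR and REVISION may be omitted */
        if (*text != '.' || ++i == 3)
            return 0;                   /* wrong separator or too many fields */
        text++;
    }
}
```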

As in any language with extensible semantics, in INI files quotes serve the simple purpose of giving hints, expressing literalness, or removing syntactic (not semantic) ambiguity when there is a risk of it, exactly as a human would do. For instance, an INI file would use quotes as in the following example, to indicate that the # character in #fff000 does not start a comment.

color = "#fff000"
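A simplified C sketch of how a parser might honor such quotes when looking for comments. It is not libconfini's actual algorithm: it only knows double quotes and backslash escapes, and it assumes that both `#` and `;` start a comment.

```c
/* Cut `line` at the first comment marker found outside double quotes.
   The string is modified in place and returned. */
char *strip_comment(char *line)
{
    int quoted = 0;
    char *p;

    for (p = line; *p != '\0'; p++) {
        if (*p == '\\' && p[1] != '\0')
            p++;                        /* skip the escaped character */
        else if (*p == '"')
            quoted = !quoted;
        else if (!quoted && (*p == '#' || *p == ';')) {
            *p = '\0';                  /* comment starts here */
            break;
        }
    }
    return line;
}
```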

3. Case sensitivity

TOML's syntax is always case-sensitive, despite the fact that there are situations where a configuration file must be case-insensitive (think of configuration files that map a FAT32 filesystem or HTML tags, for example). INI formats can be either case-sensitive or case-insensitive depending on the application's choice.
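Making case sensitivity an application setting is cheap. In this hypothetical C sketch a flag decides whether two key names are compared exactly or with ASCII case folding; non-ASCII bytes are always compared exactly.

```c
#include <ctype.h>
#include <stdbool.h>

/* Compare two key names; `case_sensitive` is chosen by the application,
   not by the configuration file. Only ASCII letters are case-folded. */
bool key_equals(const char *a, const char *b, bool case_sensitive)
{
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        unsigned char x = (unsigned char) *a, y = (unsigned char) *b;

        if (!case_sensitive && x < 128 && y < 128) {
            x = (unsigned char) tolower(x);
            y = (unsigned char) tolower(y);
        }
        if (x != y)
            return false;
    }
    return *a == *b;    /* equal only if both strings ended together */
}
```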

4. Unicode key names

TOML's syntax forbids non-ASCII key names unless these are surrounded by quotes.

value in € = 345    # valid with libconfini but invalid in TOML

There is no apparent motivation behind this rule, except that of conforming TOML to JSON, and probably a personal habit of dealing with the latter. But although JSON does have a valid reason to do so because of the programming language it was designed to work with (ECMAScript property names follow the same rules as identifiers), TOML's reason remains somewhat mysterious.
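A byte-oriented parser does not need such a rule. In the following minimal C sketch (the name `split_entry` is hypothetical, and the function is far simpler than a real INI parser) everything before the first `=` is the key, trimmed of spaces and tabs; the UTF-8 bytes of `€` are carried along untouched.

```c
#include <string.h>

/* Trim ASCII spaces and tabs from both ends of `s`, in place. */
static char *trim(char *s)
{
    char *end;

    while (*s == ' ' || *s == '\t')
        s++;
    end = s + strlen(s);
    while (end > s && (end[-1] == ' ' || end[-1] == '\t'))
        *--end = '\0';
    return s;
}

/* Split "key = value" in place. Returns 0 if the line contains no `=`.
   Bytes >= 0x80 (UTF-8) are never inspected, only copied. */
int split_entry(char *line, char **key, char **value)
{
    char *eq = strchr(line, '=');

    if (eq == NULL)
        return 0;
    *eq = '\0';
    *key = trim(line);
    *value = trim(eq + 1);
    return 1;
}
```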

5. Square brackets

TOML forces arrays to be encapsulated within square brackets (exactly like section paths do), although humans do not need square brackets for recognizing when something is a list.

# not an array in TOML
wishes = apples, cars, elephants, chairs

Nor are nested arrays a valid justification for square brackets, since in INI files it is already possible to nest arrays, either by using a different delimiter for each level,

wishes = \
    apples : oranges : lemons, \
    cars, \
    elephants : tigers, \
    chairs

or by recursively quoting.

wishes = \
    "apples oranges lemons" \
    cars \
    "elephants tigers" \
    chairs
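Reading nested lists like these only takes splitting twice. A minimal C sketch for the delimiter-based form, assuming backslash continuations have already been joined: the hypothetical `for_each_member` splits on `,` first and then, through a visitor, on `:`.

```c
#include <string.h>

/* Call `visit` on each member of `list` separated by `delim`, with
   surrounding spaces removed. `list` is modified in place. */
void for_each_member(char *list, char delim,
                     void (*visit)(char *member, void *ctx), void *ctx)
{
    char *start = list;

    for (;;) {
        char *end = strchr(start, delim);
        char *last;

        if (end != NULL)
            *end = '\0';                /* terminate the current member */
        while (*start == ' ')
            start++;
        last = start + strlen(start);
        while (last > start && last[-1] == ' ')
            *--last = '\0';
        visit(start, ctx);
        if (end == NULL)
            break;
        start = end + 1;
    }
}

/* Example visitors for a two-level list: ',' outside, ':' inside. */
struct tally {
    int groups, leaves;
};

static void count_leaf(char *member, void *ctx)
{
    (void) member;
    ((struct tally *) ctx)->leaves++;
}

static void count_group(char *member, void *ctx)
{
    ((struct tally *) ctx)->groups++;
    for_each_member(member, ':', count_leaf, ctx);
}
```

With the `wishes` list from the delimiter example, the tally ends up with 4 top-level members and 7 leaves.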

But there is a more important reason why square brackets are a bad idea in a human-friendly configuration format: one-member arrays. There is no way to convince a human that something composed of only one member is a list (if you think differently, chances are that you are partly non-human). As friendly as they are, INI files behave accordingly, while TOML of course doesn't. Compare this (INI):

wishes = "I am fine"

with this (TOML):

wishes = [ "I am fine" ]

As in the C language, in INI files a one-member array and a simple value are stored in the same way. Of course you can declare a one-member array in INI files: just write a simple string.

Thanks to this, INI arrays do not constitute a syntactically distinct type and any string can be parsed as an array. If you have ever dealt with m4 macro arguments you will know the beauty of this.
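In code, this means that counting the members of a value needs no special case for “non-arrays”. A hypothetical C sketch: delimiters inside double quotes are ignored, and any string, even an empty one, has at least one member. Consecutive delimiters are counted as empty members, a simplification of what a real parser would do.

```c
#include <stddef.h>

/* Count the members of `value` when split on `delim`, ignoring
   delimiters that appear inside double quotes. */
size_t count_members(const char *value, char delim)
{
    size_t count = 1;       /* a plain string is a one-member array */
    int quoted = 0;

    for (; *value != '\0'; value++) {
        if (*value == '"')
            quoted = !quoted;
        else if (!quoted && *value == delim)
            count++;
    }
    return count;
}
```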

6. Array delimiters

TOML forces arrays to be always comma-separated, although a human can recognize a list even when the separator is a mushroom.

[Super Mario]
wishes = jumping 🍄 sneaking into pipes 🍄 princess Peach 🍄 flying
coins = 39586235

libconfini does not accept mushrooms either – for practical, not philosophical reasons (the library is not human yet) – but you are free to choose any character within the ASCII range as an array delimiter and change it as often as you wish. For instance, in an INI file where arrays are normally comma-separated you might decide that an IP address is also an array, whose members are separated by dots instead of commas, simply because that is what an IP address actually is, and that might be what your application needs.
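For instance, reading an IPv4 address as a dot-separated array of four numbers takes only a few lines. The following C sketch is an illustration built around a hypothetical `parse_ipv4` function, not part of any library.

```c
#include <stdlib.h>

/* Read "a.b.c.d" as a '.'-delimited array of four numbers in 0..255.
   Returns 1 on success, 0 otherwise. */
int parse_ipv4(const char *text, unsigned char octets[4])
{
    int i;

    for (i = 0; i < 4; i++) {
        char *end;
        unsigned long n;

        if (*text < '0' || *text > '9')
            return 0;                   /* no signs, no spaces */
        n = strtoul(text, &end, 10);
        if (n > 255 || *end != (i < 3 ? '.' : '\0'))
            return 0;                   /* out of range or wrong delimiter */
        octets[i] = (unsigned char) n;
        text = end + 1;
    }
    return 1;
}
```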

7. Mixed arrays

TOML encourages a nightmare for strongly typed languages like C and C++: mixed arrays. In short, after deciding that a configuration file must express strong types (while still allowing "Vacuum cleaner" as a continent name), TOML forces applications to be able to mix them and to offer some kind of support for something that is natively not supported.

An array that mixes numbers, strings and other arrays is something a C or C++ application would run away from. Although it is possible to reach the same result with INI too, with both TOML and INI a mixed array can only be emulated, never really implemented from the C perspective (we have left an example under examples/miscellanea/toml-like.c, and we would discourage anyone from doing it). The difference between INI and TOML? INI syntax has the power to express mixed arrays but does not require applications to map them as such; TOML does.
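To make the point concrete, this is roughly what emulating a mixed array looks like in C: every member carries a runtime tag, and every consumer has to switch on it. The types are a hypothetical sketch, unrelated to the code in examples/miscellanea/toml-like.c.

```c
#include <stddef.h>

/* A hand-rolled "mixed array": C has no such thing natively,
   so each member carries its own type tag. */
enum member_type { MEMBER_INTEGER, MEMBER_STRING, MEMBER_ARRAY };

struct member {
    enum member_type type;
    union {
        long integer;
        const char *string;
        struct {
            const struct member *items;
            size_t length;
        } array;
    } as;
};

/* Count all integers, descending into nested arrays. Every consumer
   of the data has to repeat this kind of type dispatch. */
size_t count_integers(const struct member *items, size_t length)
{
    size_t i, total = 0;

    for (i = 0; i < length; i++) {
        switch (items[i].type) {
        case MEMBER_INTEGER:
            total++;
            break;
        case MEMBER_ARRAY:
            total += count_integers(items[i].as.array.items,
                                    items[i].as.array.length);
            break;
        case MEMBER_STRING:
            break;
        }
    }
    return total;
}
```

A value like TOML's `[ ["gamma", "delta"], [1, 2] ]` becomes two `MEMBER_ARRAY` members, and `count_integers` on it returns 2.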

8. Composite configuration files

TOML's syntax forbids populating a section in different steps (sections are called “tables” in TOML). The following example, understood by a human and by an INI parser, would be forbidden in TOML:

[visitors]
list = karl, lisa, Andrew Smith, rick92

[host]
foo = bar

[visitors]          # invalid in TOML
checked = true      # invalid in TOML

Although this might look like an insignificant detail, allowing a configuration file to be populated in different steps can come in very handy when composing several smaller configuration files into one.
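One way to see why this costs an INI parser nothing: if an application stores entries as a flat list of (section, key, value) records, a design assumed here purely for illustration, a section that reappears later, or in another file, needs no special handling. In this sketch the last assignment wins, which is also an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flat storage: one record per key, tagged with the
   section it was found in, in document order. */
struct entry {
    const char *section, *key, *value;
};

/* Return the last value assigned to section.key, or NULL. It does not
   matter how many times [section] was opened, or in which file. */
const char *lookup(const struct entry *entries, size_t n,
                   const char *section, const char *key)
{
    const char *found = NULL;
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(entries[i].section, section) == 0 &&
            strcmp(entries[i].key, key) == 0)
            found = entries[i].value;
    return found;
}
```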

9. Dates

This is probably the most mysterious part of the TOML language. In INI files a value can be interpreted as a boolean, a number, a string, an array, or whatever else you like (although in this last case libconfini will not help you). The situation is somewhat similar in TOML (without the “whatever else you like” part), except that a value can also be a date.

There is something intriguing in all this. Even forgetting that an application might not need dates at all, why constrain something so particular, which can be formatted in so many different ways, into a rigid primitive? Why not do that for a path? Or a username? Or an email address? Or a regular expression? …Or a continent name? All of these have more constraining semantics than dates.

In INI files a date is either a time stamp or a human-friendly string.

date = "Thu, 30 Aug 2012 12:31:00 GMT"
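When an application does need a date, it can parse whatever format it documents. A minimal C sketch for the format above, using only `sscanf` and a month table; the name `parse_gmt_date` is hypothetical, and only the literal `GMT` zone is accepted.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Parse a date such as "Thu, 30 Aug 2012 12:31:00 GMT" into a struct tm
   (UTC). The weekday name is read but not validated. Returns 1 on success. */
int parse_gmt_date(const char *text, struct tm *out)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char wday[4], mon[4], zone[4];
    const char *hit;
    int n = 0;

    memset(out, 0, sizeof *out);
    if (sscanf(text, "%3s, %d %3s %d %d:%d:%d %3s%n", wday, &out->tm_mday,
               mon, &out->tm_year, &out->tm_hour, &out->tm_min,
               &out->tm_sec, zone, &n) != 8 || text[n] != '\0')
        return 0;
    if (strcmp(zone, "GMT") != 0 || strlen(mon) != 3)
        return 0;
    hit = strstr(months, mon);
    if (hit == NULL || (hit - months) % 3 != 0)
        return 0;                               /* not a month abbreviation */
    out->tm_mon = (int) ((hit - months) / 3);   /* 0 = January */
    out->tm_year -= 1900;                       /* struct tm counts from 1900 */
    return 1;
}
```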

10. Empty key names

Although a human would have no idea of what it could possibly mean (and a machine would probably not do any better), TOML's syntax explicitly allows (but discourages) assigning values to empty key names.

"" = "whatever"     # valid in TOML
'' = 'whatever'     # valid in TOML
= 'whatever'        # invalid in TOML (seriously?)

11. Arrays of tables (a.k.a. arrays of sections)

Sometimes what initially appears to be a nice invention can end up being the opposite – yes, this can happen too. We are talking about arrays of tables here (a.k.a. arrays of sections).

Arrays of tables are declared in TOML using the double square bracket notation:

# TOML

[[server]]
ip = "214.252.11.145"
country = "Australia"

[[server]]
ip = "214.252.11.146"
country = "India"

[[server]]
ip = "214.252.11.147"
country = "Sweden"

...

Strictly speaking, arrays of tables introduce the concept of “unnamed tables” – server in the example above is not the name of a table, it is the name of a collection of tables, none of which has a name. But regardless of the syntactic consequences, this feature carries a major problem: it encourages using configuration files as databases.

The common way to deal with similar scenarios in INI files is to keep a common parent section, so that the application can iterate blindly through the sibling subsections, and to make the nesting explicit by giving each subsection a name (in fact, the only unnamed section in an INI file can be the document's root):

# INI

[server.main]
ip = 214.252.11.145
country = Australia

[server.secondary]
ip = 214.252.11.146
country = India

[server.broken]
ip = 214.252.11.147
country = Sweden

# You can add an infinite number of `server.*` subsections here and use
# arbitrary names, the application will retrieve all of them.
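On the application side, retrieving all `server.*` subsections can be as simple as a prefix test on each section path. In this hypothetical C sketch the section paths are assumed to be already collected into an array by whatever parser is in use.

```c
#include <stddef.h>
#include <string.h>

/* Visit every subsection of `parent` (e.g. "server.main" for "server"),
   whatever its name. Returns how many were found. */
size_t for_each_subsection(const char *const *sections, size_t n,
                           const char *parent, void (*visit)(const char *name))
{
    size_t i, found = 0, len = strlen(parent);

    for (i = 0; i < n; i++) {
        const char *path = sections[i];

        /* Match "parent." followed by at least one character. */
        if (strncmp(path, parent, len) == 0 && path[len] == '.' &&
            path[len + 1] != '\0') {
            if (visit != NULL)
                visit(path + len + 1);
            found++;
        }
    }
    return found;
}
```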

The INI way is inherently more human-readable (humans like descriptive names), and it has a nice side effect: when the entries have become too many, and naming each of them has become too cumbersome, that is a good sign that you should finally switch to a database format and keep your configuration file clean.

Even TOML's featured example proposes “the INI way”

[servers.alpha]
ip = "10.0.0.1"
role = "frontend"

[servers.beta]
ip = "10.0.0.2"
role = "backend"

instead of “the TOML way”

[[servers]]
ip = "10.0.0.1"
role = "frontend"

[[servers]]
ip = "10.0.0.2"
role = "backend"

for presenting the language.

But if this does not convince you, and at the end of the day you really want to play dirty with your configuration files, INI still offers you its quirks to reach TOML's effect, without the inconvenience of introducing anonymous sections:

# INI

[server."214.252.11.145"]
country = Australia

[server."214.252.11.146"]
country = India

[server."214.252.11.147"]
country = Sweden

...

It goes without saying that you should not follow TOML in this. INI is not a database format; it targets primarily humans, not machines. If you want to store multiple sections of the same kind, please give them human-friendly names, and have fun.

12. Lack of support for implicit keys

Serialization formats often have shortcuts for expressing a true boolean implicitly. A bare HTML attribute, for instance, carries an implicit value: the contenteditable attribute in the following example is equivalent to contenteditable="true".

<div contenteditable class="my-class"></div>

Similarly, in the following INI fragment from /etc/pacman.conf (Arch), Color is an implicit key representing a true boolean – i.e. Color = YES.

HoldPkg = pacman glibc
Architecture = auto
IgnorePkg =
Color
SigLevel = Required DatabaseOptional
LocalFileSigLevel = Optional

TOML lacks support for implicit keys, and key names not followed by an equals sign always constitute syntax errors.
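Supporting implicit keys costs almost nothing. In this hypothetical C sketch a line without `=` yields a NULL value, and the boolean getter reads a missing value as true, the way `Color` is meant in pacman.conf. The accepted words and function names are assumptions, and whitespace trimming is omitted for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Interpret a possibly implicit value: NULL means the key appeared
   alone on its line (e.g. `Color`), which counts as true. */
bool implicit_bool(const char *value, bool fallback)
{
    if (value == NULL)
        return true;
    if (strcmp(value, "yes") == 0 || strcmp(value, "true") == 0)
        return true;
    if (strcmp(value, "no") == 0 || strcmp(value, "false") == 0)
        return false;
    return fallback;
}

/* Split a line into key and value in place; an implicit key gets a
   NULL value instead of being reported as a syntax error. */
void read_line(char *line, char **key, char **value)
{
    char *eq = strchr(line, '=');

    *key = line;
    *value = NULL;
    if (eq != NULL) {
        *eq = '\0';
        *value = eq + 1;
    }
}
```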

13. Inline tables must remain… inline

In addition to the INI way, TOML introduces a duplicate way of declaring sections: “inline tables”. The following TOML example:

# TOML

homepage = { page_header = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.", page_footer = "Orci varius natoque penatibus et magnis dis parturient montes." }

is an exact synonym of:

# TOML

[homepage]
page_header = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
page_footer = "Orci varius natoque penatibus et magnis dis parturient montes."

Besides the visual inconvenience of presenting entire sections like keys, not much would be wrong with this feature, not even the redundancy, had the feature not come with an ugly rule attached: inline tables must remain inline.

Such a constraint might be tolerable, considering that in TOML a new line is supposed to end a node, were it not for an exception that makes it intolerable: arrays, on the contrary, can span multiple lines.

Thus, you can write,

# TOML

homepage = [
	"Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
	"Orci varius natoque penatibus et magnis dis parturient montes."
]

but you cannot write

# Invalid TOML example

homepage = {
	page_header = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.",
	page_footer = "Orci varius natoque penatibus et magnis dis parturient montes."
}

What makes things worse is the fact that this prohibition exists only for the sake of not having two ways of declaring tables that are both multi-line. It is “a moral prohibition”, not dictated by any practical reason. In short, it exists only to tell you how to behave.

TOML's designers insist that allowing multi-line tables declared in this way would break one of TOML's pillars, namely that of terminating a node when a (non-escaped) new line is found. But that is one of INI's pillars, not TOML's: TOML had already betrayed this principle when it established its array syntax. Looking at the asymmetry above from a different perspective, one could indeed say that the original mistake lies with arrays, not with inline tables (but however one puts it, a mistake lies somewhere).
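For comparison, the INI rule fits in a few lines: a new line ends a node unless a backslash escapes it, as in the `wishes` examples of section 5. This C sketch (the name `join_continuations` is hypothetical) removes every backslash immediately followed by a newline, in place, so that each remaining line is one complete node; `\r\n` line endings are not handled.

```c
/* Remove every backslash immediately followed by a newline, in place,
   so that continued lines become a single logical line. */
char *join_continuations(char *text)
{
    char *src = text, *dst = text;

    while (*src != '\0') {
        if (src[0] == '\\' && src[1] == '\n')
            src += 2;                   /* drop the escaped newline */
        else
            *dst++ = *src++;
    }
    *dst = '\0';
    return text;
}
```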

It is possible to argue further that inline tables bring TOML's syntax closer to JSON. That alone is a good reason to be happy that inline tables are alien in INI files.

14. Incompatibility

By design TOML is explicitly incompatible with about forty years of configuration files.

15. Immediacy

A configuration file is meant to be edited by a human – possibly someone who has only Microsoft Notepad as a text editor and has never heard of TOML or INI before – and editing it should feel like a natural and welcome thing to do, not like hacking a program's source code, especially when doing so brings no expressive advantage.

If a person who has never heard of TOML sees the following configuration file,

# TOML

["bank"]
"ip" = "192.168.1.1"
"square root" = 15000

["client"]
"hello world" = ["sunny"]
"foo" = "9234"

how is the person supposed to know that "ip" can also be written without quotes, but that this is not the case for "square root", since it contains a space and keys containing spaces must always be quoted? Or that ["sunny"] is not a reference to a section name but an array? Or what data types a particular array can contain?

INI, on the other hand, encourages human-friendly comments for explaining what is not immediately visible.

# INI

[bank]
ip = 192.168.1.1
square root = 15000

[client]
hello world = sunny  # it is possible to write a comma-separated list here
foo = 9234

You can write comments in TOML as well, of course. But the risk is that they end up being lists of things to avoid that have nothing to do with the application you are configuring, rather than suggestions of what is possible to do.

# TOML

["bank"]
"ip" = "192.168.1.1"    # you can remove the quotes from `"ip"` if you want
"square root" = 15000   # do not remove the quotes from `"square root"`, TOML forbids it

["client"]
"hello world" = ["sunny"]   # `["sunny"]` is not a section name
"foo" = "9234"  # do not remove the quotes from `"9234"`, it is not a number (I know...)

16. Genesis

Configuration files are born out of necessity, and different applications can have different requirements. There are cases where a configuration file differs substantially from the INI format. In these situations it is not rare for developers to abandon a widespread and solid configuration format such as INI only after realizing that they had no other choice, and not without pain.

In this respect, the way libconfini was born is paradigmatic. It was born for an application – an editor – and that application had a very peculiar task: reading different types of INI files written in the real world for the applications typically installed on a GNU/Linux distribution. What better scenario for creating a parser?

The genesis of TOML, instead, is quite different. Someone without a parser decided that unquoted strings in INI files are ugly and forbade them. A lot of rules were then added afterwards on paper, without really thinking of any real-world use case and only keeping JSON as a reference point.

In the beginning it was still only a specification. Many people, enthusiastic to finally read a specification of something somewhere, started to create their own parsers for the newborn language. And that was the moment when problems began to appear.

When you write a parser you might indeed begin to notice contradictions in an apparently flawless rule, your code might start to become unnecessarily complex because of absurd edge cases, and you might realize that the language you are trying to parse is not that well-designed after all.

And even if you do survive the process of writing a parser that is fully compliant with TOML (some people don't), you have still done only half of the job, that of writing a parser, without really thinking of any real-world use case. It is still possible that you have completely wasted your time after all.

There are of course cases where TOML works just fine, and these are the cases where JSON would also work fine (although one always has to tolerate TOML's idea of introducing a syntax for dates). But where JSON works fine, so do other JSON dialects that are more human-friendly than TOML.

17. Against Postel's law by design

Postel's law is a good indicator of how robust a language is: the more a language is able to make sense of different types of input, the more robust the language.

Faced with a heterogeneous landscape like that of configuration files, a parser that applies Postel's law will try to make sense of the largest possible set of habits and explore all possible solutions to avoid generating errors merely because of diversity.

TOML is a good example of a language designed against this principle. The language's founding element was that of generating errors when quotes were missing, and subsequent rules seem to have generating errors as their only reason to exist (think of the requirement of using quotes for key names containing spaces or Unicode characters, which has no justification whatsoever – if it is for aesthetic reasons, consider that if you were a Chinese speaker you would rather be tempted to put the Latin characters in quotes and leave the Chinese ideograms out of them).

One would think that a language with such tendencies would at least always have only one way to express the same thing. Instead, no: inline tables were introduced as a duplicate of standard tables, despite being less readable and completely alien to common practice – something vaguely similar existed in libconfig, whose sections, however, spanned multiple lines and constituted the only way to declare sections in that language.

At the end of the day TOML's main goal seems to be that of generating errors. The opposite approach would be to take advantage of diversity and regard it as a strength.

18. Performance

It is not straightforward to talk about “TOML's performance”: TOML is a language, not a particular parser. It is possible, however, to predict that any TOML-compliant parser will on average be much slower than an INI parser.

This entire section is being created while a TOML parser written in C (tomlc99) tries to parse a 50 MiB file that libconfini usually parses in half a second – and libconfini's primary goal is not speed (yes, we gave the TOML parser an INI file to parse).

This is not a critique of the particular parser chosen – we can assume that tomlc99 is doing its best at its hard task. The reason why a TOML parser will always be slow is the error-checking fury required by the language. Where most INI parsers' approach will be “don't throw an error unless you really cannot make any sense of what is written in a configuration file – the application will do the rest and will do it better”, the approach of a TOML-compliant parser will be that of searching for errors even when both the application and the human would have already understood the content.

After 13 minutes and 20 seconds our TOML parser has finally parsed…

54691749 bytes parsed in 800.849218 seconds.
Number of bytes parsed per second: 68292.192551

ERROR: cannot parse - line 1: extra chars after value

Of course. Someone forgot to put quotes around the first value.

19. Human-friendly vs. human-readable

“Human-friendly” and “human-readable” might sound like synonyms, but often they are not. Some texts can be very easy to read but hard to edit.

An inscription on the front of the Pantheon in Rome says “Marcus Agrippa, son of Lucius, made this building when consul for the third time”. This is a very human-readable text if you know a bit of Latin. But in order to edit it you would need a ladder, a chisel and the wish to ruin a millenary monument – please do not try to do it.

An emblematic example of this in file formats is JSON. Due to curly brackets, a systematic indentation and a strict syntax it is probably one of the most human-readable serialization formats. But exactly because of the same reasons it is not the most human-friendly one.

Similarly, if used with a syntax highlighter, the human-readability of TOML is comparable to that of INI. Its human-friendliness, instead, lies a few steps below.

20. Aesthetics

Appearance has its importance too. TOML's specification comes with the following example for illustrating the language:

# This is a TOML document.

title = "TOML Example"

[owner]
name = "Tom Preston-Werner"
dob = 1979-05-27T07:32:00-08:00 # First class dates

[database]
server = "192.168.1.1"
ports = [ 8000, 8001, 8002 ]
connection_max = 5000
enabled = true

[servers]

  # Indentation (tabs and/or spaces) is allowed but not required
  [servers.alpha]
  ip = "10.0.0.1"
  dc = "eqdc10"

  [servers.beta]
  ip = "10.0.0.2"
  dc = "eqdc10"

[clients]
data = [ ["gamma", "delta"], [1, 2] ]

# Line breaks are OK when inside arrays
hosts = [
  "alpha",
  "omega"
]

There are many ways of expressing exactly the same content using libconfini. The following is probably the most obvious one:

# examples/ini_files/toml-like.conf

# Relax, this is an INI document.


title = INI Example

[owner]
name = madmurphy
dob = "Sun, 27 May 1979 15:32:00 GMT"

[database]
server = 192.168.1.1    # you can parse an IP address as an array too! :-)
ports = 8000, 8001, 8002
connection_max = 5000
enabled

  # Indentation (tabs and/or spaces) is allowed but not required
  [servers.alpha]
  ip = 10.0.0.1
  dc = eqdc10

  [servers.beta]
  ip = 10.0.0.2
  dc = eqdc10

[clients]
data = gamma : delta, 1 : 2

hosts = alpha, omega

For a parsing example, please have a look at examples/miscellanea/toml-like.c.


State of this document

Last revision: October 2021