lua for data representation
my major advocacy is that lua is a very good general purpose language. this is not an opinion shared by all. lua’s major design goal was extending and embedding into existing programs. I think the simplicity and flexibility pursued toward that goal make lua a very good language in general. that said, it does its job quite well as an embedding language, and one of its more interesting use cases (and actually, one of its oldest) is as a data or configuration format.
lua’s syntax provides many niceties. for instance, lua has an infinite family of multiline strings: the most basic one starts with [[ and ends with ]]. however, any number of equals signs can be put between the two square brackets: print([======[hello, world]======]). the only way to close a [===[ is with a matching ]===], not with a ]] or a ]==], or a ]====].
here is a function that, given a string, builds a multiline string literal that represents it:
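a tiny sketch of how the levels behave, runnable as-is. it also shows one detail not mentioned above: a newline directly after the opening bracket is skipped.

```lua
-- a level-2 long string can hold lower-level closers verbatim
local s = [==[
one ]] two ]=] three]==]

-- the newline right after [==[ is not part of the string
assert(s == "one ]] two ]=] three")
print(s)
```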
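local function stringize(str)
    local equal_signs = ""
    -- scan str plus a "]" so a trailing "]" or "]==" cannot close the literal early
    local probe, pos = str .. "]", 1
    while true do
        local _, stop, e = probe:find("%](=*)%]", pos)
        if not stop then break end
        if #e >= #equal_signs then
            equal_signs = e .. "="
        end
        pos = stop -- closers can overlap: this "]" may start the next one
    end
    -- the first newline after the opening bracket is skipped, so double a leading one
    local lead = str:sub(1, 1) == "\n" and "\n" or ""
    return ("[%s[%s%s]%s]"):format(equal_signs, lead, str, equal_signs)
end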
of course, you can just use string.format with %q, and you should for strings containing carriage returns, which multiline strings normalize to newlines. otherwise, multiline strings usually provide nicer looking and more compact results. these strings are very good for representing lua code as strings in other lua code.
anyway, download this file. if you want to consume massive amounts of memory and possibly crash your computer, run it in cpython. if you want to see “hello, world” printed 5 million times, run it in lua. lua’s parser is designed to take massive files in stride, which is fairly unusual and quite necessary if lua is to be used for this purpose.
design and implementation choices like these come together to suit lua for this purpose. I’ve seen 3 main styles of using lua code to represent data or configuration. I think lua manages to accommodate each of them well, and each has its appropriate use.
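to compare the two, here is a small sketch that round-trips one string both ways. it assumes lua 5.2 or later (for the mode and environment arguments to load), and the sample text and chunk names are made up for illustration.

```lua
local original = 'say "hi"\nthen ]] leave'

-- %q produces a quoted literal with escapes; load can read it back
local quoted = ("%q"):format(original)
local reloaded = assert(load("return " .. quoted, "=quoted", "t", {}))()
assert(reloaded == original)

-- a hand-picked level-1 multiline string holds the same text without escapes
local long = "[=[" .. original .. "]=]"
local reloaded_long = assert(load("return " .. long, "=long", "t", {}))()
assert(reloaded_long == original)

print(quoted)
print(long)
```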
styles of data representation
JSON style (LON?)
chunks are much like function bodies. they have return values. so, one thing that one can do is just return a big table from a lua program.
return {
    foo = "bar",
    list = {1, 2, 3},
    rule_of_thirds = true,
}
if you’re using this style, you can omit the return and then prepend it when you load the lua files. this makes any trailing statements syntactically invalid because the return statement must be the last statement in the block. keep in mind, though, that this does not prevent arbitrary code from running, because a function literal can be defined and called inside the returned expression.
return (function() arbitrary_code() end)()
trying to strip out function literals to make this technique safer is fraught and defeats the general elegance of this technique. one can fairly readily use JSON instead.
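a minimal sketch of the prepend-the-return loading described above, assuming lua 5.2 or later. load_table is a hypothetical helper name, and the data is passed as a string here only so the example runs on its own.

```lua
-- hypothetical helper: load a table constructor written without the leading "return"
local function load_table(source)
    -- "t" rejects bytecode; the empty table is the chunk's environment
    local chunk, err = load("return " .. source, "=data", "t", {})
    if not chunk then return nil, err end
    local ok, result = pcall(chunk)
    if not ok then return nil, result end
    if type(result) ~= "table" then return nil, "expected a table" end
    return result
end

local data = assert(load_table([[{
    foo = "bar",
    list = {1, 2, 3},
    rule_of_thirds = true,
}]]))
print(data.foo, #data.list, data.rule_of_thirds)

-- statements after the table are a syntax error, so this never runs
local rejected, message = load_table("{} os.exit()")
assert(rejected == nil)
print(message)
```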
INI style
what is often missed about lua is that it’s subtly full of metaprogrammability. for instance, lua globals are just stored in a table, and which table is used can be controlled by the code itself. this table is known as the “environment”. in lua 5.2 and later, an environment may be passed to load (or loadfile) as an argument. this means you can load a chunk with an empty environment and then capture the values it sets as globals.
you can have a file like this:
-- config.lua
foo = "bar"
list = {1, 2, 3}
rule_of_thirds = true
then, you can get a table with all the configuration as simply as this:
local config = {}
assert(loadfile("config.lua", "t", config))()
imperative style
lua has a bit of sugar that allows you to shed the parentheses on a function call when its only argument is a string literal or a table constructor, which makes such calls look cleaner. an imperative style is a very suitable means to represent data in lua, and it’s very powerful.
path "/path/to/things"
config {
    foo = "bar",
    list = {1, 2, 3},
    rule_of_thirds = true,
}
lipsum [[
Voluptatibus laudantium et omnis ut debitis. Similique omnis molestias
consequuntur. Praesentium corporis omnis nihil et quidem dolore. Veritatis
fugiat eaque corporis eligendi officiis praesentium. Libero quo vero
temporibus impedit sit inventore assumenda
]]
programming in lua provides a real world example of this.
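the file above only shows the data side. one way the consuming side could look, as a sketch assuming lua 5.2 or later: each directive is just a one-argument function placed in the chunk’s environment. the result table and the way each directive stores its argument are my own choices for illustration, and the data is inlined as a string so the example runs on its own.

```lua
local source = [[
path "/path/to/things"
config {
    foo = "bar",
    list = {1, 2, 3},
}
lipsum [=[
Voluptatibus laudantium et omnis ut debitis.
]=]
]]

-- everything the data file "says" ends up in this table
local result = {}

-- the environment exposes only the directives, nothing else
local env = {
    path = function(p) result.path = p end,
    config = function(t) result.config = t end,
    lipsum = function(text) result.lipsum = text end,
}

assert(load(source, "=directives", "t", env))()

print(result.path)
print(result.config.foo)
print(result.lipsum)
```

because the directives are ordinary functions, they can validate their arguments, accumulate repeated entries, or raise errors with useful messages.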
security considerations
the major problem with using an entire programming language to represent data is that untrusted data can be readily crafted to be malicious. indeed, I would not use this technique for untrusted data. with that in mind, it is possible to solve or mitigate the potential vulnerabilities.
the main technique for sandboxing lua is controlling the environment. when using lua for data representation, an empty environment is the most appropriate (or an environment that only exposes data constructors). this prevents any I/O or access to the larger program’s state. preventing access to all standard library functions is safest. there are a few standard library functions that have consequences you might not expect.
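a small sketch of what an empty environment does and does not hide, assuming lua 5.2 or later with the standard string library loaded. the chunk contents are made up for illustration.

```lua
local untrusted = [[
    leaked = os             -- nil: the environment has no "os"
    shout = ("hi"):upper()  -- still works: string methods come from the string metatable
]]

local env = {}
assert(load(untrusted, "=untrusted", "t", env))()

assert(env.leaked == nil)
assert(env.shout == "HI")
print(env.leaked, env.shout)
```

so even with no globals at all, string methods stay reachable through method-call syntax, which means something like ("x"):rep(1e9) is still available to a hostile chunk.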
another concern is denial of service. this is trickier to prevent. it’s fairly trivial to construct a lua chunk which does not return (while true do end). debug.sethook can mitigate this somewhat. even so, it’s probably possible to construct something that uses excessive amounts of memory. that said, denial of service is significantly less problematic than the vulnerabilities that are completely avoided by depriving the untrusted code of any standard library functions.
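a sketch of the debug.sethook mitigation, assuming the reference lua interpreter (5.2 or later). run_limited and the instruction budget are hypothetical choices of mine. note that the count hook only fires between lua vm instructions, so it cannot interrupt a single long-running C function such as a huge string.rep.

```lua
-- hypothetical helper: run a chunk, aborting it once the instruction budget is spent
local function run_limited(chunk, max_instructions)
    -- with an empty mask and a count, the hook runs every max_instructions vm instructions
    debug.sethook(function()
        error("instruction limit exceeded", 0)
    end, "", max_instructions)
    local ok, err = pcall(chunk)
    debug.sethook() -- remove the hook again
    return ok, err
end

-- the empty environment means the chunk has no pcall to swallow the error
local spin = assert(load("while true do end", "=spin", "t", {}))
local ok, err = run_limited(spin, 1000000)
print(ok, err)
```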
finally, make sure to reject lua bytecode. untrusted lua bytecode is not safe and may cause arbitrary nasal demons in the lua interpreter. load can accept both bytecode and lua source by default. however, passing "t" for the mode parameter prevents this.
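a quick way to see the mode parameter at work, assuming lua 5.2 or later. the dumped function is just a stand-in for untrusted bytecode.

```lua
-- string.dump produces a binary chunk (bytecode) for a function
local bytecode = string.dump(function() return 1 end)

-- the default mode accepts both text and binary chunks
local accepted = load(bytecode)
assert(accepted)

-- mode "t" refuses the binary chunk and returns nil plus an error message
local chunk, err = load(bytecode, "=untrusted", "t")
assert(chunk == nil)
print(err)
```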
why?
clearly, it’s silly to bring in the entire lua interpreter where you would just use JSON or INI. where this use case of lua shines is when you’re already using lua for a project (and then you already have this powerful data description language), or when you need something with a lot of power or flexibility.
these mini languages can be extended to arbitrary complexity and nuance. unlike JSON, you can add arbitrary type constructors, or anything at all. you can build any kind of language you want, and that’s how lua is designed to be used, and it’s actually approximately the story of how it was invented.
if you’re anything like me, this will sound a little familiar. indeed, many have said similar things about lisp and forth. both of those languages have very different approaches but reach the same conclusion of “inventing your own language”. I believe lua belongs in their pantheon.
bonus content
![Please do not write it as 'LUA', which is both ugly and confusing, because then it becomes an acronym with different meanings [highlighted link] for different people. So, please, write 'Lua' right!](https://citrons.xyz/files/U9eKLdeq.png)

python? js? ruby? what the hell are those? all I care about is LUA.