on the possibility of composing finalizers and FFI
the other day I wasn’t doing anything in particular involving guile or harfbuzz, when I remembered that GC and FFI are really cool, especially luajit’s.
allow me to illustrate with an example. harfbuzz, chosen for no specific reason at all, has a concept of blobs, which are refcounted sequences of bytes. it uses these in a number of places, for example when loading opentype fonts. you can peek at a blob’s contents with hb_blob_get_data, which gives you a pointer and a length.
say you are in luajit. you get a blob from somewhere and want to get its data. you define a wrapper for hb_blob_get_data:
local ffi = require("ffi")
local hb = ffi.load("harfbuzz")
ffi.cdef [[
typedef struct hb_blob_t hb_blob_t;
const char *
hb_blob_get_data (hb_blob_t *blob, unsigned int *length);
]]
presumably you then arrange to release luajit’s reference on the blob when GC collects a lua wrapper for a blob:
ffi.cdef [[
void hb_blob_destroy (hb_blob_t *blob);
]]
function adopt_blob(ptr)
  -- drop our reference on the blob when the lua wrapper is collected
  return ffi.gc(ptr, hb.hb_blob_destroy)
end
ok, so let’s say we get a blob from somewhere and want to copy out its contents as a byte string.
function blob_contents(blob)
  -- a one-element array serves as the out-parameter for the length
  local len_out = ffi.new('unsigned int[1]')
  local contents = hb.hb_blob_get_data(blob, len_out)
  local len = len_out[0]
  return ffi.string(contents, len)
end
I’m so happy because this code is correct and there’s nothing wrong with it. I live my life with enjoyment and without strife because roberto ierusalimschy and mike pall watch over me even now, and I know that my code will never access invalid memory.
among GC implementors, it is a truth universally acknowledged that a program containing finalizers must be in want of a segfault. the semantics of luajit do not prescribe when GC can happen or which values will be live, so neither the GC nor the compiler is obliged to extend the liveness of blob to the end of its lexical scope. it is perfectly valid to collect blob right after its last use, the call to hb_blob_get_data; its finalizer then calls hb_blob_destroy, which may free the bytes that contents points to before ffi.string has copied them.
I didn’t choose luajit for this example because it is unusually prone to this; as far as I know it simply doesn’t promise anything here, and the same hazard applies to other garbage-collected languages that combine finalizers with an FFI. the good news is that I don’t need to know anything particularly deep about the compiler and run-time to fix it.
all I have to do is observe that the GC has no way to know that contents points into memory owned by blob. so why should I expect it not to free blob before I’m done with contents? the solution, then, is to make the relationship explicit: keep blob reachable for as long as contents is.
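-- like ffi.gc, but also keeps the extra arguments reachable for as long as
-- cdata is: they are captured as an upvalue of the finalizer, which luajit
-- holds on to until the finalizer runs.
local function gc(cdata, finalizer, ...)
  local references = {...}
  return ffi.gc(cdata, function(obj)
    -- assigning here is what makes `references` an upvalue of this closure;
    -- once the finalizer has run, the referenced objects may be collected
    references = nil
    if finalizer then
      return finalizer(obj)
    end
  end)
end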
function blob_contents(blob)
  local len_out = ffi.new('unsigned int[1]')
  local contents = hb.hb_blob_get_data(blob, len_out)
  -- keep blob alive for as long as contents is
  gc(contents, nil, blob)
  local len = len_out[0]
  return ffi.string(contents, len)
end
this doesn’t involve outsmarting the compiler in any way; it relies only on ordinary reachability: as long as contents is reachable, so is blob.
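as a sanity check of that claim, here is a small self-contained sketch that runs without harfbuzz. it assumes luajit (for require("ffi")), swaps blob and contents for two plain ffi.new buffers called parent and child, and uses a hypothetical keep_alive helper that applies the same idea as gc above: the dependencies stay reachable until the dependent object’s finalizer has run.

```lua
local ffi = require("ffi")

-- hypothetical helper, same idea as gc() above: keep the extra arguments
-- reachable until cdata's finalizer has run
local function keep_alive(cdata, ...)
  local deps = {...}
  return ffi.gc(cdata, function()
    deps = nil -- referencing deps here makes it an upvalue of the finalizer
  end)
end

local finalized = {}

-- stand-ins for blob and contents. keeping parent local to make_pair means
-- the only path to it, once make_pair returns, is through child's finalizer
local function make_pair()
  local parent = ffi.gc(ffi.new("int[1]"), function()
    finalized.parent = true
  end)
  return keep_alive(ffi.new("int[1]"), parent)
end

local child = make_pair()

-- while child is alive, parent must survive any number of collections
collectgarbage()
collectgarbage()
assert(not finalized.parent, "parent was collected while child was alive")

-- drop child: one cycle runs child's finalizer, which lets go of parent;
-- a later cycle can then collect parent
child = nil
collectgarbage()
collectgarbage()
assert(finalized.parent, "parent was not collected after child was")
```

the first assertion is the part that matters for blob_contents. the second one leans on how many full collections luajit needs to run a finalizer and then reclaim what it released; two happens to be enough here, but treat that count as an implementation detail rather than a guarantee.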
for more on this topic, see the section on garbage collection of cdata objects in the luajit FFI semantics documentation.