Storing Julia code in a Mongo database

I want to store and organize arbitrary code.

We’re going to use Mongoc.jl to drive the database, but Mike Inne’s BSON.jl which is more robust and all-julia. I forked BSON.jl to do a few modifications which aren’t needed for the minimal example code at the bottom:

] add Mongoc
] add BSON
] add /Users/afq/Documents/Dropbox/MyLibraries/BSON.jl#master

In [5]:

using Mongoc
using BSON # A different BSON implementation

You’ll have to set up your own server to follow along with the write queries. The password is left out because this is the read/write account. The queries at the bottom of this post connect to the database with a read-only account.

In [6]:

client2=Mongoc.Client("mongodb+srv://train:PASSWORD@codedump-pmluz.azure.mongodb.net/test?retryWrites=true")
Client(URI("mongodb+srv://train:train@codedump-pmluz.azure.mongodb.net/test?retryWrites=true"))

Let’s do a couple of trivial operations to make sure the database driver works:

In [7]:

Mongoc.ping(client2)
BSON("{ "ok" : 1 }")

In [4]:

document = Mongoc.BSON("a" => 1, "b" => "field_b",
    "c" => [1, 2, 3])
push!(client2["mydb"]["collection"], document)
BSON("{ "a" : 1, "b" : "field_b", "c" : [ 1, 2, 3 ] }")

Getting expressions into it

We store a function-symbol or an anonymous function.

In [26]:

f(x) = 2*x
f (generic function with 1 method)

In [25]:

g = (x) -> 2*x
#5 (generic function with 1 method)

When we compare the two of them, the anonymous function saves all of the referenced data. The BSON library considers the first case as a “leaf” symbol that will be available in the namespace when we load at a later time, versus a deep data structure that needs to be traversed and stored.

In [69]:

# This is from my fork:
doc = BSON.@documentize(f)
doc[:f]
Dict{Symbol,Any} with 3 entries:
  :tag  => "struct"
  :type => Dict{Symbol,Any}(:tag=>"datatype",:params=>Any[],:name=>Any["Main", …
  :data => Any[]

In [86]:

doc = BSON.@documentize(g)
doc
Dict{Symbol,Any} with 2 entries:
  :g         => Dict{Symbol,Any}(:tag=>"struct",:type=>Dict{Symbol,Any}(:tag=>"…
  :_backrefs => Any[Dict{Symbol,Any}(:tag=>"struct",:type=>Dict{Symbol,Any}(:ta…

Let’s try out a round trip of writing to a buffer:

In [74]:

buf = IOBuffer()
BSON.@save buf g
bufs=seek(buf, 0)
d = BSON.load(bufs)
Dict{Symbol,Any} with 1 entry:
  :g => ##5#6()

In [76]:

d[:g](3)
6

The first hack: writing to a buffer with library #1 to loading it into library #2 to send to the database driver:

In [31]:

buf = IOBuffer()
BSON.@save buf g
bufs=seek(buf, 0)
k= Mongoc.read_bson(bufs)
push!(client2["mydb"]["collection_func"], k[1] )
Mongoc.InsertOneResult{Mongoc.BSONObjectId}(BSON("{ "insertedCount" : 1 }"), BSONObjectId("5cc6a4f3b589b4026e5fb433"))

(The glass-half-full way to think about this is translating from the cutting- edge pure-julia implementation into the C data structures that the database implementation provides.)

Looking at what we stored

mongodb atlas

Get expressions back

Now we can turn around and pull everything that we saved back:

In [72]:

c = collect(client2["mydb"]["collection_func"]);
g_doc = c[2] # I checked
BSON("{ "_id" : { "$oid" : "5cc6a4f3b589b4026e5fb433" }, "g" : { "tag" : "struct", "type" : { "tag" : "jl_anonymous", "params" : [  ], "typename" : { "tag" : "backref", "ref" : 1 } }, "data" : [  ] }, "_backrefs" : [ { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "TypeName" ] }, "data" : [ "1.1.0", { "tag" : "symbol", "name" : "##5#6" }, { "tag" : "svec", "data" : [  ] }, { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "Function" ] }, { "tag" : "svec", "data" : [  ] }, { "tag" : "svec", "data" : [  ] }, true, false, false, 0, [ { "tag" : "symbol", "name" : "#5" }, [ { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "Method" ] }, "data" : [ { "tag" : "ref", "path" : [ "Main" ] }, { "tag" : "symbol", "name" : "#5" }, { "tag" : "symbol", "name" : "In[25]" }, 1, { "tag" : "datatype", "params" : [ { "tag" : "jl_anonymous", "params" : [  ], "typename" : { "tag" : "backref", "ref" : 1 } }, { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "Any" ] } ], "name" : [ "Main", "Core", "Tuple" ] }, { "tag" : "svec", "data" : [  ] }, null, 2, false, 0, { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "CodeInfo" ] }, "data" : [ [ { "tag" : "struct", "type" : { "tag" : "backref", "ref" : 2 }, "data" : [ { "tag" : "symbol", "name" : "call" }, [ { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "GlobalRef" ] }, "data" : [ { "tag" : "ref", "path" : [ "Main" ] }, { "tag" : "symbol", "name" : "*" } ] }, 2, { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "SlotNumber" ] }, "data" : [ 2 ] } ] ] }, { "tag" : "struct", "type" : { "tag" : "backref", "ref" : 2 }, "data" : [ { "tag" : "symbol", "name" : "return" }, [ { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "SSAValue" ] }, "data" : [ 1 ] } ] ] } ], { "tag" : "array", "type" : { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "Int32" ] }, "size" : [ 2 ], "data" : { "$binary" : { "base64": "AQAAAAEAAAA=", "subType" : "00" } } }, null, 2, [ { "tag" : "struct", "type" : { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "LineInfoNode" ] }, "data" : [ { "tag" : "ref", "path" : [ "Main" ] }, { "tag" : "symbol", "name" : "#5" }, { "tag" : "symbol", "name" : "In[25]" }, 1, 0 ] } ], { "$binary" : { "base64": "", "subType" : "00" } }, { "$binary" : { "base64": "AAA=", "subType" : "00" } }, [ { "tag" : "symbol", "name" : "#self#" }, { "tag" : "symbol", "name" : "x" } ], false, false, false, false ] } ] } ], 2, null ] ] }, { "tag" : "datatype", "params" : [  ], "name" : [ "Main", "Core", "Expr" ] } ] }")

The BSON library throws an error if some of the entries don’t have a Julia interpretation, so we strip these out:

In [103]:

strip_info(doc::Dict) = filter( kv->kv[1]!="_id", doc)
strip_info(doc::Mongoc.BSON) = Mongoc.BSON( strip_info(Dict(doc)) )
strip_info (generic function with 3 methods)

Now we do the opposite: create temporary buffer, write the Mongoc return to it, then load the symbols and expressions back into the namespace:

In [104]:

buf_read = IOBuffer()
g_doc_stripped = strip_info(g_doc)
Mongoc.write_bson(buf_read, g_doc_stripped )
buf_read_start = seek(buf_read,0)
BSON.@load buf_read_start g

And we can verify it:

In [105]:

g(5)
10

Putting it together and loading code remotely:

The real verification is to run these bottom cells on another computer, or at least a new session, and then run this code:

In [1]:

using Mongoc
using BSON

function write_symbol(symbol)
    buf = IOBuffer()
    BSON.@save buf symbol
    bufs=seek(buf, 0)
    k = Mongoc.read_bson(bufs)
end

strip_info(doc::Dict) = filter( kv->kv[1]!="_id", doc)
strip_info(doc::Mongoc.BSON) = Mongoc.BSON( strip_info(Dict(doc)) )
function load_symbol(g_doc::Mongoc.BSON)
    g_doc_stripped = strip_info(g_doc)
    buf_read = IOBuffer()
    Mongoc.write_bson(buf_read, g_doc_stripped )
    buf_read_start = seek(buf_read,0)
    BSON.@load buf_read_start g
    g
end
load_symbol (generic function with 1 method)

In [2]:

client2 = Mongoc.Client(
    "mongodb+srv://infer:infer@codedump-pmluz.azure.mongodb.net/test?retryWrites=true")
c = collect(client2["mydb"]["collection_func"]);
g_doc = c[2] # I checked
g = load_symbol(g_doc)
g(7)
14

The user:password combination infer:infer is a public read-only account, so you could run this code yourself… if you trust me enough to download and execute arbitrary code, which you really shouldn’t. There are plenty of security holes with this type of paradigm. Modern web applications are constantly sending around Javascript code to be executed on your computer, but the browser has “some” notion of “security”. There is none here; arbitrary Julia code is loaded into your interpreter. A system using this type of code storage needs to carefully secure write access to the server.