Loading a large JSON file into Elixir's ETS (Erlang Term Storage) cache using Jaxon

Karthik D 13/4/2021

ETS, inherited from Erlang, is an in-memory data store that can be accessed across processes. It can be used to build lookup tables, such as geocoders or translators, in web applications.
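As a quick illustration of the API (the table name, key and coordinates here are made up), an ETS lookup table takes just a few calls:

```elixir
# Create a named table, insert a {key, value} tuple, and look the key up.
:ets.new(:geocodes, [:named_table])
:ets.insert(:geocodes, {"560001", {12.97, 77.60}})
:ets.lookup(:geocodes, "560001")
# => [{"560001", {12.97, 77.60}}]
```

`:ets.lookup/2` always returns a list of matching tuples, which is empty when the key is absent.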

For an application I was building, I had to load a large dataset (~500k rows) of geocodes into an ETS table before the web server started, so that the data could be shared across all the processes handling incoming requests.

First attempt: Load the file and then parse

At first, I attempted loading the whole file into memory and then parsing it with Jason:

# here my json is one single root object with key-value pairs
def load_file(filename, tablename) do
  with {:ok, body} <- File.read(filename),
       {:ok, json} <- Jason.decode(body),
       do: load_from_map(json, tablename)
end

defp load_from_map(parsed_map, tablename) do
  :ets.new(tablename, [:named_table])

  for {k, v} <- parsed_map do
    :ets.insert(tablename, {k, v})
  end
end

It worked, but it took quite a while and hogged a lot of RAM; my machine, with 4 GB of RAM, froze for about a minute.

Streaming to the rescue

At this point, I thought there had to be a better way to do this, maybe something that doesn't involve reading the entire file into memory. That's when I found Jaxon, a streaming JSON parser. Now the file is opened as a stream and the JSON is parsed as the stream is being read. Pretty neat, right?

# here my json is an array of objects {"k":<key>,"v":<value>}
def load_file(filename, tablename) do
  :ets.new(tablename, [:named_table])

  filename
  |> File.stream!()
  |> Jaxon.Stream.from_enumerable()
  |> Jaxon.Stream.query([:root, :all])
  |> Stream.each(fn kv -> :ets.insert(tablename, {kv["k"], kv["v"]}) end)
  |> Stream.run()
end

At first this didn't seem to work, and I was disappointed until I realized my JSON wasn't pretty-printed and sat on a single line. `File.stream!/1` streams the file line by line, so a one-line file was being read as one giant chunk, defeating the point of streaming. I generated a pretty-printed, multi-line JSON file and voilà, it worked!
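Pretty-printing the input shouldn't really be necessary, though. `File.stream!` takes an optional third argument: passing an integer streams the file in fixed-size byte chunks instead of lines, so the input's line structure stops mattering. A sketch under that assumption (the 2048-byte chunk size is an arbitrary choice, and Jaxon is assumed to be in your deps):

```elixir
defmodule GeoLoader do
  # Stream the file in fixed-size byte chunks rather than lines, so even a
  # single-line (non-pretty-printed) JSON array is parsed incrementally.
  def load_file(filename, tablename) do
    :ets.new(tablename, [:named_table])

    filename
    |> File.stream!([], 2048)   # 2 KB chunks; tune to taste
    |> Jaxon.Stream.from_enumerable()
    |> Jaxon.Stream.query([:root, :all])
    |> Stream.each(fn kv -> :ets.insert(tablename, {kv["k"], kv["v"]}) end)
    |> Stream.run()
  end
end
```

With this, the loader is indifferent to how the JSON is formatted on disk.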