Linked Open Data Publishing
I worked with Swirrl, the Department for International Trade (DIT) and the Office for National Statistics (ONS) to develop a search engine for trade data.
The landscape for trade data is a confusing mesh of overlapping datasets. As part of the Integrated Data Programme, the ONS collated and standardised these datasets using the RDF Data Cube ontology. Swirrl made these datasets available online via its linked open data platform, PublishMyData. We worked with the DIT to design a search engine capable of finding individual observations inside the datasets.
I designed the architecture and built a Clojure backend that consumed billions of triples from a SPARQL endpoint, converted them into JSON-LD and stored them in Elasticsearch. The backend also provided the APIs that served search results to a ClojureScript frontend.
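As a rough illustration of that pipeline, the sketch below groups SPARQL JSON result rows by subject into expanded JSON-LD documents and serialises them in the newline-delimited format that the Elasticsearch bulk API accepts. It is written in Python rather than the Clojure used for the real backend, and it is not the project's code: the function names, the s/p/o variable names, the "observations" index name and the example.org URIs are all assumptions made for the example.

```python
import json

# Hypothetical sample in the SPARQL 1.1 JSON results format, as returned by
# a query such as: SELECT ?s ?p ?o WHERE { ?s ?p ?o }
SAMPLE_RESULTS = {
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [
        {"s": {"type": "uri", "value": "http://example.org/obs/1"},
         "p": {"type": "uri", "value": "http://purl.org/linked-data/cube#dataSet"},
         "o": {"type": "uri", "value": "http://example.org/dataset/trade"}},
        {"s": {"type": "uri", "value": "http://example.org/obs/1"},
         "p": {"type": "uri",
               "value": "http://purl.org/linked-data/sdmx/2009/measure#obsValue"},
         "o": {"type": "literal", "value": "42.5",
               "datatype": "http://www.w3.org/2001/XMLSchema#decimal"}},
    ]},
}


def term_to_jsonld(term):
    """Convert one SPARQL JSON result term into a JSON-LD node or value object."""
    if term["type"] == "uri":
        return {"@id": term["value"]}
    if term["type"] == "bnode":
        return {"@id": "_:" + term["value"]}
    value = {"@value": term["value"]}
    if "datatype" in term:
        value["@type"] = term["datatype"]
    elif "xml:lang" in term:
        value["@language"] = term["xml:lang"]
    return value


def bindings_to_documents(results):
    """Group (s, p, o) rows by subject into one expanded JSON-LD document each."""
    docs = {}
    for row in results["results"]["bindings"]:
        subject = row["s"]["value"]
        doc = docs.setdefault(subject, {"@id": subject})
        doc.setdefault(row["p"]["value"], []).append(term_to_jsonld(row["o"]))
    return list(docs.values())


def to_bulk_ndjson(docs, index):
    """Serialise documents as bulk-API NDJSON: an action line, then the source."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["@id"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_bulk_ndjson(bindings_to_documents(SAMPLE_RESULTS), "observations"))
```

The resulting string could be sent to an Elasticsearch `_bulk` endpoint. A production pipeline would probably also compact the documents against a JSON-LD context, so that indexed field names are short terms rather than full URIs, and would page through the endpoint instead of holding all results in memory.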
We designed a natural language tool capable of finding specific facts (statistical observations) to help answer questions from policy-makers. The search engine could therefore fill part of the role of a subject-matter expert.
This work was presented at DataConnect in 2022 as "Searching for data, not just for datasets".