Stanbol Entityhub

Apache Stanbol is a set of software components for working with RDF data, including entity lookup against locally indexed vocabularies. At present, we have loaded the LC authority records (subject, name, genre) and the GeoNames data into the system. This allows us to map string-based metadata to unambiguous URIs, which can be dereferenced from this local cache.

Entityhub queries support LDPath sub-queries. For instance, a query for 'Amherst' in GeoNames returns numerous results, and an LDPath program can add more structure to them:

@prefix gn: <http://www.geonames.org/ontology#>;
label = rdfs:label :: xsd:string;
parent = gn:parentFeature / rdfs:label :: xsd:string;

The program is URL-encoded and passed in the ldpath parameter:

curl "http://localhost:8080/entityhub/site/geonames/find?name=Amherst&limit=10&ldpath=%40prefix%20gn%3A%20%3Chttp%3A%2F%2Fwww.geonames.org%2Fontology%23%3E%3B%20label%20%3D%20rdfs%3Alabel%20%3A%3A%20xsd%3Astring%3B%20parent%20%3D%20gn%3AparentFeature%20%2F%20rdfs%3Alabel%20%3A%3A%20xsd%3Astring%3B"

This returns a response like so:

{
  results: [
    {
      id: "http://sws.geonames.org/5884469/",
      label: "Amherst",
      parent: [
        {
          type: "value",
          xsd:datatype: "xsd:string",
          value: "Amherst"
        },
        {
          type: "value",
          xsd:datatype: "xsd:string",
          value: "Canada"
        },
        {
          type: "value",
          xsd:datatype: "xsd:string",
          value: "Laurentides"
        },
        {
          type: "value",
          xsd:datatype: "xsd:string",
          value: "Québec"
        }
      ]
    },
    {
      id: "http://sws.geonames.org/4929023/",
      label: "Amherst Center",
      parent: [
        {
          type: "value",
          xsd:datatype: "xsd:string",
          value: "Hampshire County"
        },
        {
          type: "value",
          xsd:datatype: "xsd:string",
          value: "Massachusetts"
        },
        {
          type: "value",
          xsd:datatype: "xsd:string",
          value: "United States"
        }
      ]
    }
  ]
}
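The percent-encoded ldpath parameter in the request above is tedious to write by hand. A minimal sketch of generating it with the Python standard library follows; the helper name find_url is hypothetical, and the host and port are simply those used in the curl example.

```python
from urllib.parse import quote, urlencode

# LDPath program from the example above, kept as readable text.
LDPATH = (
    "@prefix gn: <http://www.geonames.org/ontology#>; "
    "label = rdfs:label :: xsd:string; "
    "parent = gn:parentFeature / rdfs:label :: xsd:string;"
)


def find_url(base, site, name, ldpath, limit=10):
    """Build an Entityhub find URL with every parameter percent-encoded.

    find_url is a hypothetical helper, not part of Stanbol.
    """
    # quote_via=quote encodes spaces as %20 (as in the curl example)
    # instead of the '+' that urlencode produces by default.
    query = urlencode(
        {"name": name, "limit": limit, "ldpath": ldpath}, quote_via=quote
    )
    return f"{base}/entityhub/site/{site}/find?{query}"


url = find_url("http://localhost:8080", "geonames", "Amherst", LDPATH)
print(url)
```

The printed URL can be passed to curl in double quotes, exactly as in the request shown earlier.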

One can also use LDPath for extracting the desired fields from queries, such as a `skos:prefLabel`:

prefLabel = skos:prefLabel :: xsd:string;

URL-encoded, the request becomes:

curl "http://localhost:8080/entityhub/site/lcnames/find?name=Hitchcock,+Edward*&limit=10&ldpath=prefLabel%20%3D%20skos%3AprefLabel%20%3A%3A%20xsd%3Astring%3B"

With results such as:

{
  results: [  
    {  
      id: "http://id.loc.gov/authorities/names/nr00005065",
      prefLabel: [  
        {  
          type: "value",
          xsd:datatype: "xsd:string",
          value: "Hitchcock, Orra White, 1796-1863"
        }
      ]
    },
    {  
      id: "http://id.loc.gov/authorities/names/n90674676",
      prefLabel: [  
        {  
          type: "value",
          xsd:datatype: "xsd:string",
          value: "Hitchcock, Edward, 1828-1911"
        }
      ]
    },
    {  
      id: "http://id.loc.gov/authorities/names/n79054359",
      prefLabel: [  
        {  
          type: "value",
          xsd:datatype: "xsd:string",
          value: "Hitchcock, E. R. (Edward Robert)"
        }
      ]
    }
  ]
}
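Each LDPath field in these results is either a plain string (like label in the GeoNames example) or a list of typed value objects (like prefLabel here). A minimal sketch of flattening them into plain strings follows. It assumes the actual HTTP response is valid JSON with the same structure as the listings above; SAMPLE is an abbreviated copy of one result, and the function names are hypothetical.

```python
import json

# Abbreviated sample with the same shape as the results shown above,
# written as strict JSON (quoted keys).
SAMPLE = """
{"results": [
  {"id": "http://id.loc.gov/authorities/names/n90674676",
   "prefLabel": [{"type": "value",
                  "xsd:datatype": "xsd:string",
                  "value": "Hitchcock, Edward, 1828-1911"}]}
]}
"""


def field_values(entity, field):
    """Return the values of one field as a list of plain strings."""
    raw = entity.get(field, [])
    if not isinstance(raw, list):
        raw = [raw]  # a single plain value, e.g. label: "Amherst"
    # Typed values are objects with a "value" key; plain values pass through.
    return [item["value"] if isinstance(item, dict) else item for item in raw]


def values_by_id(response_text, field):
    """Map each result's id to the flattened values of the given field."""
    results = json.loads(response_text)["results"]
    return {entity["id"]: field_values(entity, field) for entity in results}


print(values_by_id(SAMPLE, "prefLabel"))
```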

Individual URIs can be dereferenced either by accessing the resource at the identified location or by querying the Stanbol cache:

curl "http://localhost:8080/entityhub/site/geonames/entity?id=http://sws.geonames.org/4929023/"

By default, this returns application/json; most other RDF serialization formats can be requested with an Accept header.
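A minimal sketch of setting the Accept header from Python follows. The media type text/turtle is an assumption (a standard RDF media type, not one confirmed on this page), and entity_request is a hypothetical helper. The request is only constructed here; sending it requires a running Stanbol instance.

```python
from urllib.parse import urlencode
from urllib.request import Request


def entity_request(base, site, entity_id, media_type="application/json"):
    """Build (but do not send) a request for one entity from the cache.

    entity_request is a hypothetical helper, not part of Stanbol.
    """
    # Percent-encode the entity URI so it is safe inside the query string.
    query = urlencode({"id": entity_id})
    url = f"{base}/entityhub/site/{site}/entity?{query}"
    return Request(url, headers={"Accept": media_type})


req = entity_request(
    "http://localhost:8080",
    "geonames",
    "http://sws.geonames.org/4929023/",
    media_type="text/turtle",  # assumed media type for Turtle output
)
print(req.full_url)
print(req.get_header("Accept"))
```

Passing req to urllib.request.urlopen would perform the request; the curl equivalent is the -H "Accept: text/turtle" option.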

Building an Entityhub site

The Apache Stanbol project provides documentation for indexing a custom vocabulary or dataset into an Entityhub site (e.g. GeoNames or generic RDF).

For Geonames, you will need a lot of RAM. For most generic RDF data, 1GB should be sufficient. The resulting ZIP files should be placed in $STANBOL_HOME/datafiles and the JAR bundle can be installed through the web-based system console.
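Copying the ZIP files can be scripted. The sketch below is illustrative only: the function name and the source directory are hypothetical, and it assumes the indexing run left its ZIP archives together in one directory. Installing the JAR bundle still happens through the system console.

```python
import shutil
from pathlib import Path


def install_index_archives(dist_dir, stanbol_home):
    """Copy every ZIP archive from dist_dir into <stanbol_home>/datafiles.

    Returns the names of the copied files, sorted alphabetically.
    """
    datafiles = Path(stanbol_home) / "datafiles"
    datafiles.mkdir(parents=True, exist_ok=True)
    copied = []
    for archive in sorted(Path(dist_dir).glob("*.zip")):
        shutil.copy2(archive, datafiles / archive.name)
        copied.append(archive.name)
    return copied
```

A typical call would be install_index_archives(output_dir, os.environ["STANBOL_HOME"]), where output_dir is wherever the indexing tool wrote its results.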

stanbol.txt · Last modified: 2016/10/21 15:32 by acoburn
 