Amherst College Digital Collections Wiki

This is a wiki for the Amherst College Digital Collections repository. Its primary purpose is to describe some of the implementation details of the various components that make the site function. Any specific questions can be sent to Aaron Coburn.

Fedora Implementation Details

Content Models - How our data is modeled

Personal Collections - Settings for how users' personal collections work

Access Controls - Users and groups, including which collections/datastreams each should be able to access.

External API - providing external access to Fedora's RESTful API.

Searching with Solr

Search Settings - Specific settings related to how the searching and indexing works

External API - providing external access to Solr's API.

Using Riak as a document store

Document Cache - Riak is a distributed, high-performance key-value (NoSQL) store that supports parallel map-reduce queries. Most user queries retrieve data directly from Riak rather than accessing Fedora, making the entire system significantly faster and more fault-tolerant.
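
The document cache can be pictured as a cache-aside lookup. This is a minimal sketch that assumes a request missing in Riak falls back to Fedora and stores the result. The source does not say how misses are handled. The `DocumentCache` class, the `fetch_from_fedora` loader and the `demo:1` PID are hypothetical, and a plain dict stands in for a Riak bucket, so no real Riak client API is shown.

```python
from typing import Callable, Dict, Optional

Document = Dict[str, str]


class DocumentCache:
    """Cache-aside lookup: try the fast document store first, Fedora second.

    `store` is a plain dict standing in for a Riak bucket, and
    `fetch_from_fedora` is a hypothetical loader; neither is a real client API.
    """

    def __init__(
        self,
        store: Dict[str, Document],
        fetch_from_fedora: Callable[[str], Optional[Document]],
    ) -> None:
        self.store = store
        self.fetch_from_fedora = fetch_from_fedora
        self.fedora_calls = 0  # counts how often the slower path was taken

    def get(self, pid: str) -> Optional[Document]:
        doc = self.store.get(pid)
        if doc is not None:
            return doc  # hit: Fedora is never contacted
        self.fedora_calls += 1
        doc = self.fetch_from_fedora(pid)
        if doc is not None:
            self.store[pid] = doc  # populate the store for later requests
        return doc
```

For example, with `fedora = {"demo:1": {"title": "Sample"}}`, calling `DocumentCache({}, fedora.get).get("demo:1")` twice reaches the fallback only once. Every later read is served from the store.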

Application Messaging

Fedora has an embedded, highly configurable ActiveMQ messaging system. When a Fedora object changes, any number of related systems can be alerted to that change asynchronously, with no blocking, waiting, or other synchronization that would slow the system down or make it unresponsive for users. The trade-off is that the system is eventually consistent (a few seconds of lag are built in by design). The benefit is that the different components can easily be distributed across multiple hosts.
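
The asynchronous, eventually consistent behaviour can be illustrated in-process. In this minimal sketch, Python's `queue.Queue` stands in for the ActiveMQ broker and a set stands in for a downstream system such as the search index. The `ChangeNotifier` class and its method names are hypothetical and are not part of Fedora's or ActiveMQ's API.

```python
import queue
import threading
from typing import Optional, Set


class ChangeNotifier:
    """Publisher/consumer pair connected by an in-process queue.

    A stand-in for the broker: `publish` returns immediately, and a
    background thread applies changes to a hypothetical downstream index.
    """

    def __init__(self) -> None:
        self.events: "queue.Queue[Optional[str]]" = queue.Queue()  # unbounded
        self.index: Set[str] = set()
        self.worker = threading.Thread(target=self._consume, daemon=True)
        self.worker.start()

    def publish(self, pid: str) -> None:
        # The caller (e.g. an edit request) does not wait for any consumer.
        self.events.put(pid)

    def _consume(self) -> None:
        while True:
            pid = self.events.get()
            if pid is None:  # sentinel: stop consuming
                return
            self.index.add(pid)  # downstream system catches up asynchronously

    def close(self) -> None:
        self.events.put(None)
        self.worker.join()  # wait for every queued change to be applied
```

Between `publish` and `close`, `index` may not yet contain the new PID. That window is the eventual consistency described above.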

Broker Configuration - Our Fedora messaging broker is linked to a remote broker cluster for higher availability. Furthermore, messages are pushed onto ActiveMQ queues rather than published on topics. Because a queue holds each message until a consumer receives it, the message routing application will never miss a message, even if it is temporarily offline.
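
The difference between publishing on a topic and pushing onto a queue can be modelled with two toy classes. This is a simplified sketch of the delivery semantics only. The topic shown is non-durable (ActiveMQ also supports durable topic subscriptions, which are not modelled here), and the class names are hypothetical.

```python
from collections import deque
from typing import Deque, List


class NonDurableTopic:
    """Pub/sub: a message reaches only subscribers connected at publish time."""

    def __init__(self) -> None:
        self.inboxes: List[List[str]] = []

    def subscribe(self) -> List[str]:
        inbox: List[str] = []
        self.inboxes.append(inbox)
        return inbox

    def publish(self, msg: str) -> None:
        for inbox in self.inboxes:
            inbox.append(msg)  # with no subscribers, the message is simply dropped


class MessageQueue:
    """Point-to-point: messages wait until a consumer takes them."""

    def __init__(self) -> None:
        self.pending: Deque[str] = deque()

    def send(self, msg: str) -> None:
        self.pending.append(msg)

    def receive_all(self) -> List[str]:
        received = list(self.pending)
        self.pending.clear()
        return received
```

A subscriber that connects to the topic after a message was published never sees it. A consumer that starts after a message was sent to the queue still receives it.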

Message Routing - Message routing is handled by Apache Camel, which means that I no longer write any code when integrating different components over the ActiveMQ messaging system.
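
Much of what such a routing layer does is content-based routing: inspect a message header and forward the message to the matching endpoints. Camel expresses this declaratively (for example with `choice()` and `when()` in its Java DSL). The plain-Python sketch below only shows the idea. It assumes each Fedora message carries a `methodName` header naming the API call that triggered it, and the routing table and destination names are hypothetical.

```python
from typing import Dict, List

# Hypothetical routing table: Fedora API method name -> destination endpoints.
ROUTES: Dict[str, List[str]] = {
    "ingest": ["solr-update", "riak-update"],
    "modifyDatastreamByValue": ["solr-update", "riak-update"],
    "purgeObject": ["solr-delete", "riak-delete"],
}


def route(headers: Dict[str, str]) -> List[str]:
    """Content-based router: choose destinations from one message header.

    Unknown or missing method names go to a dead-letter destination
    instead of being silently discarded.
    """
    return list(ROUTES.get(headers.get("methodName", ""), ["dead-letter"]))
```

For instance, `route({"methodName": "purgeObject", "pid": "demo:1"})` yields the two delete destinations. Keeping the table as data rather than code mirrors why a routing framework removes most hand-written integration code.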

Routing Container - Camel can run in any Java container, but I chose Apache Karaf because it makes deployment extremely easy.

Web Interface

The web interface is written using Backbone.js, which allows the site to run as a “single-page web app” even though the URL appears to change (thanks to the magic of HTML5 pushState()). Using Backbone.js also means that the JavaScript is cleanly structured into Views and Models, with every view decoupled from every other view. Each View manages the UI events that take place in its own DOM region while also acting as an observer of the various models. Each Model is tied to a particular RESTful endpoint (implemented in Node.js).
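
The View/Model decoupling is an instance of the observer pattern, which Backbone provides through its events (for example, a view listening for a model's `"change"` event). This language-neutral sketch in Python is not Backbone code. The `Model` and `TitleView` classes and the `/api/objects/demo:1` endpoint are hypothetical, and no REST call is made.

```python
import html
from typing import Any, Callable, Dict, List


class Model:
    """Attribute holder that notifies observers on change (cf. Backbone "change" events)."""

    def __init__(self, url: str, **attrs: Any) -> None:
        self.url = url  # REST endpoint the model would sync with; no I/O here
        self._attrs: Dict[str, Any] = dict(attrs)
        self._observers: List[Callable[["Model"], None]] = []

    def on_change(self, callback: Callable[["Model"], None]) -> None:
        self._observers.append(callback)

    def get(self, key: str) -> Any:
        return self._attrs.get(key)

    def set(self, **changes: Any) -> None:
        changed = {k: v for k, v in changes.items() if self._attrs.get(k) != v}
        if not changed:
            return  # nothing changed, so nobody is notified
        self._attrs.update(changed)
        for callback in self._observers:
            callback(self)


class TitleView:
    """Owns one DOM region; observes a model and never references other views."""

    def __init__(self, model: Model) -> None:
        self.markup = ""
        model.on_change(self.render)
        self.render(model)

    def render(self, model: Model) -> None:
        self.markup = f"<h1>{html.escape(str(model.get('title')))}</h1>"
```

Two views bound to the same model both re-render when the model changes, yet neither holds a reference to the other.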

Cool URLs - making the URLs look friendly and platform-agnostic, all while remaining persistent and bookmarkable.

API JSON Structure - the expected JSON structure for search queries

Namespaces - namespaces in use throughout the repository

Displaying Metadata - metadata fields

Linked Open Data

The entire site uses RDFa to publish schema.org attributes. We have developed a mapping from standard MODS metadata to populate these attributes in the HTML.
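
A MODS-to-schema.org mapping of this kind can be sketched as a table of MODS paths and schema.org property names, rendered as RDFa attributes. The mapping below is illustrative only and is not the repository's actual mapping. The choice of `CreativeWork` as the type and the `mods_to_rdfa` function name are assumptions.

```python
import html
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Illustrative mapping (MODS path -> schema.org property); not the site's real table.
MODS_TO_SCHEMA = [
    ("mods:titleInfo/mods:title", "name"),
    ("mods:name/mods:namePart", "creator"),
    ("mods:originInfo/mods:dateCreated", "dateCreated"),
    ("mods:abstract", "description"),
]


def mods_to_rdfa(mods_xml: str) -> str:
    """Render the mapped MODS fields as an RDFa-annotated HTML fragment."""
    root = ET.fromstring(mods_xml)
    lines = ['<div vocab="http://schema.org/" typeof="CreativeWork">']
    for path, prop in MODS_TO_SCHEMA:
        for el in root.findall(path, MODS_NS):
            if el.text and el.text.strip():
                text = html.escape(el.text.strip())
                lines.append(f'  <span property="{prop}">{text}</span>')
    lines.append("</div>")
    return "\n".join(lines)
```

Fields absent from a record (here, an abstract) simply produce no `property` attribute, so partial MODS records still yield valid markup.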

We are also using the Open Graph protocol, primarily for better integration with social media sites such as Facebook, Google+, and Twitter.
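
Open Graph metadata is a set of `<meta property="og:…">` tags in the page head, and the protocol lists `og:title`, `og:type`, `og:image` and `og:url` as required. This minimal sketch renders those tags from a dictionary. The `opengraph_tags` function and the example URLs are hypothetical.

```python
import html
from typing import Dict

# The four properties the Open Graph protocol lists as required for every page.
REQUIRED = ("title", "type", "image", "url")


def opengraph_tags(props: Dict[str, str]) -> str:
    """Render og:* <meta> tags; keys are given without the "og:" prefix."""
    missing = [key for key in REQUIRED if key not in props]
    if missing:
        raise ValueError(f"missing required Open Graph properties: {missing}")
    return "\n".join(
        f'<meta property="og:{html.escape(key)}" content="{html.escape(value)}" />'
        for key, value in props.items()
    )
```

Raising on a missing required property, rather than emitting partial tags, makes an incomplete record visible before a social media crawler ever sees it.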

RDF - Details on the ontologies in use

Stanbol Entityhub - We are using a local Stanbol entityhub with LC authority records, allowing us to fix and/or enhance the existing MODS metadata.

SPARQL Endpoint

start.txt · Last modified: 2014/01/10 14:12 by acoburn
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported