Riak

Riak is a key-value, document-oriented datastore that we use as a cache tier to make the site significantly faster for web users. Rendering a single page would ordinarily require pulling data from multiple Fedora (XML) datastreams, running various XSLT transformations, and possibly executing several SPARQL queries. Instead, we cache all of that data in a single JSON object stored in Riak. When a user requests a page, all of the structural, descriptive, and access-control metadata is available in one location, already in the desired format.
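As a rough illustration of the "single JSON object" idea, the sketch below bundles the three kinds of metadata into one document and round-trips it through JSON. The function name, the field names, and the sample identifiers are all hypothetical; the actual layout of our cached objects is not described on this page.

```python
import json


def build_page_document(pid, structural, descriptive, access_control):
    """Combine the three kinds of metadata into one JSON-serializable dict.

    The key names used here are illustrative assumptions only.
    """
    return {
        "pid": pid,
        "structural": structural,
        "descriptive": descriptive,
        "access_control": access_control,
    }


# Hypothetical sample values standing in for data derived from Fedora.
doc = build_page_document(
    "example:1",
    {"collection": "example:collection", "children": ["example:2"]},
    {"title": "Example object", "creator": "Unknown"},
    {"public": True},
)

# This string is what would be stored under a single key in the cache.
serialized = json.dumps(doc, sort_keys=True)

# A page request then needs only one lookup and one parse.
restored = json.loads(serialized)
```

The point of the design is that a page render becomes one key lookup rather than several datastream reads, transformations, and queries.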

Hydra already does this by default, but it uses Solr to store the data. Riak provides a number of advantages over the Solr-as-document-store model, including the ability to run MapReduce jobs over arbitrary sets of objects and the ability to store image data (for derived images).
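To make the map-reduce idea concrete, here is a minimal sketch in plain Python. It does not use Riak's MapReduce interface; it only shows the shape of the computation (a map step applied to each selected object, then a reduce step that merges the results). The in-memory `objects` dict, the field names, and the function names are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce

# Hypothetical stand-in for cached JSON objects, keyed by identifier.
objects = {
    "example:1": {"descriptive": {"creator": "A"}},
    "example:2": {"descriptive": {"creator": "B"}},
    "example:3": {"descriptive": {"creator": "A"}},
}


def map_creator(doc):
    """Map step: emit a count of one for the object's creator."""
    return Counter([doc["descriptive"]["creator"]])


def reduce_counts(left, right):
    """Reduce step: merge two partial tallies."""
    return left + right


def count_creators(keys, store):
    """Run the map step over an arbitrary set of keys, then reduce."""
    mapped = (map_creator(store[key]) for key in keys)
    return reduce(reduce_counts, mapped, Counter())


totals = count_creators(["example:1", "example:2", "example:3"], objects)
```

Because the set of keys is passed in explicitly, the same job can be run over any subset of objects.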

Riak is typically run as a multi-server cluster, and ours is composed of four nodes. The cluster is “masterless” and highly fault-tolerant, meaning that any single node can fail without affecting the system. If multiple nodes fail (or if the entire cluster fails), the system reverts to generating the requisite data directly from Fedora: this is slower, but the site continues to work for users. Likewise, if Fedora must be shut down, most users will not notice a change, as most data is pulled directly from Riak anyway.
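The fallback behavior described above can be sketched as a cache lookup that falls through to Fedora. This is a minimal illustration, not our actual code: `fetch_page`, `CacheUnavailable`, and the two callables are hypothetical names, and treating a cache miss the same way as a cache outage is an assumption.

```python
class CacheUnavailable(Exception):
    """Hypothetical error raised when the cache cluster cannot be reached."""


def fetch_page(key, cache_get, generate_from_fedora):
    """Return (document, source), preferring the cache over Fedora.

    cache_get and generate_from_fedora are caller-supplied callables;
    both are stand-ins for the real lookups.
    """
    try:
        cached = cache_get(key)
    except CacheUnavailable:
        cached = None  # cluster is down: fall through to the slow path
    if cached is not None:
        return cached, "cache"
    # Slow path: rebuild the data directly from Fedora.
    return generate_from_fedora(key), "fedora"


# Stand-ins used only to exercise the sketch.
healthy_cache = {"example:1": {"title": "Cached copy"}}


def broken_cache_get(key):
    raise CacheUnavailable("no nodes reachable")


def slow_generate(key):
    return {"title": "Generated for " + key}


hit = fetch_page("example:1", healthy_cache.get, slow_generate)
miss = fetch_page("example:9", healthy_cache.get, slow_generate)
outage = fetch_page("example:1", broken_cache_get, slow_generate)
```

In all three cases the caller receives a document, which mirrors the claim that the site keeps working, only more slowly, when the cache is unavailable.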

cache.txt · Last modified: 2014/01/10 14:24 by acoburn
 