HOW IT WORKS

A library uses the LOCKSS software to turn a low-cost PC into a digital preservation appliance that performs four functions:

  • It collects newly published content from the target e-journals using a web crawler similar to those used by search engines.
  • It continually compares the content it has collected with the same content collected by other appliances, and repairs any differences.
  • It acts as a web proxy or cache, providing browsers in the library's community with access to the publisher's content or the preserved content as appropriate.
  • It provides a web-based administrative interface that allows the library staff to target new journals for preservation, monitor the state of the journals being preserved, and control access to the preserved journals.

Collecting

Before LOCKSS appliances can preserve a journal, two things have to happen:

  • The publisher has to give permission for the LOCKSS system to collect and preserve the journal. They do this by adding a page to the journal's web site containing a permission statement, and links to the issues of the journal as they are published.
  • The appliance has to know where to find this page, how far to follow the chains of web links (so that it doesn't crawl off the edge of the journal and try to collect the whole Web), some bibliographic information, and so on. To support new publishing platforms, the LOCKSS system provides a fill-in-the-blanks tool that a librarian or administrator can use to collect this information and test that it is correct. The information is then saved in a file (the LOCKSS plug-in) and added to the publisher's web site or to some other plug-in repository, so that it is available to all LOCKSS systems.
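
The idea of a crawl that stays inside one journal can be illustrated with a small sketch. This is not LOCKSS code and does not reflect the plug-in format; the names in_scope, crawl, and fetch_links, the example.org URLs, and the rule "same host, path under the journal's base URL, at most max_depth links from the start page" are all assumptions chosen for illustration.

```python
from collections import deque
from urllib.parse import urlparse


def in_scope(url, base_url):
    """Assumed scope rule: same host, and path under the journal's base path."""
    u, b = urlparse(url), urlparse(base_url)
    return u.netloc == b.netloc and u.path.startswith(b.path)


def crawl(start_url, base_url, max_depth, fetch_links):
    """Breadth-first crawl from start_url, bounded by scope and link depth.

    fetch_links is a caller-supplied function mapping a URL to the list of
    URLs it links to (a hypothetical stand-in for fetching and parsing a page).
    """
    seen = {start_url}
    queue = deque([(start_url, 0)])
    collected = []
    while queue:
        url, depth = queue.popleft()
        collected.append(url)
        if depth == max_depth:
            continue  # do not follow links any further from this page
        for link in fetch_links(url):
            if link not in seen and in_scope(link, base_url):
                seen.add(link)
                queue.append((link, depth + 1))
    return collected


# Toy link graph standing in for a publisher's site (hypothetical URLs).
BASE = "http://journal.example.org/j/"
LINKS = {
    BASE + "manifest.html": [BASE + "issue1/", "http://other.example.com/"],
    BASE + "issue1/": [BASE + "issue1/a1.html", "http://journal.example.org/about.html"],
    BASE + "issue1/a1.html": [BASE + "issue1/a1-refs.html"],
}

pages = crawl(BASE + "manifest.html", BASE, 2, lambda u: LINKS.get(u, []))
print(pages)
```

With a depth limit of 2, the toy crawl collects the start page, the issue page, and the article, and skips both the off-site link and the page outside the journal's path.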

Preserving and Auditing

The LOCKSS appliances at libraries around the world use the Internet to audit, continually but very slowly, the content they are preserving. At intervals, appliances take part in polls, voting on the digest of some part of the content they have in common. If the content in one appliance is damaged or incomplete, that appliance will lose the poll, and it can then repair its content from other appliances. This cooperation between the appliances avoids the need to back them up individually. It also provides unambiguous reassurance that the system is performing its function and that the correct content will be available to readers when they try to access it. The more organizations that preserve given content, the stronger the guarantee they each get of continued access.
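
The poll-and-repair idea can be sketched as a simple majority vote over content digests. This is a deliberately simplified illustration, not the LOCKSS polling protocol itself: the function names digest and run_poll, the use of SHA-256, the strict-majority rule, and the in-memory dictionary of copies are all assumptions.

```python
import hashlib
from collections import Counter


def digest(content):
    """Digest of one appliance's copy (SHA-256 is an assumption here)."""
    return hashlib.sha256(content).hexdigest()


def run_poll(copies):
    """Return repaired copies after one simplified poll.

    copies maps a (hypothetical) appliance name to the bytes it holds for
    the same piece of content. Each appliance votes with its digest; any
    appliance that disagrees with a strict majority takes a winning copy.
    """
    votes = {name: digest(data) for name, data in copies.items()}
    winning_digest, count = Counter(votes.values()).most_common(1)[0]
    if count <= len(copies) / 2:
        # No strict majority: this sketch makes no repairs.
        return dict(copies)
    good_copy = next(
        data for name, data in copies.items() if votes[name] == winning_digest
    )
    return {
        name: data if votes[name] == winning_digest else good_copy
        for name, data in copies.items()
    }


copies = {
    "library-a": b"Volume 1, Issue 1",
    "library-b": b"Volume 1, Issue 1",
    "library-c": b"Volume 1, Iss",  # damaged or incomplete copy
}
repaired = run_poll(copies)
print(repaired["library-c"] == copies["library-a"])
```

In this toy run, library-c loses the poll and ends up with the same bytes as the other two appliances. The sketch also suggests why more participants give a stronger guarantee: a single damaged copy is outvoted more decisively as the number of agreeing copies grows.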

Providing Access

LOCKSS appliances provide transparent access to the content they preserve. Institutions often run web proxies, to allow off-campus users to access their journal subscriptions, and web caches, to reduce the bandwidth cost of providing Web access to their community. Their LOCKSS appliance integrates with these systems, intercepting requests from the community's browsers to the journals being preserved. When a request for a page from a preserved journal arrives, it is first forwarded to the publisher. If the publisher returns content, that is what the browser gets. Otherwise the browser gets the preserved copy.
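
The publisher-first, preserved-copy-second rule described above can be sketched as follows. The function serve, the fetch_from_publisher callback, and the dictionary of preserved pages are hypothetical names for illustration; treating a None result or an OSError as "the publisher returned no content" is also an assumption, since the text does not say how that condition is detected.

```python
def serve(url, fetch_from_publisher, preserved):
    """Return the publisher's content if available, else the preserved copy.

    fetch_from_publisher: hypothetical callable that returns bytes, returns
    None, or raises OSError when the publisher cannot supply the page.
    preserved: mapping from URL to the bytes held by the appliance.
    """
    try:
        content = fetch_from_publisher(url)
    except OSError:
        content = None  # assumed to mean the publisher is unreachable
    if content is not None:
        return content
    return preserved.get(url)  # None if the page was never preserved


# Toy stand-ins for a publisher that is up and one that is down.
def publisher_up(url):
    return b"current page from publisher"


def publisher_down(url):
    raise ConnectionError("publisher unreachable")


preserved = {"http://journal.example.org/j/issue1/a1.html": b"preserved page"}
url = "http://journal.example.org/j/issue1/a1.html"

print(serve(url, publisher_up, preserved))
print(serve(url, publisher_down, preserved))
```

The reader's browser makes the same request in both cases; only the source of the bytes differs, which is what makes the access transparent.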

Administering

Library staff administer their LOCKSS appliance via a Web user interface. A demonstration version of the interface is available. It allows staff to target new journals for preservation, monitor the preservation of existing journals, control access to the appliance, and perform other functions.

For technical details see http://www.lockss.org/pub-wiki/