
Aquarelle - Networked Cultural Information
George Mallen, System Simulation Ltd, United Kingdom
Mike Stapleton, System Simulation Ltd, United Kingdom

ichim99 - Cultural Heritage Informatics
Session: Archives and Museum Informatics

Abstract

Aquarelle was a European Union-assisted project designed to provide an information retrieval service for searching across cultural database systems with differing database architectures. A broker architecture was implemented in which a central Access Server received queries from user-clients, distributed them to defined remote data servers, collected the results and passed them back to the user-client. Each query passed through a series of transformations as it was encoded in the various protocols involved: principally HTTP, Z39.50 and SGML, plus the local protocols used at the data servers. There were two main types of data server: Archive Servers, containing primary information about objects and sites, and Folders, containing authored information, usually as SGML documents. The project was a collaboration involving both technical and data supply partners from the European museum, gallery and monuments communities. Its technical development and demonstration took place over the period 1996-1998, preceded by a one-year feasibility and proposal development study. This paper describes the project and speculates on its future application in networked cultural information systems.

Authors

George Mallen is founder and managing director of System Simulation Ltd, one of the technical partners in the project. Mike Stapleton is Technical Director of System Simulation Ltd and responsible for SSL's contribution to the project.

1. The Aquarelle Project

Policy makers in advanced economies are increasingly aware that the Internet is a potentially powerful infrastructure for disseminating cultural and educational content.
The reasons for wanting to develop and disseminate such content lie perhaps in the realisation that, in a world of increasing globalisation, it is important to anchor a sense of identity in local and regional cultures, but also to encourage an appreciation of cultural diversity and an understanding of the value and interdependence of different human cultures. Such a view counterbalances a more prevalent fear that the Internet is an agent of globalisation and will lead to cultural homogeneity rather than diversity. The technology itself is neutral on these matters, and the direction eventually taken will depend on the successes and failures of emerging policies and projects. For those of us who believe the Internet is potentially a major new source and medium for intercultural learning, the onus is on us to create the technology to make it happen. Aquarelle is one such effort.

Its full title is "Aquarelle: Sharing Cultural Heritage through Multimedia Telematics", and it was an R&D project partly supported by the Telematics Application Programme of the European Union. It was initially set up through a close collaboration of public institutions in four countries, namely France, Greece, Italy and the UK, and eventually involved 22 partners from institutions and technical organisations in these countries. The project was co-ordinated by ERCIM (the European Research Consortium for Informatics and Mathematics), a consortium of the main national research laboratories in these fields. The ERCIM member that ran the project day to day was INRIA (Institut National de Recherche en Informatique et en Automatique), near Paris. The project website is maintained at INRIA at http://aqua.inria.fr. The Aquarelle partnership designed and demonstrated an information system offering access to varied cultural data repositories, mostly held by public bodies but some by private ones.
A major challenge was the requirement to provide access to legacy data that had been created well before the emergence of the Internet in its present form, using very varied database systems. The digital information was also heterogeneous, ranging from databases with very different schemas and terminologies to very different document types such as multimedia presentations on CD-ROM, ordinary office documents created by various word processors, and HTML documents created for use on the WWW (Dawson, 1996). Given this variety of source material and technologies, the project's goal was to allow culture professionals such as museum curators, urban planners, commercial publishers and researchers to collect information relevant to their needs or interests, regardless of where the information is held or how it is organised. In addition, any author of a given information component should be able to link his or her work directly to information assets created by other authors. Linking, annotating and commenting on relevant pieces of information from different sources adds considerable value to the content, and the overall Aquarelle architecture was designed to relieve users of the cumbersome manual task of maintaining cross-references while also supporting high-precision referencing and retrieval.

Aquarelle defined two main sources of cultural information, namely Archive Servers and Folders. Archive Servers contain primary material such as museum object catalogues, associated images, architectural drawings of historic buildings, maps or text corpora. They provide information about individual objects or sites. The Archive Server model is designed so that an existing museum collection documentation system, a photograph library catalogue or a data service could act as an Aquarelle Archive Server; each is expected to be able to return a record about each object or site held in its database.
They follow a conventional information retrieval model for database access. The mapping between the Aquarelle access points and the typically larger number of actual fields in the host database is carried out at the Archive Server. This allows the content owners' specialist knowledge of their own data to be applied to give the most appropriate mapping. It also simplifies the task of maintaining mappings. Aquarelle Archive Servers respond to queries from the Access Server and return the results of the searches. The dialogue between the two is mediated by the Z39.50 protocol.

Folders are secondary or derived material describing, commenting on or linking primary material, for example multimedia essays on cultural topics. Thus Folders are containers for semantically linked archive data, and Folders can themselves be linked. They are SGML hypertext documents typically providing information relating to groups of objects (Bounne et al., 1997). Aquarelle provides a unified interface for finding and browsing folders in conjunction with object information. The folder DTD includes a simplified set of elements providing metadata describing the contents of the folder. This metadata can be searched directly in the same manner as object data held on Archive Servers. In many respects Folder Servers and Archive Servers are treated simply as data servers serving different types of content. Both communicate with the Access Server via Z39.50. Folders themselves are returned as SGML documents encapsulated in the GRS-1 record syntax. Hyperlinks between folders and records held on Archive Servers are mediated by the Access Server. The Aquarelle Z39.50 profile has elements for establishing and following such links.

The project was successfully demonstrated at the end of 1998. The final architecture allowed users to access Archive and Folder Servers in the four countries using standard web browsers via the central Access Server.
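The arrangement described above, in which each Archive Server rather than the Access Server maps the common access points onto its own local fields, can be sketched in a few lines of Python. This is an illustration only: the access point names (who, what, where, when), the local field names and the in-memory record store are hypothetical and do not reproduce the Aquarelle profile or any partner's schema.

```python
from dataclasses import dataclass, field

# Hypothetical mapping owned by the content owner: each common access
# point is mapped to one or more fields of the local database.
DEFAULT_MAPPING: dict[str, list[str]] = {
    "who": ["maker", "school", "previous_owner"],
    "what": ["object_name", "title", "material"],
    "where": ["place_made", "find_site"],
    "when": ["date_made"],
}


@dataclass
class ArchiveServer:
    """Toy Archive Server wrapping a legacy record store (illustrative only)."""

    records: dict[str, dict[str, str]]
    mapping: dict[str, list[str]] = field(
        default_factory=lambda: {k: list(v) for k, v in DEFAULT_MAPPING.items()}
    )

    def search(self, access_point: str, term: str) -> list[str]:
        """Return sorted ids of records whose mapped local fields contain `term`."""
        local_fields = self.mapping.get(access_point)
        if local_fields is None:
            raise ValueError(f"unsupported access point: {access_point}")
        needle = term.lower()
        return sorted(
            rec_id
            for rec_id, rec in self.records.items()
            if any(needle in rec.get(f, "").lower() for f in local_fields)
        )


if __name__ == "__main__":
    server = ArchiveServer(records={
        "obj1": {"maker": "Josiah Wedgwood", "object_name": "vase"},
        "obj2": {"previous_owner": "Wedgwood family", "object_name": "teapot"},
        "obj3": {"maker": "Minton", "object_name": "tile"},
    })
    print(server.search("who", "wedgwood"))  # ['obj1', 'obj2']
```

Because the mapping is plain data held at the server, the content owner can revise it, for example by adding another local field under "who", without any change to the Access Server.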
The communication between the Access Server and the data servers uses the Z39.50 protocol with a profile developed in collaboration with CIMI, the Consortium for the Interchange of Museum Information. This profile describes both the functional aspects of the protocol and the access points available for querying.

2. The Aquarelle Access Server

The Access Server lies at the heart of the Aquarelle system. It provides search, retrieval and presentation mechanisms that allow the information held in the varied databases accessible through the system to appear to the user as a coherent set of web pages. The Access Server receives queries from the user's web browser and broadcasts them in a suitable form to the selected data servers. It collates the responses and returns them to the user interface module. It also retrieves folders and manages the links between folders, and between folders and archive records.

The Access Server also provides a range of central services. It controls access to the Aquarelle system through user management functions, which include the storage and manipulation of user profiles. It supports the services provided by the user client, namely resource discovery, query handling, result management, folder publication and, through specific functions, one-to-one connections with data servers. It provides a uniform interface to Archive and Folder Servers based on the Z39.50 search and retrieval protocol. It also provides an interface to a thesaurus browser to help users select search terms. Finally, it guarantees the consistency of hyperlinks in folders through a link management subsystem. Thus the Access Server embeds a user-client server, a user session manager, a Z39.50 client and a link management subsystem. The user-client server comprises a Web server and a set of CGI programs which process user requests, invoke the corresponding functionality in the Access Server, and encode the returned data before sending it to the Web browser.
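The job of the link management subsystem, guaranteeing that hyperlinks in folders stay consistent, can be illustrated with a small link registry. This is a sketch under assumptions: the (server, record id) addressing of link targets and the class and method names are hypothetical, and the real subsystem worked through the Aquarelle Z39.50 profile rather than with in-memory sets.

```python
from collections import defaultdict
from typing import NamedTuple


class Target(NamedTuple):
    """A link target: a record on a named data server (hypothetical addressing)."""
    server: str
    record_id: str


class LinkManager:
    """Toy registry of folder-to-record links, kept in both directions."""

    def __init__(self) -> None:
        self._links: defaultdict[str, set[Target]] = defaultdict(set)      # folder -> targets
        self._backlinks: defaultdict[Target, set[str]] = defaultdict(set)  # target -> folders

    def add_link(self, folder: str, target: Target) -> None:
        self._links[folder].add(target)
        self._backlinks[target].add(folder)

    def folders_linking_to(self, target: Target) -> set[str]:
        """Folders affected if `target` is changed or withdrawn."""
        return set(self._backlinks.get(target, ()))

    def dangling(self, existing: set[Target]) -> dict[str, set[Target]]:
        """Map each folder to the link targets that no longer exist."""
        report: dict[str, set[Target]] = {}
        for folder, targets in self._links.items():
            missing = targets - existing
            if missing:
                report[folder] = missing
        return report


if __name__ == "__main__":
    lm = LinkManager()
    lm.add_link("folder:etruscan-bronzes", Target("archive-it", "123"))
    lm.add_link("folder:etruscan-bronzes", Target("archive-gr", "9"))
    print(lm.dangling({Target("archive-it", "123")}))
    # {'folder:etruscan-bronzes': {Target(server='archive-gr', record_id='9')}}
```

Keeping backlinks as well as forward links lets the registry answer both questions a consistency check needs: which links are broken, and which folders a withdrawn record would affect.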
The user interface is a set of static and dynamic HTML pages. The static pages are accessed directly through the Web server; the dynamic pages are generated on the fly by the CGI programs. To prepare a query the user is presented with a set of HTML forms. The form is submitted by HTTP and interpreted by CGI scripts in the user-client module, which convert the query to AQL (Aquarelle Query Language) and pass it to the User Session. The User Session then has the opportunity to modify the query, for instance by applying various terminology resources to translate or expand terms. The modified query, still expressed in AQL, is passed to the Z Client. The Z Client converts the query from AQL to Z39.50, using the Aquarelle profile, and broadcasts it to the currently selected data servers. Each data server translates the query into its own local protocol and responds to the Z Client with a GRS-1 record. The GRS-1 record can contain structured data or SGML documents. The Z Client collates the responses, encodes them as SGML if necessary, and returns them to the User Session.

In order to display folders the Web browser must be capable of displaying SGML documents. This can be done in the current generation of browsers with the aid of an appropriate "plug-in". The emerging generation of browsers is XML-compatible and so able to present conforming Aquarelle folders without such plug-ins.

Note that contemporary developments in this area, such as the Dublin Core project (see Bibliography), define metadata records which can act as a surrogate for the primary object or digital record. Several data services are based on holding surrogate records for the primary data centrally and searching those rather than the primary data, for example SCRAN, ADAM, ELISE and VAN EYCK (see Bibliography). The Aquarelle Access Server does not hold surrogate records at the object level.
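The chain of transformations just described (HTML form, AQL, terminology expansion in the User Session, broadcast by the Z Client, collation of responses) can be sketched as a pipeline of small functions. The AQL representation, the thesaurus format and the function names are hypothetical; the Z39.50 encoding and GRS-1 decoding are deliberately abstracted away, with each data server modelled as a Python callable that accepts the query and returns a list of records.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AqlClause:
    """One clause of a hypothetical AQL query: an access point and its terms."""
    access_point: str
    terms: tuple[str, ...]


def form_to_aql(form: dict[str, str]) -> list[AqlClause]:
    """User-client step: turn non-empty HTML form fields into AQL clauses."""
    return [AqlClause(name, (value.strip(),)) for name, value in form.items() if value.strip()]


def expand_terms(query: list[AqlClause], thesaurus: dict[str, list[str]]) -> list[AqlClause]:
    """User-session step: add thesaurus equivalents (e.g. translations) to each clause."""
    expanded = []
    for clause in query:
        terms = list(clause.terms)
        for term in clause.terms:
            terms += [alt for alt in thesaurus.get(term.lower(), []) if alt not in terms]
        expanded.append(AqlClause(clause.access_point, tuple(terms)))
    return expanded


# A data server is modelled as a function from an AQL query to a list of records;
# the real Z39.50 encoding and GRS-1 decoding are abstracted away.
DataServer = Callable[[list[AqlClause]], list[dict]]


def broadcast(query: list[AqlClause], servers: dict[str, DataServer]) -> list[dict]:
    """Z-client step: query all selected servers in parallel and collate results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(server, query) for name, server in servers.items()}
    # Leaving the `with` block waits for every submitted call to finish.
    collated = []
    for name in sorted(futures):
        try:
            records = futures[name].result()
        except Exception:  # one failing server should not sink the whole search
            continue
        collated.extend({**rec, "source": name} for rec in records)
    return collated


if __name__ == "__main__":
    query = expand_terms(form_to_aql({"what": "amphora", "where": ""}),
                         {"amphora": ["anfora", "αμφορέας"]})

    def uk_server(q: list[AqlClause]) -> list[dict]:
        return [{"id": "1", "title": "Attic amphora"}]

    def unreachable(q: list[AqlClause]) -> list[dict]:
        raise ConnectionError("timeout")

    print(query)
    print(broadcast(query, {"uk": uk_server, "gr": unreachable}))
    # [{'id': '1', 'title': 'Attic amphora', 'source': 'uk'}]
```

Tagging each collated record with its source server keeps the provenance that a results display needs, and skipping servers that fail means one unreachable data server degrades the result rather than aborting it.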
It can hold surrogate records describing complete archives for the purpose of providing finding aids and other directory services. However, the set of access points can be seen as defining a virtual record which can act as a surrogate for the primary data when querying. Archive Servers can be implemented either as a gateway to an existing database or as a dedicated server holding surrogate records. Hence data services based on holding surrogate records, such as those mentioned, fit easily into the Aquarelle architecture, where they act as Archive Servers.

3. The User Model

Aquarelle provides users with a common vocabulary, including a set of access points, for phrasing queries that can be used to search across database systems with different data architectures. The Aquarelle cultural partners use a large number of fields to store the data in their respective databases[1]. Individual databases may use over two hundred different fields to store information at the level of detail the host institution requires. Furthermore, different institutions, particularly those in different countries, take different approaches to structuring the data. Early research showed that there was little commonality at this level.

Searching these databases using the native fields as access points provides the highest possible search precision but can lead to frustratingly low recall if the user is not familiar with the dataset. Although Aquarelle is aimed at professionals, its users cannot be expected to have detailed knowledge of the target datasets. Accordingly, services like Aquarelle need to provide higher-level access points for specifying queries. This approach improves recall at the cost of reduced precision. The common set of access points is mapped to the target datasets to perform the actual query.

At an Aquarelle project workshop, representatives of certain user communities raised the desirability of multiple sets of access points, particularly arranged in a hierarchy.
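The precision/recall trade-off can be made concrete with a toy calculation. The collection, field names and relevance judgements below are invented purely for illustration: a user who searches only the native `maker` field misses a record catalogued under `manufacturer`, while a broad "Who" access point mapped to several fields finds it but also retrieves a record where the name appears only as a former owner.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Standard IR measures: precision = hits/retrieved, recall = hits/relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection: the user wants objects *made by* Wedgwood, but
# cataloguers recorded the maker in different fields.
RECORDS: dict[str, dict[str, str]] = {
    "r1": {"maker": "Wedgwood"},
    "r2": {"manufacturer": "Wedgwood"},           # relevant, different field
    "r3": {"previous_owner": "Wedgwood family"},  # not relevant: only owned it
    "r4": {"maker": "Spode"},
}
RELEVANT = {"r1", "r2"}


def search(fields: list[str], term: str) -> set[str]:
    """Ids of records where any of `fields` contains `term` (case-insensitive)."""
    return {
        rid for rid, rec in RECORDS.items()
        if any(term.lower() in rec.get(f, "").lower() for f in fields)
    }


if __name__ == "__main__":
    native = search(["maker"], "wedgwood")  # one native field
    who = search(["maker", "manufacturer", "previous_owner"], "wedgwood")  # broad "Who"
    print(precision_recall(native, RELEVANT))  # (1.0, 0.5)
    print(precision_recall(who, RELEVANT))     # (0.6666666666666666, 1.0)
```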
At the workshop the following generic levels of description were identified, characterised by the approximate number of access points:

o 1: "Just search", regardless of access points and data types. This is the classic free-text retrieval approach.
o Fewer than 10: e.g. Who, What, Where, When; an appropriate starting point for low-precision queries for public access, or where the researcher has little knowledge of the subject domain.
o 20-30: typical of general metadata schemes such as Dublin Core and CIMI (see Bibliography). Of use to researchers with a reasonable knowledge of the subject domain. Typical of library and archive management systems.
o Approximately 100: typical of core data standards such as CIDOC and SPECTRUM.
o More than 200: the actual number of fields in use in full-scale collection documentation systems. Useful to researchers with detailed knowledge of the dataset.

In order to provide a set of access points as the basis of the common model of searching, the fields used by the cultural partners were mapped to various independently defined reduced sets of access points, including those defined by CIMI (the Consortium for the Interchange of Museum Information) in Project CHIO (Cultural Heritage Information Online; see Bibliography), by the Museum Documentation Association in the SPECTRUM Museum Documentation Standard, and in the CIDOC Core Data Standard for Archaeological Sites and Monuments. Project CHIO was of particular relevance: it aimed to support a similar, though narrower, constituency to Aquarelle's. Approximately half of the fields used by the Aquarelle cultural partners could be mapped to 20 distinct access points from the Project CHIO set. Although this limited the precision with which queries could be made, it was felt appropriate given the state of knowledge of user requirements and other developments at the time.
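The mapping exercise and the hierarchy of description levels can be illustrated together: a partner's local fields are mapped to mid-level access points, some fields remain unmapped, and the mid-level points roll up to the coarse Who/What/Where/When level. All field and access point names below are hypothetical, and the resulting coverage figure reflects only this toy data, not the Aquarelle partners' actual mappings.

```python
from typing import Optional

# Hypothetical roll-up from mid-level (CHIO-like) access points to the
# coarse "fewer than 10" level of the workshop list.
HIERARCHY: dict[str, str] = {
    "creator": "who", "owner": "who",
    "object_type": "what", "material": "what",
    "site": "where", "place_of_creation": "where",
    "date_of_creation": "when",
}

# Hypothetical local fields of one partner; None marks a field with no
# counterpart in the common access point set.
LOCAL_TO_ACCESS_POINT: dict[str, Optional[str]] = {
    "artist": "creator", "workshop": "creator",
    "object_name": "object_type", "medium": "material",
    "findspot": "site", "date": "date_of_creation",
    "accession_no": None, "valuation": None,
    "loan_history": None, "storage_location": None,
}


def coverage(mapping: dict[str, Optional[str]]) -> float:
    """Fraction of local fields that map to some common access point."""
    if not mapping:
        return 0.0
    mapped = sum(1 for ap in mapping.values() if ap is not None)
    return mapped / len(mapping)


def roll_up(mapping: dict[str, Optional[str]],
            hierarchy: dict[str, str]) -> dict[str, list[str]]:
    """Group mapped local fields under the coarse level they ultimately reach."""
    groups: dict[str, list[str]] = {}
    for local_field, ap in mapping.items():
        if ap is not None:
            groups.setdefault(hierarchy[ap], []).append(local_field)
    return groups


if __name__ == "__main__":
    print(coverage(LOCAL_TO_ACCESS_POINT))  # 0.6
    print(roll_up(LOCAL_TO_ACCESS_POINT, HIERARCHY))
```

Querying at a coarse level then means searching every local field grouped under it, which is one way of trading precision for recall as the levels of description become fewer.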
The main areas of difference between the Project CHIO set and Aquarelle's needs arose because Project CHIO primarily addressed object information, while Aquarelle was also concerned with information relating to sites and buildings. Further work, revising the mapping and adding some access points to the Project CHIO set, resulted in a set of access points for the Aquarelle system. Through liaison with CIMI the access points were revised to meet the Aquarelle requirements.

4. Interoperability

Interoperability is an important aspect of the Aquarelle project. There is currently considerable activity in providing "finding aids", metadata and unified access methods for digital resources (Bouthors et al., 1997; Taylor & Stapleton, 1997; Weibel et al., 1995). Z39.50 has emerged as one of the important technologies. It seemed in the best interest of Aquarelle and other projects using Z39.50 that there be a high degree of compatibility between their profiles, at least in so far as they address the heritage sector.

In order to use Z39.50 as the communication protocol between the Access Server and the data servers, a profile must be defined which identifies the subset of the protocol that will be used. The profile specifies implementation-dependent aspects of the protocol and adds semantics to the encoding. It specifies various low-level characteristics that govern how communication between servers and clients using the protocol takes place, together with higher-level attributes that govern what information can be exchanged in this way. The high-level attributes cover:

o The access points that identify the units of information that can be queried.
o The query functionality.
o The content and the structure of the information returned.

The Aquarelle Z39.50 Profile defines a "brief" record, giving a selection of data elements for use in a summary display of results, and a "full" record, containing all the data elements the data server is prepared to provide.
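The difference between the "brief" and "full" records can be sketched as a server-side presentation function. Z39.50 conventionally requests these with the element set names "B" and "F"; the particular brief elements and withheld fields below are hypothetical, since the actual element lists are fixed by the profile.

```python
# Hypothetical element list; the real brief record is defined by the profile.
BRIEF_ELEMENTS = ("identifier", "title", "creator", "date", "thumbnail")
# Hypothetical internal fields this server is not prepared to release.
WITHHELD = frozenset({"insurance_value", "storage_location"})


def present(record: dict[str, str], element_set: str) -> dict[str, str]:
    """Return the brief ("B") or full ("F") form of a stored record."""
    public = {k: v for k, v in record.items() if k not in WITHHELD}
    if element_set == "F":
        return public  # every element the server is prepared to provide
    if element_set == "B":
        return {k: public[k] for k in BRIEF_ELEMENTS if k in public}
    raise ValueError(f"unsupported element set name: {element_set!r}")


if __name__ == "__main__":
    stored = {
        "identifier": "GR-0042", "title": "Red-figure krater",
        "date": "c. 450 BC", "material": "clay", "insurance_value": "confidential",
    }
    print(present(stored, "B"))
    # {'identifier': 'GR-0042', 'title': 'Red-figure krater', 'date': 'c. 450 BC'}
    print(present(stored, "F"))
```

Deriving both forms from the same filtered view means a field the server withholds can never leak through the brief record.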
It is essential that Aquarelle clients and servers implemented by different bodies are able to interoperate with one another, and the Aquarelle Z39.50 Profile primarily addresses this central requirement. For Aquarelle purposes alone, the profile could be restricted to those facilities needed to support the user interface. Unusually in a Z39.50 context, there is only one client for the Aquarelle Z39.50 Profile, namely the Access Server. The Aquarelle Profile defines the Z39.50 entities that the Access Server will generate and what it will accept in return. An Aquarelle data server need not handle any entities other than those handled by the Access Server.

However, in order to increase the acceptability of the Aquarelle system and, hence, the availability of data servers, it was thought desirable that the Aquarelle Access Server be able to connect to existing servers and that data servers configured for Aquarelle could serve other uses. This would be achieved by adopting a profile compatible with other profiles in the heritage sector. Earlier work had already identified the relevance of the CIMI Project CHIO (see above), and it was decided to work with CIMI to produce a joint profile that would serve the needs of both communities. Aquarelle uses the CIMI Z39.50 Profile (see Bibliography) subject to an "Implementers' Agreement" which defines certain aspects further. Interoperation with non-Aquarelle systems has two aspects:

1. Compatibility with non-Aquarelle servers. For an existing Z39.50 server to act as an Aquarelle server, the Aquarelle profile must be a subset of the server profile, or at least the Access Server must be tolerant of elements that the target server does not support.
2. Compatibility with non-Aquarelle clients. For an Aquarelle server to act as a server to other Z39.50 clients, it must make an acceptable response to all requests issued by those clients.
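The first compatibility condition, that the Aquarelle profile be a subset of the server's profile or that the Access Server tolerate what the server does not support, amounts to a simple set comparison. The sketch below reduces a profile to the access points and element set names it supports; that reduction, the names, and the policy of always requiring the client's element sets are simplifying assumptions, not the content of the CIMI profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical summary of the features a Z39.50 profile supports."""
    access_points: frozenset[str]
    element_sets: frozenset[str]


def missing_features(required: Profile, offered: Profile) -> dict[str, set[str]]:
    """Features the client needs but the server lacks; empty means subset."""
    gaps = {
        "access_points": set(required.access_points - offered.access_points),
        "element_sets": set(required.element_sets - offered.element_sets),
    }
    return {kind: missing for kind, missing in gaps.items() if missing}


def usable_access_points(required: Profile, offered: Profile, tolerant: bool) -> set[str]:
    """Access points the client can use against this server.

    A strict client rejects any server whose profile is not a superset of
    its own. A tolerant client drops unsupported access points for that
    server but (by assumption here) still needs its element sets, since
    it could not otherwise display the results.
    """
    gaps = missing_features(required, offered)
    if gaps and (not tolerant or "element_sets" in gaps):
        raise ValueError(f"incompatible server profile: {gaps}")
    return set(required.access_points & offered.access_points)


if __name__ == "__main__":
    aquarelle = Profile(frozenset({"who", "what", "where", "when"}), frozenset({"B", "F"}))
    legacy = Profile(frozenset({"who", "what", "subject"}), frozenset({"B", "F"}))
    print(missing_features(aquarelle, legacy))
    print(sorted(usable_access_points(aquarelle, legacy, tolerant=True)))  # ['what', 'who']
```

Running such a check when a new server is registered would flag gaps early, although only testing the systems against one another establishes full interoperability.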
Experience from various projects using Z39.50 to provide a common means of access to heterogeneous databases shows that full interoperability cannot be assumed until the systems have been tested against one another and in conjunction with other systems. The Aquarelle project's use of the CIMI Profile increases the likelihood of successful interoperation without undue effort. A further common experience, shared by the Aquarelle project, is that once the technical aspects of interoperation are resolved, a new set of issues arises concerning the interpretation of the access points. The detail of the mapping between the host dataset and the common data model implicit in the profile will inevitably need to be tuned once the results of searching across diverse databases are visible. This cannot be done by the technical partners, as it requires the development of a common understanding within the user and content-owning community. The Aquarelle approach places responsibility for these mappings with the Archive Servers, with the aim of empowering content owners to maintain the mappings in response to discussion within the user community.

5. Next Steps

The Aquarelle project has produced a functioning system which met the original objectives and demonstrated the technical viability of such systems. The cultural partners made a vital contribution by hosting data servers and by providing user requirements and feedback. The project successfully passed the EU review process and now looks to future exploitation. So what form will that take? We must note that, during the period of the project, the Web has grown very quickly, and cultural, educational and e-commerce applications are all much more common and better understood than they were when the Aquarelle project was conceived.
Thus, given the technical success of the demonstrator, the question to be addressed is whether the original vision is still relevant in a much changed computing environment. The answer is an unequivocal yes. The overall principle of a broker architecture is rapidly emerging as a generic model for the effective exploitation of resources on the Internet. Resource directories and specialist portals are now fairly common in education and e-commerce. The need within the cultural sector for Aquarelle-type systems is, we believe, more pressing now than it was five years ago. More and more museums are implementing collection information systems on different database platforms. For example, in the UK government policy requires museums to have well-found IT systems for documentation support as integral parts of museum quality assurance and registration processes. In other countries, particularly as Europe expands to the East, museums and galleries are, and will be, busy creating digital assets. The Aquarelle vision was designed precisely for this situation. By providing coherent interlinking and common access interfaces, it would catalyse the accessibility of cultural information via expanding network technologies and make information available to colleagues in other cultural institutions and in the education sector.

Note that the goal of the Aquarelle system was support for culture sector professionals, not the public or basic educational applications. Yet the hope was to facilitate access to primary object information so that it could be used in a wider context and be incorporated into descriptive content in folders, which might then become source material for further interpretation for the public and for general educational applications. The Aquarelle project demonstrated that the technology infrastructure for this approach could be made to work.
Its goal now must be to demonstrate that the "trickle-down" information transformation process from object descriptions to folders, which the technology facilitates, is of real benefit to culture sector professionals in their work of care and scholarship, and also provides a growing pool of new source material which authors, teachers, publishers and artists can use to create further interpreted, added-value content for public access and education. Proposals on how to test this information process model are under consideration for presentation to the new Framework 5 programme of the European Union. Though not yet complete, these will almost certainly comprise three main components: first, a further technical step to make a version of the Access Server which can be distributed free, possibly on an "open source" basis; second, building an information provider community committed to extensive experimental use of the system, to explore the use of archive and folder content in their professional activities; and third, involving a further community of authors, teachers, publishers and artists to evaluate the use of the archives and folders for creating material suitable for public access and educational use.

Acknowledgements

The authors gratefully acknowledge the material and assistance provided by the Aquarelle partnership, particularly the support of Alain Michard, the project co-ordinator. The views and opinions expressed are, however, our own and do not necessarily represent the collective view of the Aquarelle partnership.

Bibliography

ADAM, The Information Gateway for Art, Design, Architecture and Media. http://www.adam.ac.uk
Aquarelle. The project website is at http://aqua.inria.fr
Bounne, C., Christophides, V., Doerr, M., Fras, E., Fundulaki, I., Kementsietsides, A. and Velegrakis, Y. (1997). Aquarelle Folder Server and Editor: technical description. Aquarelle Deliverable 6.4, available from the Aquarelle document repository.
Bouthors, V., Dupuis, J.-Y. and Tran Huu, N. (1997). Z39.50 Gateway for Mistral: technical description. Aquarelle Deliverable 5.2, available from the Aquarelle document repository.
CIMI Profile Development Working Group. (1996). The CIMI Profile: Z39.50 Application Profile Specifications for Use in Project CHIO. Available at http://lcweb.loc.gov/z3950/agency/profiles/ and ftp://ftp.cimi.org/pub/cimi/CIMI_Profile/
Dawson, D. (1996). Data Structures in Cultural Databases: A Survey. Aquarelle Deliverable 5.5, available from the Aquarelle document repository.
Dublin Core Metadata Element Set: Reference Description, revision of January 15, 1997. Available at http://purl.org/metadata/dublin_core_elements
ELISE, Electronic Library Image Server for Europe. http://severn.dmu.ac.uk/elise/ EC DGXIII, Project 1008.
SCRAN (Scottish Cultural Resources Access Network). http://www.scran.ac.uk
Taylor, M. and Stapleton, M. (1997). Z39.50 version of Index+: technical description. Aquarelle Deliverable 5.3, available from the Aquarelle document repository.
VAN EYCK. http://www.hart.bbk.ac.uk/van_eyck.html EC DGXIII, Project 1054.
Weibel, S., Godby, J., Miller, E. and Daniel, R. (1995). OCLC/NCSA Metadata Workshop Report. Available from http://www.oclc.org:5046/oclc/research/conferences/metadata/dublin_core_report.html
Publication: G. L. Mallen and M. J. Stapleton, System Simulation Ltd., UK: "Aquarelle: Networked Cultural Information" in Cultural Heritage Informatics 1999: selected papers from ichim99, edited by David Bearman and Jennifer Trant.