1. CWA Part 2 - Federated Terminological Resources
Contents
- CWA Part 2 - Federated Terminological Resources
- Semantic Interoperability Management: Principles
- Terminological Resource Network: Realization and Integration
- Introduction
- Internal Representation, Interconnection & Integration
- Public Services
- Syndicated Content
- Practical FAQ
- What is the easiest way of finding synonyms using the TRN, independently of the language?
- What is the difference between Superclass and Subclass relationships?
- Will getSimilarTerms also contain all elements of getSynonymTerms?
- How may I access the TRN and its API methods?
- May I issue a direct query to the underlying ontology database?
- What is the "id" of a term?
- Demonstrator Implementation
- Conclusions
- Annex
1.1. Semantic Interoperability Management: Principles
1.1.1. Introduction
To realize universal access to eGovernment resources, interoperability between information systems is a key pillar that needs to be achieved. Different authorities usually use different terms to describe resources, different interfaces to publish them and different semantics to understand and interpret the data that is exchanged. Bridging this gap is a complex task that eGov-Share aims to address. This section focuses on interoperability in the area of terminology. Imagine two eGovernment services related to traffic, one located in the Netherlands and one in Germany. Both might be classified by a written category name. While the Dutch service would be located in the area of the “Ministerie van verkeer en waterstaat”, the German one would be located in “Bundesministerium für Verkehr, Bau und Stadtentwicklung”. A person searching for a specific service may therefore fail to find it, because the same service and the same concepts might be described using different terms. The well-known semiotic triangle visualizes this relationship.
Besides different terms being used for the same service, the example above also shows that terms may not be 100% identical but only similar or overlapping in meaning. For example, the two categories “Ministerie van verkeer en waterstaat” and “Bundesministerium für Verkehr, Bau und Stadtentwicklung” might both contain a service provided by the respective government to monitor the current traffic situation on popular car routes. Those two services might be called “verkeersinformatie” and “verkehrsinformation”. These two terms can therefore be considered to be in an “is equivalent” relationship, since they express the same concept. However, the corresponding category terms cannot be considered identical. While “Ministerie van verkeer en waterstaat” can roughly be translated as “Ministry for Traffic and Water Affairs”, the other one translates as “Federal Ministry for Traffic, Construction and Urban Development” and therefore covers different concerns, as displayed in the following figure.
1.1.2. Principles
To handle the challenges described in the previous subsection, a Terminological Resource Network (TRN) can be used that allows relationships between terms to be specified, which increases semantic interoperability. For example, the terms mentioned in the previous subsection could be in an “is equivalent” or in an “is similar” relationship. The overall goal is to define a data model that can serve as a basis for the following scenario.
Precondition: It is assumed that several eGovernment resources have been indexed and are listed in a joint registry as specified in Part 1 of this CWA. It is also assumed that they are described with different terms that are listed within a term registry, referred to as the ‘Terminological Resource Network’. For details, readers may refer to CWA 15526, published by the CEN Workshop “European Network for Administrative Nomenclature” in May 2006.
- The user uses an international eGovernment Resource Registry as defined by Part 1 of this CWA to find a specific eGovernment resource, and types in the search term “traffic”.
- The system automatically contacts the TRN to ask for additional terms that are identical or similar to the term “traffic”. The TRN answers with a list of synonyms compiled by analysing all terms and their relationships: traffic [EN]; verkehr [DE]; verkeer [NL].
- The eGovernment Resource Registry may use this information not only to return eGovernment resources that are described with the term “traffic” but also to suggest resources that have been classified by the terms “verkehr” or “verkeer”. Depending on the user's languages, only those resources that are described in at least one of these languages will be highlighted. The user may extend the search to include more languages at any time.
- The result is that the user will receive a list of all suitable eGovernment Resources independent of languages or terms that have been used to describe this resource.
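The scenario above can be illustrated with a small query-expansion routine on the registry side. The following Java sketch is not part of the specification: the synonymLookup function is a hypothetical stand-in for a TRN call such as getSynonymTerms, the Resource and LangTerm types are invented for this example, and, for simplicity, resources outside the user's languages are filtered out rather than merely not highlighted.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical registry-side query expansion using TRN synonyms (illustration only). */
final class QueryExpansion {

    /** A term with its language code, e.g. ("verkehr", "DE"). */
    record LangTerm(String text, String lang) {}

    /** An indexed eGovernment resource classified by one term (hypothetical model). */
    record Resource(String title, LangTerm classification) {}

    /**
     * Returns resources classified by the query term or one of its synonyms,
     * restricted to resources described in one of the user's languages.
     */
    static List<Resource> search(LangTerm query,
                                 Function<LangTerm, List<LangTerm>> synonymLookup,
                                 List<Resource> registry,
                                 Set<String> userLanguages) {
        // Step 1: expand the query with the synonyms returned by the TRN.
        Set<LangTerm> expanded = new LinkedHashSet<>();
        expanded.add(query);
        expanded.addAll(synonymLookup.apply(query));

        // Step 2: keep resources whose term is in the expanded set and in a user language.
        List<Resource> hits = new ArrayList<>();
        for (Resource r : registry) {
            LangTerm t = r.classification();
            if (expanded.contains(t) && userLanguages.contains(t.lang())) {
                hits.add(r);
            }
        }
        return hits;
    }
}
```

A registry could pass a lambda that calls the TRN (via REST or SOAP) as synonymLookup; keeping the TRN call behind a function keeps the matching logic independent of the access method.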
In order to make this scenario possible, it is required
- to define how terms may be specified and related to each other in a flexible way and
- to define and maintain a significant number of terms in the TRN
While the first part will be defined in this section, the second part will be described in the following section called “Terminological Resource Network: Realization and Integration”.
1.1.3. Data Model and Specification
In order to manage terms and their semantic relationships, a semantic language will be used for description. The Web Ontology Language OWL is a suitable and powerful approach for describing semantics. As described by the W3C, “OWL is intended to be used when the information contained in documents needs to be processed by applications, as opposed to situations where the content only needs to be presented to humans. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. This representation of terms and their interrelationships is called an ontology. OWL has more facilities for expressing meaning and semantics than XML, RDF, and RDF-S, and thus OWL goes beyond these languages in its ability to represent machine interpretable content on the Web.” (see http://www.w3.org/TR/owl-features).
OWL allows the specification of classes (called concepts) and their relationships, as well as instances of classes (called individuals). The collection of all classes defined in an ontology is often referred to as the T-Box, while the collection of instances is called the A-Box. To be more specific, T-Box statements describe a system in terms of controlled vocabularies such as classes and properties, while A-Box statements are based on the T-Box specifications and contain statements about that vocabulary. The following paragraphs define the model that is necessary to describe terms and to support the requests described in the earlier sections; that is, they specify the T-Box elements. The model fulfils the following requirements:
- The specification is kept minimal in size and simple in structure so that it is easy for implementers to understand.
- Because OWL is used for serialization, the specification can be extended later if necessary.
- The specification needs to be able to express
  - Terms (text)
  - The natural language of a term
  - A relationship to other terms
The following specification shows the OWL elements used for this. The detailed OWL is listed in the annex:
It should be emphasized that users will normally not have to deal with this data model, since it is mainly used internally. Instead, users will work with the result: they will ask the Terminological Resource Network for a specific term and for its relationships to other terms. Those terms are stored in the format specified by the model above. Based on this model, a set of terms is specified in the section “Terminological Resource Network: Realization and Integration”, which also gives more hands-on examples.
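For implementers who prefer to handle the model in code, the T-Box elements can be mirrored by plain Java types. This is a minimal sketch under assumptions: the type and field names follow the OWL instance examples in the next section (Term, name, description, language, Language, primaryCode, subcode, destinationTerm), while the relationship kinds SIMILAR and OPPOSITE are inferred from the API method names. The normative model remains the OWL listed in the annex.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical Java mirror of the TRN T-Box (illustration only; the annex OWL is normative). */
final class TrnModel {

    /** Language individual, e.g. primaryCode "EN", subcode "US". */
    record Language(String id, String primaryCode, String subcode) {
        /** Combined code such as "EN-US", as used in the API examples. */
        String code() {
            return subcode == null ? primaryCode : primaryCode + "-" + subcode;
        }
    }

    /** Relationship kinds; SIMILAR and OPPOSITE are inferred from the API method names. */
    enum RelationshipKind { SYNONYM, SIMILAR, OPPOSITE, INHERITANCE }

    /** A relationship individual pointing to its destination term. */
    record Relationship(String id, RelationshipKind kind, String destinationTermId) {}

    /** A term: its text, optional description, language and outgoing relationships. */
    record Term(String id, String name, String description,
                Language language, List<Relationship> relationships) {
        Term {
            Objects.requireNonNull(id, "every term needs a unique id");
            Objects.requireNonNull(name, "every term needs a name");
            relationships = List.copyOf(relationships); // immutable defensive copy
        }
    }
}
```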
1.2. Terminological Resource Network: Realization and Integration
1.2.1. Introduction
This section gives a more hands-on description of instances (A-Box) of terminological resources and their relationships, using real-world examples. It also describes how existing terminological data sources may be included, especially ebXML Registry & Repository (RR) systems as defined in the ADNOM CWA. To create a Terminological Resource Network, server nodes need a way to
- Integrate existing or new terms into the TRN
- Define relationships between terms
- Offer an easy way of retrieving a list of all terms stored in the TRN
- Provide a query interface allowing users to retrieve terms that are related to another term
The first two points will be discussed in the following section, while the last two points will be handled in the next section “Public Services”.
1.2.2. Internal Representation, Interconnection & Integration
1.2.2.1. Representation & Term Interconnection
New terms are integrated into the TRN by creating instances of the data model defined in the section “Semantic Interoperability Management: Principles”. The following code shows an example for a term called "City Council".
<Term rdf:ID="Term17">
<name>City Council</name>
<description xml:lang="en">
This term represents a council of a city
</description>
<language rdf:resource="#enus"/>
</Term>
<Language rdf:ID="enus">
<primaryCode>EN</primaryCode>
<subcode>US</subcode>
</Language>
Terms may be linked to other terms by specifying relationships:
<Term rdf:ID="Term17">
<name>City Council</name>
<description xml:lang="en">This term represents a council of a city</description>
<language rdf:resource="#enus"/>
</Term>
<Term rdf:ID="Term_18">
<name>Municipality</name>
<language rdf:resource="#enus"/>
<relationship rdf:resource="#Synonym_2"/>
</Term>
<Term rdf:ID="Term_19">
<name>Stadtverwaltung</name>
<description xml:lang="de">Dieser Term repräsentiert die Stadtverwaltung</description>
<language rdf:resource="#dede"/>
<relationship rdf:resource="#Synonym_2"/>
</Term>
<Synonym rdf:ID="Synonym_2">
<destinationTerm rdf:resource="#Term17"/>
</Synonym>
<Language rdf:ID="enus">
<primaryCode>EN</primaryCode>
<subcode>US</subcode>
</Language>
<Language rdf:ID="dede">
<primaryCode>DE</primaryCode>
<subcode>DE</subcode>
</Language>
This example specifies three terms, their languages and their relationships.
- The first term is called "City Council" and is in English (EN-US).
- The second term is called "Municipality" and is also in English (EN-US). In addition, it has a Synonym relationship to “City Council”.
- The third term is called "Stadtverwaltung" and is in German. In addition, it has a Synonym relationship to “City Council”.
The following figure visualizes the relationship between these terms:
Figure 4: Example Instances
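The relationships in the example can be evaluated programmatically: each term that points to a Synonym individual is a synonym of that individual's destinationTerm. Assuming synonymy is symmetric and transitive (an assumption of this sketch, not stated by the specification), synonym groups can be computed with a small union-find structure over term ids. The class below is hypothetical and works on ids only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Groups term ids into synonym sets, assuming synonymy is symmetric and transitive. */
final class SynonymGroups {
    private final Map<String, String> parent = new HashMap<>();

    /** Finds the representative of a term's group, halving paths along the way. */
    private String find(String id) {
        parent.putIfAbsent(id, id);
        String cur = id;
        while (!parent.get(cur).equals(cur)) {
            parent.put(cur, parent.get(parent.get(cur))); // path halving
            cur = parent.get(cur);
        }
        return cur;
    }

    /** Records "source is a synonym of destination", e.g. Term_18 -> Synonym_2 -> Term17. */
    void addSynonym(String sourceTermId, String destinationTermId) {
        parent.put(find(sourceTermId), find(destinationTermId));
    }

    /** Returns all term ids in the same synonym group as the given term, including itself. */
    Set<String> groupOf(String id) {
        String root = find(id);
        Set<String> group = new TreeSet<>();
        for (String other : Set.copyOf(parent.keySet())) {
            if (find(other).equals(root)) {
                group.add(other);
            }
        }
        return group;
    }
}
```

Adding the two edges of the example (Term_18 and Term_19, both via Synonym_2 to Term17) places all three terms in one group, so "Municipality" and "Stadtverwaltung" become synonyms of each other through "City Council".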
1.2.2.2. Integration of ADNOM Results and 3rd Party Terminological Resources
Third-party results could, of course, be integrated into the TRN simply by importing them with an import tool. However, this would result in duplicate data management and might not always be desired. The TRN therefore provides an alternative way of adding terms from third-party resources. The only requirement is that the term can be identified by a Uniform Resource Identifier (URI). For example, the ADNOM CWA specified the use of a federated ebXML Registry & Repository service for specifying and connecting terms. Those terms can be referenced by a URI that contains the location of the ADNOM server as well as the ID of the term stored in the ADNOM server. As an example, assume that a term has been specified in an ADNOM server and is referenced with the URI http://www.cen.eu/adnom/23221/term42. We can now specify that a term in the TRN has a relationship with this ADNOM term, for example that it inherits from it:
<RemoteTerm rdf:ID="RemoteTerm_5">
<name>Region</name>
<url>http://www.cen.eu/adnom/23221/term42</url>
</RemoteTerm>
<Term rdf:ID="Term_6">
<name>State</name>
<relationship rdf:resource="#Inheritance_7"/>
</Term>
<Inheritance rdf:ID="Inheritance_7">
<destinationTerm rdf:resource="#RemoteTerm_5"/>
</Inheritance>
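Because local and remote terms are both identified by URIs, a client can treat them uniformly once local references such as #Term_6 are resolved against the TRN ontology's base URI. The sketch below assumes the base URI http://www.cen.eu/egovshare/trn.owl, taken from the SPARQL result example later in this section; java.net.URI.resolve returns absolute references, such as the url of RemoteTerm_5, unchanged.

```java
import java.net.URI;

/** Resolves term references so local and remote (e.g. ADNOM) terms share one URI space. */
final class TermReferences {
    // Assumed base URI of the TRN ontology (as in the SPARQL result example).
    static final URI TRN_BASE = URI.create("http://www.cen.eu/egovshare/trn.owl");

    /** Resolves "#Term_6" against the TRN base; absolute URIs are returned unchanged. */
    static URI resolve(String reference) {
        return TRN_BASE.resolve(reference);
    }

    /** A reference is remote if it does not point to an individual of the local TRN ontology. */
    static boolean isRemote(String reference) {
        return !resolve(reference).toString().startsWith(TRN_BASE + "#");
    }
}
```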
1.2.3. Public Services
The main functionality provided by the TRN is the ability to query it for terms that are related to, or synonyms of, a specific term. The WebService interface (see the WSDL in the Annex) therefore provides the following functionalities:
- getAllTerms – This will return a list of all terms contained in the TRN
- getAllTermsByLanguage – This will return a list of all terms for a specific language
- getSynonymTerms – This will return a list of synonyms for a specific term
- getSimilarTerms – This will return a list of similar terms for a specific term
- getOpositeTerms – This will return a list of opposite terms for a specific term
- getSuperclassTerms – This will return a list of superclass terms for a specific term
- getSubclassTerms – This will return a list of subclass terms for a specific term
To provide the highest possible flexibility, the TRN provides
- different ways of accessing functionalities
- different formats for receiving query results
1.2.3.1. Ways of accessing the TRN
To provide maximum compatibility, the TRN offers two different ways of accessing it. The first is to invoke a simple URL via HTTP and receive the output as a direct response. This methodology is often referred to as a RESTful approach (Representational State Transfer), as described in Part 1 of the CWA. The second is to access the TRN via SOAP-based WebServices as defined by the W3C, including WSDL files that define the method syntax.
1.2.3.1.1. REST
The RESTful interface of the TRN is invoked as follows:
Server: http://www.cen.eu/egovshare/trn/REST
GET {OP}
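The placeholder {OP} is not detailed further here. The Atom example in “Syndicated Content” uses a request of the form query?operation=getSimilarTerms&lang=...&term=City%20Council, so the following sketch assumes that this pattern (path query, parameters operation, lang and term) applies to all term operations; this is an assumption, not a normative URL scheme. Its main point is that term values must be percent-encoded.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds TRN REST query URIs, assuming the parameter names seen in the Atom example. */
final class TrnRestUrls {
    static final String SERVER = "http://www.cen.eu/egovshare/trn/REST";

    /** Percent-encodes a query value; URLEncoder emits '+' for spaces, so map it to %20. */
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** e.g. .../query?operation=getSimilarTerms&lang=EN-US&term=City%20Council */
    static URI termQuery(String operation, String lang, String term) {
        return URI.create(SERVER + "/query?operation=" + encode(operation)
                + "&lang=" + encode(lang) + "&term=" + encode(term));
    }
}
```

A literal "+" in a term is encoded as %2B before the replacement, so only spaces end up as %20.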
1.2.3.1.2. WebServices
Over Webservices, the following methods are exposed:
String[] getAllTerms(String format);
String[] getAllTermsByLanguage(String code, String format);
String[] getSynonymTerms(String term, String lang, boolean returnAllLanguages, String format);
String[] getSimilarTerms(String term, String lang, boolean returnAllLanguages, String format);
String[] getOpositeTerms(String term, String lang, boolean returnAllLanguages, String format);
String[] getSuperclassTerms(String term, String lang, boolean returnAllLanguages, String format);
String[] getSubclassTerms(String term, String lang, boolean returnAllLanguages, String format);
These methods correspond to the functionalities described at the beginning of this section. String parameters express the term and the language of the term. A boolean parameter (returnAllLanguages) indicates whether the TRN should return terms in all languages or only those terms that are in the same language as the term provided as a parameter. The format parameter selects the output format (see “Format selection” below). The following call shows an example SOAP request:
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:typ="http://www.cen.eu/egovshare/trn/ws/types">
<soapenv:Header/>
<soapenv:Body>
<typ:getAllTermsByLanguage>
<typ:code>EN-US</typ:code>
</typ:getAllTermsByLanguage>
</soapenv:Body>
</soapenv:Envelope>
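To make the effect of the returnAllLanguages flag concrete, here is a hypothetical in-memory stand-in for getSynonymTerms. The synonym table is hard-coded from the “City Council” example, the format parameter is accepted but ignored (plain term strings are returned), and languages are compared by their exact code; a real TRN would evaluate the stored OWL instances instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the TRN getSynonymTerms operation (illustration only). */
final class InMemoryTrn {
    /** A stored term with its language code, e.g. ("City Council", "EN-US"). */
    record Entry(String term, String lang) {}

    // Synonym groups hard-coded from the "City Council" example in this section.
    private static final List<List<Entry>> SYNONYM_GROUPS = List.of(List.of(
            new Entry("City Council", "EN-US"),
            new Entry("Municipality", "EN-US"),
            new Entry("Stadtverwaltung", "DE-DE")));

    /** Mirrors getSynonymTerms(term, lang, returnAllLanguages, format); format is ignored here. */
    static String[] getSynonymTerms(String term, String lang, boolean returnAllLanguages, String format) {
        Entry query = new Entry(term, lang);
        List<String> result = new ArrayList<>();
        for (List<Entry> group : SYNONYM_GROUPS) {
            if (!group.contains(query)) {
                continue; // the queried term is not part of this synonym group
            }
            for (Entry e : group) {
                boolean sameLanguage = e.lang().equals(lang);
                if (!e.equals(query) && (returnAllLanguages || sameLanguage)) {
                    result.add(e.term());
                }
            }
        }
        return result.toArray(new String[0]);
    }
}
```

With returnAllLanguages set to false, a query for "City Council" in EN-US yields only "Municipality"; setting it to true also returns "Stadtverwaltung".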
1.2.3.2. Output formats supported by the TRN
As with the ways of accessing the TRN, the output is also provided in different formats, which allows the TRN to be used flexibly for many different purposes. More precisely, the TRN provides the output formats OWL, Simple XML, SKOS and XTM.
1.2.3.2.1. OWL
This returns the terms as the result of a SPARQL query on the RDF graph defined by the terms and specifications in OWL. The result complies with the current W3C recommendation for the SPARQL Query Results XML Format, as described at http://www.w3.org/TR/rdf-sparql-XMLres. An example of this format looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
<head>
<variable name='subject'/>
<variable name='name'/>
<variable name='language'/>
<variable name='languagecode'/>
</head>
<results>
<result>
<binding name='language'>
<uri>http://www.cen.eu/egovshare/trn.owl#enus</uri>
</binding>
<binding name='languagecode'>
<literal datatype='http://www.w3.org/2001/XMLSchema#string'>US</literal>
</binding>
<binding name='subject'>
<uri>http://www.cen.eu/egovshare/trn.owl#Term17</uri>
</binding>
<binding name='name'>
<literal datatype='http://www.w3.org/2001/XMLSchema#string'>City Council</literal>
</binding>
</result>
. . .
</results>
</sparql>
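A client receiving the OWL output can read it with any SPARQL results parser; with only the JDK, a namespace-aware DOM parser is enough. The following sketch (a hypothetical helper, not part of the TRN API) turns each result element into a map from variable name to the text of its uri or literal value, ignoring datatypes and language tags.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Reads SPARQL Query Results XML into one variable->value map per result row. */
final class SparqlXmlResults {
    static final String NS = "http://www.w3.org/2005/sparql-results#";

    static List<Map<String, String>> parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // result elements live in the SPARQL results namespace
            // Reject DOCTYPEs, a common guard against XXE when parsing remote responses.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            List<Map<String, String>> rows = new ArrayList<>();
            NodeList results = doc.getElementsByTagNameNS(NS, "result");
            for (int i = 0; i < results.getLength(); i++) {
                Element result = (Element) results.item(i);
                Map<String, String> row = new LinkedHashMap<>();
                NodeList bindings = result.getElementsByTagNameNS(NS, "binding");
                for (int j = 0; j < bindings.getLength(); j++) {
                    Element binding = (Element) bindings.item(j);
                    // The value is the text of the <uri>, <literal> or <bnode> child.
                    row.put(binding.getAttribute("name"), binding.getTextContent().trim());
                }
                rows.add(row);
            }
            return rows;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a valid SPARQL XML result document", e);
        }
    }
}
```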
1.2.3.2.2. Simple XML
This will provide all terms in a simple XML format that is specified by the following XSD:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="Terms">
<xs:annotation>
<xs:documentation>A collection of terms</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
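<xs:element name="term" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute name="lang" type="xs:language" use="optional"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>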
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
For example, a result in the Simple XML format looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<Terms xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="terms.xsd">
<term lang="en">City Council</term>
<term lang="de">Stadtverwaltung</term>
…
</Terms>
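On the server side, producing the Simple XML format mainly requires escaping XML special characters correctly. The following hypothetical helper writes terms with the lang attribute shown in the example above; the LangTerm type is invented for this sketch.

```java
import java.util.List;

/** Writes terms in the Simple XML format (hypothetical helper, illustration only). */
final class SimpleXmlWriter {
    /** A term text with its language, e.g. ("de", "Stadtverwaltung"). */
    record LangTerm(String lang, String text) {}

    /** Escapes the five XML special characters; '&' must be replaced first. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    static String write(List<LangTerm> terms) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<Terms xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n");
        sb.append("       xsi:noNamespaceSchemaLocation=\"terms.xsd\">\n");
        for (LangTerm t : terms) {
            sb.append("  <term lang=\"").append(escape(t.lang())).append("\">")
              .append(escape(t.text())).append("</term>\n");
        }
        return sb.append("</Terms>\n").toString();
    }
}
```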
1.2.3.2.3. SKOS
eGov-Share may also output the results in the W3C Simple Knowledge Organization System (SKOS). The results are returned in a SKOS-compliant format using the Turtle serialization.
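@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
# the ex1 namespace below is assumed for illustration
@prefix ex1: <http://www.cen.eu/egovshare/trn.owl#> .
ex1:CityCouncil rdf:type skos:Concept ;
    skos:prefLabel "City Council"@en ;
    skos:inScheme ex1:referenceLocationScheme .
ex1:Stadtverwaltung rdf:type skos:Concept ;
    skos:prefLabel "Stadtverwaltung"@de ;
    skos:inScheme ex1:referenceLocationScheme .
ex1:CityCouncil skos:exactMatch ex1:Stadtverwaltung .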
Please note: SKOS is currently a working draft and may change during the course of the workshop.
1.2.3.2.4. XTM
This format returns all results in the XML Topic Maps (XTM) standard. An example of the result looks like this:
<topic id="term_17">
<instanceOf><topicRef xlink:href="#term"/></instanceOf>
<baseName>
<baseNameString>City Council</baseNameString>
</baseName>
</topic>
. . .
1.2.3.2.5. Format selection
Selecting which format should be returned by the TRN query is performed by specifying the format with one of the following values:
- application/xml for Simple XML format
- application/skos for SKOS format
- application/xtm+xml for the XTM format
- application/owl+xml for the OWL format
- application/atom+xml for the ATOM format (see next section)
For WebService calls, the format is passed as the format parameter of the method call. For RESTful calls, it is passed with the HTTP request in the Accept header.
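The format values above map naturally onto an enumeration. The following sketch, assuming Java 11's java.net.http client, builds (but does not send) a RESTful request that selects the format through the Accept header. The query URI is supplied by the caller, and for SOAP calls the same mediaType string would be passed as the format argument.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Output formats of the TRN and the values used to select them (illustration only). */
enum TrnFormat {
    SIMPLE_XML("application/xml"),
    SKOS("application/skos"),
    XTM("application/xtm+xml"),
    OWL("application/owl+xml"),
    ATOM("application/atom+xml");

    final String mediaType;

    TrnFormat(String mediaType) {
        this.mediaType = mediaType;
    }

    /** Builds (but does not send) a RESTful GET request asking for this format via Accept. */
    HttpRequest restRequest(URI queryUri) {
        return HttpRequest.newBuilder(queryUri)
                .header("Accept", mediaType)
                .GET()
                .build();
    }
}
```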
1.2.4. Syndicated Content
The TRN supports the syndication protocol of Part 1 of the CWA. To use it, the TRN is called with the value application/atom+xml as the format parameter. The returned feed looks like this:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Terms similar to City Council</title>
<link href="http://www.cen.eu/egovshare/trn/REST/query"/>
<updated>2008-10-11T12:00:00Z</updated>
<author>
<name>CEN Terminological Resource Network (TRN)</name>
</author>
<id>urn:uuid:11b46180-fde2-12dd-1d22-44557aab2f5</id>
<!-- topic map entry -->
<entry>
<title>City Council</title>
<link rel="direct" type="application/owl+xml" href="http://www.cen.eu/egovshare/trn/REST/query?operation=getSimilarTerms&amp;lang=EN-US&amp;term=City%20Council"/>
<id>urn:uuid:4432c332-2ab3-4ebb-1d22-44557aab2f5</id>
<updated>2008-10-10T11:55:52Z</updated>
<summary> This term represents a council of a city</summary>
</entry>
. . .
</feed>
1.2.5. Practical FAQ
1.2.5.1. What is the easiest way of finding synonyms using the TRN, independently of the language?
You may use the API method like this:
getSynonymTerms("Country", "EN-US", true, "application/xml");
If the third parameter is true, terms in all languages are returned. If it is false, the method only returns terms in the same language as the term you specify (“EN-US” in this example). The last parameter selects the output format.
1.2.5.2. What is the difference between Superclass and Subclass relationships?
Let’s take the following example (in UML notation):
If you call getSuperclassTerms for the term “State” then you will receive the term "Region", while getSubclassTerms will result in "Federal State".
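The behaviour described in this answer can be reproduced with a simple index of Inheritance relationships, where each edge records that one term inherits from another. The sketch below is hypothetical and assumes only direct relationships are returned; the FAQ does not state whether the TRN follows longer inheritance chains transitively.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lookup of superclass/subclass terms from Inheritance relationships. */
final class InheritanceIndex {
    /** One Inheritance relationship: subTerm inherits from superTerm. */
    record Inheritance(String subTerm, String superTerm) {}

    private final List<Inheritance> edges = new ArrayList<>();

    void add(String subTerm, String superTerm) {
        edges.add(new Inheritance(subTerm, superTerm));
    }

    /** Direct superclass terms: destinations of the term's Inheritance relationships. */
    List<String> getSuperclassTerms(String term) {
        List<String> result = new ArrayList<>();
        for (Inheritance e : edges) {
            if (e.subTerm().equals(term)) result.add(e.superTerm());
        }
        return result;
    }

    /** Direct subclass terms: terms whose Inheritance relationship points to this term. */
    List<String> getSubclassTerms(String term) {
        List<String> result = new ArrayList<>();
        for (Inheritance e : edges) {
            if (e.superTerm().equals(term)) result.add(e.subTerm());
        }
        return result;
    }
}
```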
1.2.5.3. Will getSimilarTerms also contain all elements of getSynonymTerms?
Yes.
1.2.5.4. How may I access the TRN and its API methods?
The TRN provides a SOAP-based WebService interface and a RESTful interface to access all methods. A WSDL specification may be found in the annex.
1.2.5.5. May I issue a direct query to the underlying ontology database?
This is not officially specified and will not be part of the Demonstrator implementation. However, adding a SPARQL query interface is recommended for future versions.
1.2.5.6. What is the "id" of a term?
The ‘id’ must be unique; you may generate it with a GUID/UUID generator when creating a term. Technically, it is identical to the URI of the OWL individual inside the TRN.
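A minimal sketch of generating such ids with the JDK's random (version 4) UUIDs follows. It assumes the TRN ontology base URI http://www.cen.eu/egovshare/trn.owl seen in the SPARQL result example, and prefixes the UUID with "Term_" because an rdf:ID must be a valid XML name, which cannot start with a digit.

```java
import java.net.URI;
import java.util.UUID;

/** Generates unique term ids and the matching OWL individual URIs (illustration only). */
final class TermIds {
    // Assumed TRN ontology base, as used in the SPARQL result example.
    static final String TRN_BASE = "http://www.cen.eu/egovshare/trn.owl";

    /** The "Term_" prefix keeps the id a valid rdf:ID, since a UUID may start with a digit. */
    static String newTermId() {
        return "Term_" + UUID.randomUUID();
    }

    /** The individual's URI is the ontology base followed by "#" and the id. */
    static URI individualUri(String termId) {
        return URI.create(TRN_BASE + "#" + termId);
    }
}
```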
1.3. Demonstrator Implementation
1.3.1. Introduction and Scope
This section describes the implementation of a demonstrator application for the Terminological Resource Network specification. It aims to demonstrate the overall functionality and to give implementers an example of how our specifications can be realized. The demonstrator aims at
- providing a proof-of-concept implementation of the specification
- providing an implementation that can be used to check the applicability of the eGov-Share TRN specification by analyzing a working example
- demonstrating the possibility of realizing the specifications in a coherent way
- demonstrating the possibility of combining the different technologies and formats described in this specification
The demonstrator does not aim at
- providing a bulletproof and secure implementation
- providing a rich implementation of all features. For example, the demonstrator will be limited to one of the access methods and to one output format
- implementing a baseline for future implementations
- delivering a bug free and production ready environment for implementers
1.3.2. Architecture
It should be emphasized that the following design and implementation is just one possible way to implement the specifications. As the specification itself is independent of any specific technology, implementers are free to realize their own implementation with a different architecture and technological base if necessary. The following figure shows the high-level architecture of the TRN demonstrator implementation:
1.3.3. Technology and Implementation Details
Since the time available for the demonstrator implementation is very limited, the implementation covers only one of the access methods, the RESTful approach. In addition, only one output format is realized, namely OWL. Both are sufficient to demonstrate the general functionality and the TRN concept in particular. The demonstrator is implemented in Java and is based on several open source components that have proven stable and powerful. The following tools and frameworks have been used for the implementation:
- Tomcat as the web server environment
- Axis2 for preparing the WebService functionality
- XMLBeans for XML parsing and serialization
- Spring as the web framework for realizing and coordinating the web application
- JSP for a simple user interface
- Sesame for semantic storage and query functionality
1.3.4. Demonstrator Conclusions
The demonstrator implementation may be used to show the specifications of the Terminological Resource Network in a real-world environment. It is, however, limited to the minimum functionality needed for this purpose. It is therefore recommended to create a broader prototype implementation that includes all specified features, as well as features that are currently out of the scope of this CWA, such as access rights management and the cleansing of invalid data. Part 4 of the CWA addresses this when defining roles and describing the roadmap for future functionality. Nevertheless, the current version of the demonstrator is already a valuable resource that offers a good understanding of how the TRN could be integrated into existing landscapes and how it can be realized with widely used technologies such as Java and Spring.
1.4. Conclusions
This part has shown a holistic concept of realizing a Terminological Resource Network. Elements that should be highlighted are:
- usage of widely accepted standards for expressing semantics (e.g. OWL)
- support of two popular access mechanisms (WebServices and REST)
- support of multiple output formats (XTM, SKOS, XML, etc.)
- possibilities for integrating existing resource repositories (e.g. ADNOM)
- extensibility for additional formats and interfaces, and extensibility of the data model (“Open World” assumption)
- preparation of more complex functionalities because of the semantic background.
Taken together, these points emphasize the flexibility of the TRN. This flexibility is a key success factor for wide acceptance of the specification by eGovernment institutions across Europe, which may use different system landscapes and be subject to different regulations.
1.5. Annex
1.5.1. Annex 1: OWL code for TRN
See Annexes
1.5.2. Annex 2: WSDL specification
See Annexes
