1. CWA Part 2 - Federated Terminological Resources

1.1. Semantic Interoperability Management: Principles

1.1.1. Introduction

Interoperability between information systems is a key pillar of universal access to eGovernment resources. Different authorities usually use different terms to describe resources, different interfaces to publish them and different semantics to interpret the data that is exchanged. Bridging this gap is a complex task that eGov-Share aims to address. This section focuses on interoperability in the area of terminology. Consider two eGovernment services related to traffic, one located in the Netherlands and one in Germany. Both might be classified by a written category name. While the Dutch service would be filed under “Ministerie van verkeer en waterstaat”, the German one would be filed under “Bundesministerium für Verkehr, Bau und Stadtentwicklung”. A person searching for a specific service may therefore fail to find it, because the same concept is described with different terms. The well-known semiotic triangle visualizes this relationship.

Besides different terms for the same service, the example above also shows that terms may not be fully identical but only similar or overlapping in meaning. For example, the two categories “Ministerie van verkeer en waterstaat” and “Bundesministerium für Verkehr, Bau und Stadtentwicklung” might each contain a service that the corresponding government provides to monitor the current traffic situation on popular car routes. Those two services might be called “verkeersinformatie” and “verkehrsinformation”. These two terms can be considered to be in an “is equivalent” relationship since they express the same concept. The corresponding category terms, however, cannot be considered identical. While “Ministerie van verkeer en waterstaat” can roughly be translated as “Ministry for Traffic and Water Affairs”, the other one can be translated as “Federal Ministry for Traffic, Construction and City Development” and therefore covers different concerns, as displayed in the following figure.

1.1.2. Principles

In order to handle the challenges described in the last subsection, a Terminological Resource Network (TRN) can be used that allows relationships between terms to be specified, which increases semantic interoperability. For example, the terms mentioned in the last subsection could be in an “is equivalent” or an “is similar” relationship. The overall goal is to define a data model that can be used as a base to achieve the following scenario.

Precondition: It is assumed that several eGovernment Resources have been indexed and are listed in a joint registry as specified in Part 1 of this CWA. It is also assumed that they are described with different terms that are listed within a term registry, referred to as the ‘Terminological Resource Network’. For details, readers may refer to CWA 15526, published by the CEN Workshop “European Network for Administrative Nomenclature” in May 2006.

  1. The user is using an international eGovernment Resource Registry as defined by Part 1 of this CWA in order to find a specific eGovernment Resource. The user therefore types in a search term, for example “traffic”.
  2. The system automatically contacts the TRN to ask for additional terms that are identical or similar to the term “traffic”. The TRN answers with a list of synonyms that has been compiled by analysing all terms and their relationships: traffic [EN]; verkehr [DE]; verkeer [NL].
  3. The eGovernment Resource Registry system may use this information to return not only eGovernment resources that are described with the term “traffic” but also resources that have been classified by the terms “verkehr” or “verkeer”. Depending on the languages of the user, only those resources that are described in at least one of the user's languages will be highlighted. The user may extend the search to include more languages at any time.
  4. As a result, the user receives a list of all suitable eGovernment Resources, independent of the languages or terms that have been used to describe them.
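
The scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory tables SYNONYM_GROUPS and REGISTRY, the resource titles and the function names are hypothetical and stand in for the real TRN and the registry of Part 1.

```python
# Hypothetical stand-in for the TRN: each set groups terms that are in an
# "is equivalent" relationship, stored as (name, language) pairs.
SYNONYM_GROUPS = [
    {("traffic", "EN"), ("verkehr", "DE"), ("verkeer", "NL")},
]

# Hypothetical registry content: resource title -> (classifying term, language).
REGISTRY = {
    "Live traffic map": ("traffic", "EN"),
    "Verkehrsinformation": ("verkehr", "DE"),
    "Verkeersinformatie": ("verkeer", "NL"),
    "Waste collection calendar": ("waste", "EN"),
}


def expand_term(term):
    """Step 2: ask the TRN for all terms equivalent to the search term."""
    for group in SYNONYM_GROUPS:
        if any(name == term for name, _ in group):
            return group
    return set()


def search(term, user_languages):
    """Steps 3 and 4: return (title, highlighted) pairs for matching resources."""
    names = {name for name, _ in expand_term(term)} | {term}
    hits = []
    for title, (classifier, lang) in sorted(REGISTRY.items()):
        if classifier in names:
            # Highlight only resources described in one of the user's languages.
            hits.append((title, lang in user_languages))
    return hits
```

Calling search("traffic", {"EN", "DE"}) returns all three traffic resources, with the Dutch one present but not highlighted.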

In order to make this scenario possible, it is required

  1. to define how terms may be specified and related to each other in a flexible way and
  2. to define and maintain a significant number of terms in the TRN

While the first part will be defined in this section, the second part will be described in the following section called “Terminological Resource Network: Realization and Integration”.

1.1.3. Data Model and Specification

In order to manage terms and their semantic relationships, a semantic language will be used for description. The Web Ontology Language OWL is a suitable and powerful approach for describing semantics. As described by the W3C, “OWL is intended to be used when the information contained in documents needs to be processed by applications, as opposed to situations where the content only needs to be presented to humans. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between those terms. This representation of terms and their interrelationships is called an ontology. OWL has more facilities for expressing meaning and semantics than XML, RDF, and RDF-S, and thus OWL goes beyond these languages in its ability to represent machine interpretable content on the Web.” (see http://www.w3.org/TR/owl-features).

OWL allows the specification of classes (called concepts) and their relationships as well as instances of classes (called individuals). The collection of all classes defined in an ontology is often referred to as the T-Box, while the collection of instances is called the A-Box. To be more specific, T-Box statements describe a system in terms of controlled vocabularies such as classes and properties, while A-Box statements are based on the T-Box specifications and make assertions using that vocabulary. In the following paragraphs, we define the model that is necessary to describe and answer the requests described in the earlier sections. We will therefore specify the T-Box elements. The following requirements are fulfilled by this:

It should be emphasized that users will normally not have to deal with this data model directly, since it is mainly used internally. Instead, users work with the result: they ask the Terminological Resource Network for a specific term and for its relationships to other terms. Those terms are stored in the format specified by the model above. Based on this model, a set of terms will be specified in section 1.2, which will also give some more hands-on examples.

1.2. Terminological Resource Network: Realization and Integration

1.2.1. Introduction

This section contains a more hands-on description of instances (A-Box) of terminological resources and their relationships, illustrated with real-world examples. It also describes how existing terminological data sources may be included, especially ebXML RR systems as defined in the ADNOM CWA. In order to create a Terminological Resource Network, server nodes need a way to

  1. Integrate existing or new terms into the TRN
  2. Define relationships between terms
  3. Offer an easy way of retrieving a list of all terms stored in the TRN
  4. Provide a query interface allowing users to retrieve terms that are related to another term

The first two points will be discussed in the following section, while the last two points will be handled in the next section “Public Services”.

1.2.2. Internal Representation, Interconnection & Integration

1.2.2.1. Representation & Term Interconnection

New terms are integrated into the TRN by creating an instance of the data model that has been defined in section 1.1. The following code shows an example for a term called "City Council".

  <Term rdf:ID="Term17">
    <name>City Council</name>
    <description xml:lang="en">This term represents a council of a city</description>
    <language rdf:resource="#enus"/>
  </Term>
  <Language rdf:ID="enus">
    <primaryCode>EN</primaryCode>
    <subcode>US</subcode>
  </Language>

Terms may be linked to other terms by specifying relationships:

  <Term rdf:ID="Term17">
    <name>City Council</name>
    <description xml:lang="en">This term represents a council of a city</description>
    <language rdf:resource="#enus"/>
  </Term>
  <Term rdf:ID="Term_18">
    <name>Municipality</name>
    <language rdf:resource="#enus"/>
    <relationship rdf:resource="#Synonym_2"/>
  </Term>
  <Term rdf:ID="Term_19">
    <name>Stadtverwaltung</name>
    <description xml:lang="de">Dieser Term repräsentiert die Stadtverwaltung</description>
    <language rdf:resource="#dede"/>
    <relationship rdf:resource="#Synonym_2"/>
  </Term>
  <Synonym rdf:ID="Synonym_2">
    <destinationTerm rdf:resource="#Term17"/>
  </Synonym>
  <Language rdf:ID="enus">
    <primaryCode>EN</primaryCode>
    <subcode>US</subcode>
  </Language>
  <Language rdf:ID="dede">
    <primaryCode>DE</primaryCode>
    <subcode>DE</subcode>
  </Language>

This example has specified three terms, their languages and their relationships.

The following figure visualizes the relationships between those terms:

Figure 4: Example Instances
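
As a rough illustration of how such instance data could be read, the following Python sketch parses a shortened copy of the example and lists every synonym link. The wrapping rdf:RDF element and the default namespace URI are assumptions (the snippets in this section do not show them), and synonym_links is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TRN = "http://www.cen.eu/egovshare/trn.owl#"  # assumed default namespace

DOC = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns="{TRN}">
  <Term rdf:ID="Term17"><name>City Council</name></Term>
  <Term rdf:ID="Term_18">
    <name>Municipality</name>
    <relationship rdf:resource="#Synonym_2"/>
  </Term>
  <Term rdf:ID="Term_19">
    <name>Stadtverwaltung</name>
    <relationship rdf:resource="#Synonym_2"/>
  </Term>
  <Synonym rdf:ID="Synonym_2">
    <destinationTerm rdf:resource="#Term17"/>
  </Synonym>
</rdf:RDF>
"""


def synonym_links(xml_text):
    """Return (source name, destination name) pairs for every Synonym link."""
    root = ET.fromstring(xml_text)
    # Map each term ID to its name.
    names = {}
    for term in root.findall(f"{{{TRN}}}Term"):
        names[term.get(f"{{{RDF}}}ID")] = term.findtext(f"{{{TRN}}}name")
    # Map each Synonym ID to the ID of its destination term.
    targets = {}
    for syn in root.findall(f"{{{TRN}}}Synonym"):
        dest = syn.find(f"{{{TRN}}}destinationTerm").get(f"{{{RDF}}}resource")
        targets[syn.get(f"{{{RDF}}}ID")] = dest.lstrip("#")
    # Follow each term's relationship references to the destination term.
    links = []
    for term in root.findall(f"{{{TRN}}}Term"):
        for rel in term.findall(f"{{{TRN}}}relationship"):
            rel_id = rel.get(f"{{{RDF}}}resource").lstrip("#")
            if rel_id in targets:
                source = names[term.get(f"{{{RDF}}}ID")]
                links.append((source, names[targets[rel_id]]))
    return links
```

A production system would more likely load the data into an RDF store and query it; plain XML parsing is used here only to keep the sketch free of third-party dependencies.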

1.2.2.2. Integration of ADNOM Results and 3rd Party Terminological Resources

Third-party results could, of course, be integrated into the TRN by simply importing them with an import tool. However, this would result in duplicate data management and might not always be desired. The TRN therefore provides an alternative way of adding terms from third-party resources. The only requirement is that the term can be identified by a unique Uniform Resource Identifier (URI). For example, the ADNOM CWA specified the usage of a federated ebXML Registry & Repository service for specifying and connecting terms. Those terms can be referenced by a URI that contains the location of the ADNOM server as well as the ID of the term stored in the ADNOM server. As an example, let us assume that a term has been specified in an ADNOM server and is referenced with the URI http://www.cen.eu/adnom/23221/term42. We can now specify that a term in the TRN has a relationship with this ADNOM term. For example, we can say that it inherits from this ADNOM term:

  <RemoteTerm rdf:ID="RemoteTerm_5">
    <name>Region</name>
    <url>http://www.cen.eu/adnom/23221/term42</url>
  </RemoteTerm>
  <Term rdf:ID="Term_6">
    <name>State</name>
    <relationship rdf:resource="#Inheritance_7"/>
  </Term>
  <Inheritance rdf:ID="Inheritance_7">
    <destinationTerm rdf:resource="#RemoteTerm_5"/>
  </Inheritance>
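
A minimal Python sketch of how a client might treat local and remote terms uniformly is given below. The Term class, the INHERITS_FROM table and the helper functions are hypothetical; they only mirror the three instances of the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    term_id: str
    name: str
    remote_url: Optional[str] = None  # set only for RemoteTerm entries


# The two terms of the example above, keyed by their rdf:ID.
TERMS = {
    "RemoteTerm_5": Term("RemoteTerm_5", "Region",
                         "http://www.cen.eu/adnom/23221/term42"),
    "Term_6": Term("Term_6", "State"),
}

# Inheritance links: source term ID -> destination term ID.
INHERITS_FROM = {"Term_6": "RemoteTerm_5"}


def parent_of(term_id):
    """Return the term that term_id inherits from, or None if there is none."""
    dest = INHERITS_FROM.get(term_id)
    return TERMS[dest] if dest else None


def is_remote(term):
    """A term is remote when it only references a URI on another server."""
    return term.remote_url is not None
```

Here parent_of("Term_6") yields the remote term "Region", whose remote_url still points to the ADNOM server, so no data is duplicated in the TRN.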

1.2.3. Public Services

The main functionality provided by the TRN is the capability to query it for terms that are related to, or synonyms of, a specific term. The Webservice interface (see WSDL in the Annex) will therefore provide the following functions:

  1. getAllTerms – This will return a list of all terms contained in the TRN
  2. getAllTermsByLanguage – This will return a list of all terms for a specific language
  3. getSynonymTerms – This will return a list of synonyms for a specific term
  4. getSimilarTerms – This will return a list of similar terms for a specific term
  5. getOpositeTerms – This will return a list of opposite terms for a specific term
  6. getSuperclassTerms – This will return a list of superclass terms for a specific term
  7. getSubclassTerms – This will return a list of subclass terms for a specific term

In order to provide the highest possible flexibility, the TRN will provide

  1. different ways of accessing functionalities
  2. different formats for receiving query results

1.2.3.1. Ways of accessing the TRN

In order to provide maximum compatibility, the TRN can be accessed in two different ways. The first is to call the TRN by invoking a simple URL via HTTP and to receive the output as a direct response. This methodology is often referred to as a RESTful approach (Representational State Transfer), as described in part 1 of the CWA. The second is to access the TRN via SOAP-based WebServices as defined by the W3C, including the provision of WSDL files that define the method syntax.

1.2.3.1.1. REST

The RESTful interface of the TRN allows users to access the TRN by invoking the TRN as follows:

        Server: http://www.cen.eu/egovshare/trn/REST
        GET {OP}
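
The following Python sketch shows how a client might assemble such a GET URL. The query layout (a "query" path with operation, lang and term parameters) is an assumption taken from the link shown in the syndication example of section 1.2.4; build_query_url is a hypothetical helper.

```python
from urllib.parse import quote, urlencode

BASE = "http://www.cen.eu/egovshare/trn/REST"


def build_query_url(operation, **params):
    """Build the GET URL for one TRN operation (assumed query layout)."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode({"operation": operation, **params}, quote_via=quote)
    return f"{BASE}/query?{query}"
```

For example, build_query_url("getSimilarTerms", lang="EN-US", term="City Council") produces a URL ending in query?operation=getSimilarTerms&lang=EN-US&term=City%20Council.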

1.2.3.1.2. WebServices

Over Webservices, the following methods are exposed:

String[] getAllTerms(String format);
String[] getAllTermsByLanguage(String code, String format);
String[] getSynonymTerms(String term, String lang, boolean returnAllLanguages, String format);
String[] getSimilarTerms(String term, String lang, boolean returnAllLanguages, String format);
String[] getOpositeTerms(String term, String lang, boolean returnAllLanguages, String format);
String[] getSuperclassTerms(String term, String lang, boolean returnAllLanguages, String format);
String[] getSubclassTerms(String term, String lang, boolean returnAllLanguages, String format);

Those methods correspond to the functionalities described at the beginning of this section. String parameters are used to express the term and the language of the term. A Boolean parameter indicates whether the TRN should return terms in all languages or only those terms that are in the same language as the term provided as a parameter. The following call shows an example SOAP request:

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                  xmlns:typ="http://www.cen.eu/egovshare/trn/ws/types">
   <soapenv:Header/>
   <soapenv:Body>
      <typ:getAllTermsByLanguage>
         <typ:code>EN-US</typ:code>
      </typ:getAllTermsByLanguage>
   </soapenv:Body>
</soapenv:Envelope>
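
A request of this shape could be generated programmatically, for instance with the Python standard library as sketched below. The helper name build_request is hypothetical, and the sketch only builds the envelope; it does not send it or consult the WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
TYPES_NS = "http://www.cen.eu/egovshare/trn/ws/types"


def build_request(operation, **params):
    """Serialize a SOAP envelope that calls one TRN operation."""
    # Use the same prefixes as the example request.
    ET.register_namespace("soapenv", SOAP_NS)
    ET.register_namespace("typ", TYPES_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{TYPES_NS}}}{operation}")
    # Each keyword argument becomes one child element of the call.
    for name, value in params.items():
        ET.SubElement(call, f"{{{TYPES_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="unicode")
```

Calling build_request("getAllTermsByLanguage", code="EN-US") yields an envelope equivalent to the one shown above.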

1.2.3.2. Output formats supported by the TRN

As with the ways of accessing the TRN, the output is also provided in different formats. This allows a flexible usage of the TRN for many different purposes. More precisely, the TRN provides the output formats OWL, Simple XML, SKOS and XTM.

1.2.3.2.1. OWL

This will return the terms as the result of a SPARQL query on the RDF graph that is defined by the terms and specifications in OWL. The result will be compliant with the current W3C recommendation of SPARQL Query Results in XML format as described at http://www.w3.org/TR/rdf-sparql-XMLres . An example of this format looks like this:

<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
        <head>
                <variable name='subject'/>
                <variable name='name'/>
                <variable name='language'/>
                <variable name='languagecode'/>
        </head>
        <results>
                <result>
                        <binding name='language'>
                                <uri>http://www.cen.eu/egovshare/trn.owl#enus</uri>
                        </binding>
                        <binding name='languagecode'>
                                <literal datatype='http://www.w3.org/2001/XMLSchema#string'>US</literal>
                        </binding>
                        <binding name='subject'>
                                <uri>http://www.cen.eu/egovshare/trn.owl#Term17</uri>
                        </binding>
                        <binding name='name'>
                                <literal datatype='http://www.w3.org/2001/XMLSchema#string'>City Council</literal>
                        </binding>
                </result>
                . . .
        </results>
</sparql>
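
A client might turn such a result document into plain dictionaries as sketched below in Python. SAMPLE is a shortened, illustrative response, and parse_bindings is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

SAMPLE = """<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
  <head>
    <variable name='subject'/>
    <variable name='name'/>
  </head>
  <results>
    <result>
      <binding name='subject'>
        <uri>http://www.cen.eu/egovshare/trn.owl#Term17</uri>
      </binding>
      <binding name='name'>
        <literal datatype='http://www.w3.org/2001/XMLSchema#string'>City Council</literal>
      </binding>
    </result>
  </results>
</sparql>"""


def parse_bindings(xml_text):
    """Return one dict per <result>, mapping variable names to values."""
    root = ET.fromstring(xml_text)
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            # A binding holds either a <uri> or a <literal> child here.
            value = binding.find("sr:uri", NS)
            if value is None:
                value = binding.find("sr:literal", NS)
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows
```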

1.2.3.2.2. Simple XML

This will provide all terms in a simple XML format that is specified by the following XSD:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" 
  elementFormDefault="qualified" attributeFormDefault="unqualified">
        <xs:element name="Terms">
                <xs:annotation>
                        <xs:documentation>A collection of terms</xs:documentation>
                </xs:annotation>
                <xs:complexType>
                        <xs:sequence>
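                                <xs:element name="term" minOccurs="0" maxOccurs="unbounded">
                                        <xs:complexType>
                                                <xs:simpleContent>
                                                        <xs:extension base="xs:string">
                                                                <xs:attribute name="lang" type="xs:language" use="optional"/>
                                                        </xs:extension>
                                                </xs:simpleContent>
                                        </xs:complexType>
                                </xs:element>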
                        </xs:sequence>
                </xs:complexType>
        </xs:element>
</xs:schema>

For example, a result in the simple XML format looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<Terms xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
                  xsi:noNamespaceSchemaLocation="terms.xsd">
        <term lang="en">City Council</term>
        <term lang="de">Stadtverwaltung</term>
</Terms>
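
Reading this format requires very little code. The Python sketch below groups the returned terms by their lang attribute; RESPONSE is a shortened copy of the example and terms_by_language is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RESPONSE = """<Terms xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="terms.xsd">
  <term lang="en">City Council</term>
  <term lang="de">Stadtverwaltung</term>
</Terms>"""


def terms_by_language(xml_text):
    """Group the <term> values of a Simple XML response by language."""
    grouped = {}
    for term in ET.fromstring(xml_text).findall("term"):
        grouped.setdefault(term.get("lang"), []).append(term.text)
    return grouped
```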

1.2.3.2.3. SKOS

eGov-Share may also output the results in the W3C Simple Knowledge Organization System (SKOS). The results will be returned in a SKOS-compliant format using the Turtle serialization.

ex1:CityCouncil rdf:type skos:Concept;
   skos:prefLabel "City Council"@en;
   skos:inScheme ex1:referenceLocationScheme.

ex1:Stadtverwaltung rdf:type skos:Concept;
   skos:prefLabel "Stadtverwaltung"@de;
   skos:inScheme ex1:referenceLocationScheme.

ex1:CityCouncil skos:exactMatch ex1:Stadtverwaltung.

Please note: SKOS is currently a working draft and may change during the course of the workshop.

1.2.3.2.4. XTM

This format will return all results in the XML Topic Maps standard. An example for the result looks like this:

  <topic id="term_17">
    <instanceOf><topicRef xlink:href="#term"/></instanceOf>
    <baseName>
      <baseNameString>City Council</baseNameString>
    </baseName>
  </topic>
  . . .

1.2.3.2.5. Format selection

Selecting which format should be returned by the TRN query is performed by specifying the format with one of the following values:

For WebService calls, the format is passed as a parameter of the method call. For RESTful calls, the format is passed with the HTTP call by specifying the Accept header.
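
For the RESTful case, a request carrying the desired format might be prepared as in the Python sketch below. The helper name build_rest_request is hypothetical, and the media type used in the usage note is only an example; the sketch prepares the request without sending it.

```python
from urllib.request import Request


def build_rest_request(url, media_type):
    """Prepare a GET request that asks the TRN for one output format."""
    return Request(url, headers={"Accept": media_type}, method="GET")
```

For instance, build_rest_request("http://www.cen.eu/egovshare/trn/REST", "application/atom+xml") would ask for the syndication format described in the next section.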

1.2.4. Syndicated Content

The TRN supports the syndication protocol of part 1 of the CWA. In order to use it, the TRN can be called with application/atom+xml as the format value. The returned feed will look like this:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Terms similar to City Council</title>
  <link href="http://www.cen.eu/egovshare/trn/REST/query"/>
  <updated>2008-10-11T12:00:00Z</updated>
  <author>
    <name>CEN Terminological Resource Network (TRN)</name>
  </author>
  <id>urn:uuid:11b46180-fde2-12dd-1d22-44557aab2f5</id>
  <!-- topic map entry -->
  <entry>
    <title>City Council</title>
    <link rel="direct" type="application/owl+xml" href="http://www.cen.eu/egovshare/trn/REST/query?operation=getSimilarTerms&amp;lang=EN-US&amp;term=City%20Council"/>
    <id>urn:uuid:4432c332-2ab3-4ebb-1d22-44557aab2f5</id>
    <updated>2008-10-10T11:55:52Z</updated>
    <summary>This term represents a council of a city</summary>
  </entry>
  . . . 
</feed>
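
A consumer of this feed might extract the entries and the query encoded in each link as sketched below in Python. FEED is a shortened, illustrative feed and read_entries is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

ATOM = {"atom": "http://www.w3.org/2005/Atom"}

FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Terms similar to City Council</title>
  <entry>
    <title>City Council</title>
    <link href="http://www.cen.eu/egovshare/trn/REST/query?operation=getSimilarTerms&amp;lang=EN-US&amp;term=City%20Council"/>
    <summary>This term represents a council of a city</summary>
  </entry>
</feed>"""


def read_entries(feed_xml):
    """Return title, operation and term for every entry of the feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM):
        href = entry.find("atom:link", ATOM).get("href")
        # parse_qs decodes percent-escapes such as %20.
        query = parse_qs(urlsplit(href).query)
        entries.append({
            "title": entry.findtext("atom:title", namespaces=ATOM),
            "operation": query["operation"][0],
            "term": query["term"][0],
        })
    return entries
```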

1.2.5. Practical FAQ

1.2.5.1. What is the easiest way of finding synonyms using the TRN, independently of the language?

You may use the API method like this:

getSynonymTerms("Country", "EN-US", true);

If the Boolean parameter is true, terms in all languages are returned. If it is false, the method only returns terms in the same language as the term that you specify (“EN-US” in this example).
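
The effect of the Boolean parameter can be illustrated with a small in-memory sketch in Python. The GROUPS table and the function get_synonym_terms are hypothetical stand-ins for the TRN and reuse the terms of section 1.2.2.1.

```python
# Hypothetical synonym groups, stored as (name, language code) pairs.
GROUPS = [
    {
        ("City Council", "EN-US"),
        ("Municipality", "EN-US"),
        ("Stadtverwaltung", "DE-DE"),
    },
]


def get_synonym_terms(term, lang, return_all_languages):
    """Return the synonyms of a term, optionally limited to its own language."""
    for group in GROUPS:
        if (term, lang) in group:
            return sorted(
                name
                for name, code in group
                if name != term and (return_all_languages or code == lang)
            )
    return []
```

With return_all_languages set to True, the German term is included; with False, only "Municipality" is returned for "City Council".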

1.2.5.2. What is the difference between Superclass and Subclass relationships?

Let’s take the following example (in UML notation):

If you call getSuperclassTerms for the term “State” then you will receive the term "Region", while getSubclassTerms will result in "Federal State".
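
The same example can be written down as a small Python sketch. The SUPERCLASS_OF table is a hypothetical stand-in for the TRN, and returning only direct relationships is an assumption, since the text does not state whether the result is transitive.

```python
# child term -> parent term, mirroring the example in the text.
SUPERCLASS_OF = {"State": "Region", "Federal State": "State"}


def get_superclass_terms(term):
    """Return the direct superclass terms of a term."""
    parent = SUPERCLASS_OF.get(term)
    return [parent] if parent else []


def get_subclass_terms(term):
    """Return the direct subclass terms of a term."""
    return sorted(c for c, p in SUPERCLASS_OF.items() if p == term)
```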

1.2.5.3. Will getSimilarTerms also contain all elements of getSynonymTerms?

Yes.

1.2.5.4. How may I access the TRN and its API methods?

The TRN provides a SOAP-based WebService interface and a RESTful interface to access all methods. A WSDL specification may be found in the annex.

1.2.5.5. May I issue a direct query to the underlying ontology database?

This is not officially specified and won’t be part of the Demonstrator implementation. However, it is recommended to add a SPARQL query interface in future versions.

1.2.5.6. What is the "id" of a term?

The ‘id’ must be unique; you may generate it with a GUID/UUID generator when creating a term. Technically, it is identical to the URI of the OWL individual inside the TRN.
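
One possible way of generating such an identifier is sketched below in Python. The base URI is an assumption taken from the URIs in the OWL output example, and the "Term_" prefix is a hypothetical naming choice that keeps the fragment from starting with a digit.

```python
import uuid

BASE_URI = "http://www.cen.eu/egovshare/trn.owl#"  # assumed ontology base URI


def new_term_uri():
    """Create a fresh URI for a new term from a random UUID."""
    return f"{BASE_URI}Term_{uuid.uuid4().hex}"
```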

1.3. Demonstrator Implementation

1.3.1. Introduction and Scope

This section describes the implementation of a demonstrator application for the Terminological Resource Network specification. It aims to demonstrate the overall functionality and to give implementers an example of how our specifications can be realized. The demonstrator aims at

The demonstrator does not aim at

1.3.2. Architecture

It has to be emphasized that the following design and implementation is just one possible way to implement the specifications. As the specification itself is independent of a specific technology, implementers are free to create their own implementation with a different architecture and a different technological base if necessary. The following figure shows the high-level architecture of the TRN demonstrator implementation:

1.3.3. Technology and Implementation Details

Since the time for the demonstrator implementation is very limited, the actual implementation covers only one of the access methods, the RESTful approach. In addition, only one output format is realized, which is OWL. Both are sufficient to demonstrate the general functionality and the TRN concept in particular. The demonstrator is implemented in the Java language. It is based on several open source components that have proven to be stable and powerful. The following tools and frameworks have been used for the implementation:

1.3.4. Demonstrator Conclusions

The demonstrator implementation may be used to show the specifications of the Terminological Resource Network in a real-world environment. It is, however, limited to the minimum functionality that is necessary for this purpose. It is therefore recommended to create a more comprehensive prototype implementation that includes all features that have been specified as well as features that are currently out of scope of this CWA, such as access rights management and the purification of invalid data. Part 4 of the CWA will describe this when defining roles and when describing the roadmap for future functionalities. Nevertheless, the current version of the demonstrator is already a valuable source that offers a good understanding of how the TRN could be integrated into existing landscapes and of how it can be realized with widely used technologies such as Java and Spring.

1.4. Conclusions

This part has shown a holistic concept for realizing a Terminological Resource Network. Elements that should be highlighted are:

  1. usage of widely accepted standards for expressing semantics (e.g. OWL)
  2. support of two popular access mechanisms (WebServices and REST)
  3. support of multiple output formats (XTM, SKOS, XML, etc.)
  4. possibilities for integrating existing resource repositories (e.g. ADNOM)
  5. extensibility for additional formats and interfaces, and extensibility of the data model (“Open World” assumption)
  6. preparation for more complex functionalities because of the semantic background.

Taken together, these points emphasize the large flexibility of the TRN. This is a key success factor for a wide acceptance of the specification by various eGovernment institutions in Europe that might use different system landscapes and follow different regulations.

1.5. Annex

1.5.1. Annex 1: OWL code for TRN

See Annexes

1.5.2. Annex 2: WSDL specification

See Annexes

egovpt_fg: CWA Part 2 (last edited 2008-10-07 14:01:12 by SvenAbels)