1. CWA Part 3 - Establishment of a set of Soft Cultural Elements
Contents
- CWA Part 3 - Establishment of a set of Soft Cultural Elements
- Abstract
- Scope
- Rationale
- State of the Art
- Cultural Elements in eGovernment
- Factual Elements
- Rationale
- Typographic Conventions
- Number formatting
- Monetary formatting
- Other Style Guide related conventions covered in the CLDR or that could easily fit in that structure
- Other Style Guide related conventions that would not readily fit in the CLDR structure
- Other Style Guide related conventions that would benefit from a repository other than CLDR
- Personal names
- Soft Cultural Elements
- Development of a formal structure
- References
NOTE: This document is currently undergoing a major restructuring.
1.1. Abstract
In an eGovernment context such as Europe, those accessing the resources can come from a wide range of linguistic and cultural backgrounds. It will thus be necessary to provide variants of the eGovernment resources that can be understood by people from these different linguistic and cultural backgrounds. Provision of the resources in a range of languages is an obvious requirement, but simple translation of source documents will not be sufficient to prevent cross-cultural confusion and misinterpretation of the style and approach of the author. The Unicode Common Locale Data Repository (CLDR) provides resources that can be used to ensure that resources designed for a particular target group conform to correct cultural expectations.
CWA Part 3 proposes additions to the CLDR that address issues that are particularly relevant to the European eGovernment context. This is a context where resources will be available in a range of languages and where these resources may be accessed by people who are not familiar with all of the cultural expectations that native speakers of that particular language variant take for granted. Part 3 also takes important parts of the "Interinstitutional style guide" of the Publications Office of the European Union, which is the recommended way to write official documents for the European Union, and codes these in a form that should be suitable for the CLDR.
1.2. Scope
Part 3 of the CWA will specify:
- factual cultural elements that are particularly relevant in a European eGovernment context that complement and extend those already in the Unicode CLDR;
- soft cultural elements that are potentially suitable for inclusion in the Unicode CLDR;
- a formalized description of cultural elements that is integrated into the general ontology of Part 1a.
Part 3 is elaborated in close collaboration with the Unicode Consortium, notably with the TC on the Common Locale Data Repository, and in discussion with LISA.
Note: The contents of CWA Part 3 have been discussed on September 9th in a Birds of a Feather (BOF) session at the 32nd Unicode Conference in San Jose (http://www.unicodeconference.org/program-d.htm#bof). The result of that BOF was positive, and the CLDR TC is expecting a proposal from the workshop. This will be carried out by Erkki I. Kolehmainen, who is also a member of the CLDR TC.
1.3. Rationale
To achieve its purpose, any service must be localized to the needs of its intended users. For a service that may be offered to a wide range of users, this will necessitate offering the service in a number of variants, each of which is targeted at people with a specific linguistic and cultural background. People frequently believe that translating the words used in a service is an adequate form of localization, but this is not the case, as it fails to address the other cultural expectations that the intended users bring to their interaction with the service. Failure to address these other cultural factors can result in a range of misinterpretations of the service that can vary from a mis-assessment of how formal the service interaction is meant to be, to being offended by how the service addresses its users, to serious situations where numeric information can be dramatically misunderstood.
It is an unfortunate fact that many of today's eGovernment services are developed without sufficient awareness of cultural diversity. Whereas this might have been acceptable in the past for early prototype services or ones where the known user-base is very culturally homogeneous, the trend throughout Europe (and elsewhere) is for very culturally diverse communities and for services that may have a geographically very broad base of users. The reasons for this general lack of cultural awareness in eGovernment services exist on a number of levels, including:
- eGovernment Services: individual services make unnecessarily specific assumptions about local legal requirements and / or cultural or semantic user expectations
- Data models: data models reflect legal requirements and / or cultural and semantic expectations
- User Interfaces: language and cultural preferences are hard-coded into the user interfaces, i.e., the systems have not been internationalized and therefore cannot readily be localized
In all of these cases, lack of awareness heavily impacts sharing and reusing of existing services and / or user interface solutions in Europe while at the same time impacting the usability of applications even within one country. Solutions are potentially more complex, though, as the eGovernment data on which services operate and which they have to exchange necessarily reflects local expectations to a large degree.
Part 3 proposes a number of potential solutions that can help to reduce or eliminate the negative effects that arise from a failure to correctly address the many cultural conventions that must be considered in the localization of eGovernment services.
1.4. State of the Art
Current operating systems and / or major desktop or web applications are already configured to adapt significantly to the user's culturally specific requirements. As a matter of course, they switch the way dates, numbers or currencies are displayed, adapt the description of menus to the users' language, change the keyboard etc.
In doing so, systems build on so-called locale data that captures many frequently shared preferences ranging from date formats to translations for frequently used terms. Indeed, locale data currently largely consists of two different types of locale preferences:
- language-related preferences such as names of months, days of the week or yes-strings
- country- or region-specific preferences such as currency or number formatting
These two types are largely orthogonal. On the one hand, languages are used very often across several countries, and on the other hand, many, if not most countries use more than one language on their territory. A number of preferences, notably keyboards and typographic conventions, are influenced by both language and regional preferences.
1.4.1. Locales
Locale data can be expressed in a number of widely-used formats including the POSIX format (ISO/IEC 9945 / IEEE 1003) and, more recently, the CLDR's Locale Data Markup Language (LDML) (cf. below). Strictly speaking, the term locale refers in this context only to the identifier associated with the data itself.
1.4.2. Language Identification, BCP 47
It is important to know in which language the content is presented, because its processing – e.g., sorting, searching and matching – and rendering, including hyphenation, are often language dependent.
The language codes are defined in ISO 639, the script codes in ISO 15924, and the country or region codes are defined in ISO 3166 and the United Nations M.49. Based on these, the Internet Engineering Task Force (IETF), specifically its Language Tagging Registry Update (LTRU) working group, has defined BCP 47 (Best Current Practice) for identifying the language of the contents at the required precision. BCP 47 currently consists of RFC 4646 and RFC 4647. These language tags are used in a number of other standards, such as HTTP, HTML, XML and PNG.
NOTE: If a language has a two-letter code in ISO 639-1, it is to be used, and not the three-letter code in ISO 639-2.
Each language tag is composed of one or more "subtags" separated by hyphens. The Language Subtag Registry, maintained by IANA, lists the current valid public subtags. Subtags may also be identified for private use. The same structure (with an underscore in lieu of the hyphen) is also used in CLDR to identify the locales.
Language tags often consist of just a language subtag, or a language subtag and a region subtag. For example, fr represents French, and consists of a single language subtag (from ISO 639-1), while fr-BE represents Belgian French, and consists of the language subtag fr followed by the region subtag BE (from ISO 3166-1). Both of these would match the request fr, whereas only fr-BE would match a request for fr-BE.
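The matching behaviour described above can be sketched in a few lines of Python. This is a minimal illustration of prefix matching on subtag boundaries, in the spirit of the basic filtering scheme of RFC 4647, not a complete implementation of BCP 47. The function names matches and to_cldr_id are hypothetical and chosen only for this example.

```python
def matches(language_range, tag):
    """Return True if `tag` matches `language_range` by prefix comparison."""
    rng = language_range.lower()
    candidate = tag.lower()
    # "*" is the wildcard range and matches any tag.
    if rng == "*":
        return True
    # The range matches if it equals the tag, or if it is a prefix of the
    # tag that ends exactly at a subtag boundary (a hyphen).
    return candidate == rng or candidate.startswith(rng + "-")


def to_cldr_id(tag):
    """Rewrite a hyphen-separated tag with underscores, as CLDR identifiers do."""
    return tag.replace("-", "_")


available = ["fr", "fr-BE"]
print([t for t in available if matches("fr", t)])     # ['fr', 'fr-BE']
print([t for t in available if matches("fr-BE", t)])  # ['fr-BE']
print(to_cldr_id("fr-BE"))                            # fr_BE
```

Comparing on the hyphen boundary rather than on a raw string prefix ensures that a request for fr does not accidentally match an unrelated tag that merely begins with the same letters.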
1.4.3. The Common Locale Data Repository (CLDR)
The Unicode Common Locale Data Repository is an industry driven, yet fully open direct continuation of a similar OpenI18N effort. The structure and XML data content are defined by Locale Data Markup Language (LDML), Unicode Technical Standard (UTS) #35. LDML is a format used not only for CLDR, but also for general interchange of locale data, such as in Microsoft's .NET. LDML is specified by the CLDR Technical Committee.
The data is delivered to the using systems in several different formats, including ICU (International Components for Unicode) and POSIX. It is up to the using systems to decide which elements they use for their localized versions, and to which extent they provide flexibility to individual users to either override the default values or to select variations thereof.
Data submission for any language is free for all (registering required). For the vetting of the input data, a number of vetters have been assigned with a specific number of votes for the languages for which they have been authorized. CLDR bug reporting is free for all; the bug reports are processed by the CLDR TC in its weekly phone meetings. Data submission and vetting are essentially the responsibility of the relevant user communities, and once they have assumed this responsibility and acted accordingly, the cost of implementing basic level support (i.e., not including a fully localized user interface) for even a small user group is no longer prohibitive.
An updated release of LDML will be defined for each release of CLDR, defining the new functionality and any clarifications/corrections.
The data categories for translation include the names of languages, scripts, territories, currencies, types (for calendars and collations), variants, time zones, etc. The calendar info includes both translations and formatting. Formatting is also defined, e.g., for numbers and monetary amounts. Furthermore, the standard and auxiliary exemplar character sets are defined, as well as the applicable collation.
In addition to the base locale data, supplemental information covers e.g. the language/territory match with related population information.
The inheritance rules are such that redundant data does not have to be entered. Thus, e.g., for French in Belgium, the French values (in France) will be used unless overriding values are being provided; the ultimate fall-back is the root. Also, since there are several regional and minority languages that don't have anywhere near all the names defined, the names in e.g., the prevailing majority language could be used in lieu of the missing ones, since they would be more meaningful to the users than the code in the root.
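The inheritance rule above can be illustrated with a small lookup sketch. It assumes a simplified model in which a locale identifier is truncated one subtag at a time until the root is reached; the actual CLDR inheritance rules contain further refinements that are not modelled here. The function names and the data are placeholders invented for this example and are not real CLDR content.

```python
def fallback_chain(locale_id):
    """List the identifiers to consult, most specific first, ending at root."""
    parts = locale_id.split("_")
    chain = ["_".join(parts[:n]) for n in range(len(parts), 0, -1)]
    chain.append("root")
    return chain


def lookup(data, locale_id, key):
    """Return the first value found for `key` along the fallback chain."""
    for candidate in fallback_chain(locale_id):
        values = data.get(candidate, {})
        if key in values:
            return values[key]
    raise KeyError(key)


# Placeholder data (assumption): only the overriding values are stored
# at each level, so nothing has to be entered redundantly.
data = {
    "root": {"a": "root-a", "b": "root-b", "c": "root-c"},
    "fr": {"a": "fr-a", "b": "fr-b"},
    "fr_BE": {"a": "fr_BE-a"},
}

print(fallback_chain("fr_BE"))     # ['fr_BE', 'fr', 'root']
print(lookup(data, "fr_BE", "a"))  # fr_BE-a (overridden for Belgium)
print(lookup(data, "fr_BE", "b"))  # fr-b    (inherited from French)
print(lookup(data, "fr_BE", "c"))  # root-c  (ultimate fall-back)
```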
1.4.4. Limitations of the CLDR
CLDR is intended to be the record of preferred default practices that are familiar to the ordinary users and with which they feel comfortable. It is not intended to educate the users in what could be seen by some as ideal. As a consequence the CLDR currently has a number of conscious limitations. This is largely related to the self-imposed concentration on linguistic data (cf. UTS#35, section 2) and the tendency to identify differences between locales primarily with language differences. This decision is not arbitrary, but reflects that linguistic data is by far the best understood type of locale data and the one easiest and most precise to describe, in addition to some formatting rules. Since CLDR is intended to help implementations, if reliable data is not available in sufficient quantity, implementers will not bother to deal with it. On the other hand, if the data is not being used for implementations, submitters and vetters will not bother to enhance it. Thus, incorporation of any new elements usually requires a rather strong commitment by the system houses to use them.
The approach taken in CWA Part 3 is that the cultural diversity that transcends language and regional categories is just as important and needs to be captured accordingly.
We argue below that a type of locale data, here called "Soft Cultural Elements", which relates to users' non-language cultural expectations, needs to be taken into account. These expectations do not necessarily correspond to the users' language or country. They may instead be either broader – e.g. for expectations shared across Europe – or more specific – e.g. for expectations typical only of a given region.
1.5. Cultural Elements in eGovernment
In spite of the fact that CLDR is supposedly recording current practices only, it can be used for specific user groups also as guidance on how things should be done. As an example, eGovernment, in particular, has very strong requirements for interoperability both at national and at least Pan-European levels. Since they can only be realised by utilizing common practices, CLDR should contain such values as deemed necessary, and they should be expressly identified (as "scope" in Topic Map terms) either as the (European) eGov variant for each language, or as a specific language locale for the region European Union ("XU" in terms of BCP 47, since the use of "EU" has been formally restricted to the currency code EUR). In both cases ordinary users would not be affected.
The proposals of CWA Part 3 are described below under two major categories:
- Factual Elements: that are context dependent, but their concrete values are well defined for a given setting. These typically are specialisations of elements that are already described in the CLDR.
- Soft Elements: that are difficult to describe with precision and will always abstract from potentially diverging personal views. These are a new category that it is proposed should be added to the CLDR.
1.6. Factual Elements
Plan:
- Underline the need for context-specific factual information ("variations" in CLDR-terms, scope in TM terms)
- Categorize typographic conventions into:
- Elements that exist in CLDR, but need eGov-specific variations (example: decimal separator for UK)
- Elements that do not currently exist in the CLDR / in LDML, but should (with rationale)
- Elements that are valuable resources for human users, but not suitable for the CLDR
- Integrate the issue of name ordering into those categories (three types of names: formal, normal and informal)
1.6.1. Rationale
The Unicode CLDR already contains a large number of elements (e.g. selected typographical conventions) that can be unambiguously described and used to enable localized applications to use versions of those elements that are appropriate for the chosen locale. All of these elements are intended to represent what a person from that locale would naturally expect to see. Care is taken to avoid a situation where the CLDR imposes a usage policy that differs from common usage.
However, there are instances where the content of localized services is expected to conform to a usage policy that differs from that commonly understood in a locale. Such a policy is often referred to as a "House Style". Two common examples of such house styles include:
- multi-national corporations that wish to adopt a common house style that can be applied in every country in which they operate;
- official documents in public administrations that span a number of different locales.
It is this latter situation that is particularly relevant in the context of eGovernment. There are many eGovernment administrative regions that span many countries and also those that apply to countries that have multiple languages and cultures within a single country. In such cases, a common house style can reduce or eliminate confusions caused by the application of the common usage conventions from the individual locales to documents and services that apply across this range of locales.
One major example of such a house style is the "Interinstitutional style guide" of the Publications Office of the European Union. This is written to assist authors of official documents and translators in using conventions that conform to the official house style policy rather than the common usage conventions that the author or translator may be more familiar with. In this Workshop it is proposed that the CLDR elements of house styles in general, and those that relate to the European eGovernment context in particular, are integrated into the CLDR hierarchy either as variants of the relevant locale data for that culture or as specific locales applying to the region European Union. Adopting this approach would have a number of benefits:
- these resources would be, for the first time, available in a form that could be used in a way identical to the well-supported ways in which other CLDR resources can be used (e.g., using software and tools that are designed around the CLDR);
- this would open up the market for the suppliers of eGovernment services by making readily usable resources available to companies from within and outside the relevant eGovernment administration area;
- these resources could be principally owned and maintained as a niche within the CLDR structure by the responsible administration that is setting the house policies, yet be linked to from the standard CLDR pages to maximise their availability. Although this approach doesn't lend itself to the normal use of the CLDR Survey Tool to supply and vet the data, it should have a minimum impact on those responsible for maintaining the CLDR. These resources would be assigned either variant or region codes to ensure that they are distinguished in a clear way from the resources intended for general usage. This Workshop is the ideal vehicle to propose the introduction of such a new type of CLDR usage and to assist in the adoption of such a proposal by the Unicode Consortium.
House styles also formalize many typographic conventions that are not at present covered in the CLDR. Some of these conventions lend themselves to be added to the CLDR, whereas others such as the proper use of highlighting or tables are mainly or exclusively information targeted at human users and cannot readily be covered in the CLDR context. This information constitutes however a valuable information resource in eGov-Share and is accordingly covered by the ontology of cultural elements.
1.6.2. Typographic Conventions
Typographic conventions are a large class of soft cultural elements. They are closely related to current locale data in that many of them can usually be unambiguously described with strategies that are essentially the same as traditional cultural elements. The CLDR contains the specification of various forms of typographic conventions that represent the normal convention in the appropriate locale. However, when multilingual information resources are published in a number of different locales, differences between the conventions used in each locale can cause confusion if someone attempts to read a resource that was designed for a different locale. For this reason, organizations may decide to prescribe conventions to be used for the production of multilingual information resources that reduce such ambiguities. The Publications Office of the European Union maintains an "Interinstitutional style guide" that lists precisely this type of information for most major European languages. However, as this information is not in a machine processable format, it is not possible to use these conventions in the same way as can be done for the CLDR data.
1.6.2.1. Number formatting
One area where typographic conventions have great significance is in the formatting of numbers. If the significance of the characters that separate the different elements of a number is misunderstood then the number can be seriously misinterpreted. If the number formatting separators used are interpreted by a person familiar with the standard usage in the locale then the numbers should be interpreted correctly. All of this usage is already contained in the "numbers/symbols/group" and the "numbers/symbols/decimal" elements for each locale of the CLDR.
In Europe, several countries use "." as the "group separator" (e.g. for separating the digits greater than a thousand from those less than a thousand) and several others use a space (preferably a no-break-space). The United Kingdom is exceptional in that it uses "," as the "group separator". This usage is potentially very confusing to those from other European countries as this is the character that most other countries use as the "decimal separator". The United Kingdom is also unique in that it uses "." as the "decimal separator", which is the "group separator" in many other countries. (There are also variations in other countries, e.g., ":" has defined usage as a decimal separator in Sweden.) In a multi-locale region such as Europe these major differences between the normal usage of these separators can cause confusion for a multilingual citizen of the European Union reading a text that is written according to the existing United Kingdom "English" conventions. This confusion caused by the different number presentation conventions may also cause some potentially serious misinterpretations to occur. For example, the number 123,456 can be interpreted as a number slightly less than 124 or as one greater than 123 thousand, depending on whether the "," is interpreted as a "decimal separator" or as a "group separator". As English is the most commonly spoken second language in Europe, there is a high risk that citizens of Europe will encounter significant difficulties if they interpret numbers according to their own locale conventions when reading English language documents that have been written using the standard English locale convention that is currently in the CLDR.
To avoid the mistakes that can occur from the misinterpretation of the "decimal separator" and "group separator" within their multi-locale region, the Publications Office of the European Union has defined conventions for the "group separator" and "decimal separator" that should be used for official documents when produced in the different languages of the European Union. These conventions attempt to minimise the risks of confusion that can occur between the different national conventions in separator usage whilst at the same time minimizing the deviations from the normal locale specific conventions.
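The size of the possible misreading in the 123,456 example can be made concrete with a short sketch. The helper parse_number is hypothetical and deliberately naive: it assumes that the reader applies exactly one pair of separators and it performs no validation of the input.

```python
from decimal import Decimal


def parse_number(text, group, decimal):
    """Interpret `text` using the given group and decimal separators."""
    # Drop the group separators, then normalise the decimal separator to ".".
    cleaned = text.replace(group, "").replace(decimal, ".")
    return Decimal(cleaned)


text = "123,456"

# Reader applying the United Kingdom convention: "," groups, "." is decimal.
as_uk = parse_number(text, group=",", decimal=".")

# Reader applying a convention where "." groups and "," is decimal.
as_continental = parse_number(text, group=".", decimal=",")

print(as_uk)           # 123456
print(as_continental)  # 123.456
```

The same seven characters therefore yield two values that differ by a factor of one thousand, which is the risk the text describes.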
The Interinstitutional Style Guide defines the following variations to the standard conventions of the CLDR:
- The "numbers/symbols/group" separator is standardized as a space for all locales in Europe. This is a variation for those languages that have "." as the "group separator" and for the United Kingdom.
- The "numbers/symbols/decimal" separator defined in the CLDR is to be used for all locales except for the United Kingdom. For the United Kingdom there are two variations according to the context in which the numbers are being used. These are:
- Variation 1: The character "," should be used for the "number/ symbols/ decimal" separator for "all English-language editions of the Official Journal" and its use is described as "acceptable for multilingual publications, statistical works and works where the tables are composed once for all language versions".
- Variation 2: The standard CLDR convention "." should be used for all other situations.
To ensure that the correct conventions are used, the context of usage must be identified (e.g. whether the text is an English-language edition of the Official Journal, or a table that will be composed only once and used in versions of the document for different European locales). Once the context is known, the special European conventions or the CLDR conventions will be used according to the above rules, with the regular CLDR conventions being used wherever the above rules do not specify otherwise.
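The selection rules listed above can be sketched as a small function. This is an illustration only: the function and parameter names are hypothetical, the regular CLDR decimal separator of the locale is passed in by the caller rather than read from the CLDR, and the use of a no-break space (U+00A0) as the group separator is an assumption based on the preference stated earlier in this section.

```python
NBSP = "\u00a0"  # assumption: a no-break space is used as the group separator


def house_style_separators(cldr_decimal, is_uk_english, shared_context):
    """Return (group, decimal) for a text produced under the house style.

    shared_context is True for the contexts of Variation 1, e.g. an
    English-language edition of the Official Journal or a table that is
    composed once for all language versions.
    """
    group = NBSP  # standardized for all locales
    if is_uk_english and shared_context:
        decimal = ","           # Variation 1
    else:
        decimal = cldr_decimal  # Variation 2, or the regular CLDR value
    return group, decimal


def format_number(integer_part, fraction_digits, group, decimal):
    """Format a non-negative integer part plus a string of fraction digits."""
    digits = str(integer_part)
    chunks = []
    while digits:  # split into groups of three, starting from the right
        chunks.insert(0, digits[-3:])
        digits = digits[:-3]
    text = group.join(chunks)
    return text + (decimal + fraction_digits if fraction_digits else "")


oj = house_style_separators(".", is_uk_english=True, shared_context=True)
other = house_style_separators(".", is_uk_english=True, shared_context=False)
print(format_number(1234567, "89", *oj))     # 1 234 567,89
print(format_number(1234567, "89", *other))  # 1 234 567.89
```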
1.6.2.2. Monetary formatting
For the amounts, the number formatting applies.
In official use, the currency symbol €, when permitted, is floating in English, i.e., immediately preceding the amount, similar to the UK pound "£", US dollar "$" and Japanese yen "¥" signs. However, in general use in most countries other than the UK, the currency symbols in current general use – even "€" – follow the amount. In these cases, the currency symbol is separated from the amount by a space (preferably a no-break-space).
The currency code is normally used instead of the symbol in transactions and legal texts. In legal text, the code may precede (in English, Latvian and Maltese) or follow (in all other languages) the amount with a separating space. In general use, however, the currency code often precedes the amount. The currency codes defined in ISO 4217 also cover other value elements than regular currencies.
If spelled out, the singular form "euro" is to be used with any amount in formal EU documents in most languages – certain countries have applied for an exception – but the form "euros" is recommended for more informal text.
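The placement rule for the currency code in legal text can be sketched as follows. The function name, the parameter names and the amount shown are hypothetical, the languages are identified by their ISO 639-1 codes (en, lv, mt), and the amount is assumed to be already formatted according to the number formatting rules.

```python
# Languages in which the currency code precedes the amount in legal text.
CODE_FIRST_LANGUAGES = {"en", "lv", "mt"}


def legal_amount(amount_text, currency_code, language_tag, sep=" "):
    """Place an ISO 4217 code before or after an already formatted amount."""
    # Only the primary language subtag decides the placement in this sketch.
    language = language_tag.split("-")[0].lower()
    if language in CODE_FIRST_LANGUAGES:
        return currency_code + sep + amount_text
    return amount_text + sep + currency_code


print(legal_amount("30", "EUR", "en"))     # EUR 30
print(legal_amount("30", "EUR", "fr-BE"))  # 30 EUR
```

The separator is a parameter so that a caller can pass a no-break space where line breaks between the code and the amount must be avoided.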
Thus, variation will also occur in all of these CLDR definitions.
1.6.2.3. Other Style Guide related conventions covered in the CLDR or that could easily fit in that structure
In addition to the number formatting above, e.g., the placement of the minus sign for negative numbers and also the formatting of scientific (exponential) numbers and percent and permille expressions have been defined.
Two levels of quotation mark pairs, outer and inner, are currently defined, and they are expected to be used alternatively. In addition, e.g., list element separators have been defined.
The CLDR contains only the most commonly used paper size for each locale. It contains the very minimum information on the measurement system in use, but the expansion of this has already been discussed.
It should be noted that the CLDR contains the commonly used names of countries and regions instead of the official names (which could, in principle, be included using the chosen variant mechanism).
The spacing before and after the punctuation marks is not currently defined in the CLDR, although it is language and culture dependent.
1.6.2.4. Other Style Guide related conventions that would not readily fit in the CLDR structure
Many of the typing conventions, including the strategies of emphasis, could not easily be covered by locale definitions.
The way to use references, quotations, footnotes and tables is currently too text processing system dependent to be covered by locale data. These, however, need to be defined in text form for common understanding.
The current proof correction marks are for manual use only, although they, too, need to be unified, especially when centralized publication facilities are being used.
The list of popular fonts is subject to constant evolution.
1.6.2.5. Other Style Guide related conventions that would benefit from a repository other than CLDR
The Style Guide defines a preferred structure for publications with specific guidance for several topics, such as:
- Design of covers and information bits on it
- ditto for title pages
- Preliminary pages and end-matter (SG, 5.3)
- Preferred design of tables of contents
- Preferred design of bibliographies
- Preferred design of the index
- Layout and presentation of main text (SG, 5.4)
- Preferred design of lists (numbered, unnumbered etc.)
This information would best be made available to all interested users in the form of templates. It is understood that at least initially they would have to be text processing system dependent, but progress is to be expected in this field. These localized templates could possibly also be included in the install packs, since they would have a significant market.
1.6.3. Personal names
ISO TC 37 "Terminology and other language and content resources" is in the process of starting a new work item on proper names that could conceivably touch all of the following areas related to personal names.
The use of personal names can be addressed in parallel from three points of view:
1.6.3.1. How to spell a person's name right
The right to the proper spelling of personal names has been addressed by a number of worldwide or multi-lateral and even bi-lateral treaties between European countries. The implementation of these treaties is, however, in its infancy, even for a treaty that was signed as early as 1973 (CIEC/ICCS convention No. 14). The legal implications of the resulting lack of interoperability between various registers, particularly the public registers, are significant. Since the Universal Character Set (ISO/IEC 10646 and The Unicode Standard) has removed the technological obstacles that were prevalent in the past, there is no justification not to proceed in defining the multilingual European Latin character repertoire for eGovernment and the transliteration schemes for at least the scripts that are used for the official languages of the European Union. Attempts have been made to define European repertoires, notably in CWA 13873:2000, Multilingual European Subsets in ISO/IEC 10646-1, but that would clearly need to be updated and refined.
1.6.3.2. How to express the name properly in a given context
Names in Europe have different structures and often consist of multiple elements that are either required or can be left out in a given context. The elements include, in addition to or in lieu of the "ordinary" given and family names, patronymics and/or matronymics and various common name affixes. The ordering of these is often dependent on the usage environment: formal, normal or informal. Furthermore, capitalization of the first letter of a name component is culture dependent, and in certain cultures the name or the primary identifying element is routinely written in capital letters in both formal and normal usage.
Whether and how the name will change after marriage is defined in the national laws, and the rules are far from identical. It can only be stated that a person's name may change either at his or her own initiative or following formal regulations covering the changes in the person's family status.
An initial approach to name ordering has been under discussion in the CLDR technical committee, but no decisions have been made yet. Experimentation in this area would be most welcome, and it could contribute significantly to the ISO TC 37 work mentioned earlier.
1.6.3.3. How to address a person properly in a given context
Addressing a person at the right level of formality in a given situation can be a delicate question. It is relatively easy to define the rules (CLDR-like) for addressing a person in a given language/country in situations categorized as formal, normal or informal, but the identification of these situations falls into the area of soft cultural elements. Too much or too little formality are both seen as offensive by many recipients. If the titles of the person are known, he or she expects them to be used properly.
The salutations used at the beginning and at the end of various types of letters also vary considerably from country to country and language to language. The underlying issues here are similar to, but somewhat less severe than, those in the case of addressing a person.
1.7. Soft Cultural Elements
1.7.1. Methodology
Factual elements are context dependent, but their concrete values are well defined for a given setting. Soft cultural elements, on the other hand, are almost by definition difficult to describe with precision and will always abstract from potentially diverging personal views. Furthermore, their description must be formalized to the extent that information about them becomes machine-readable and interchangeable.
Following the theory of structural anthropology [1], the chosen approach follows structuralist lines, though not dogmatically. Positive (+) or negative (-) judgements on certain key properties of selected elements are identified within the scope of a given cultural setting and, optionally, a certain context. This procedure is demonstrated by way of example for colour conventions and then applied to other categories of soft cultural elements.
This CWA does not make any claims as to the epistemological nature of these binary oppositions. Within the scope of this CWA they serve exclusively as tools for a formal description of soft cultural elements.
[1] Claude Lévi-Strauss: Anthropologie structurale (1958)
1.7.2. Description of Soft Cultural Elements
A soft cultural element codifies a value judgement that a certain culture or subculture predominantly shares in a given context. These can be preferences for certain ways of interaction (e.g. formal vs. informal or personal vs. institutional), for certain moral values, colours, gestures etc. Such judgements are virtually always context dependent, as a certain behaviour can be considered perfectly admissible in, say, a private context and deeply offensive if performed in public in the very same cultural setting.
“Predominantly” implies that such a value judgement may not be shared by all members of that culture to an equal degree. For example, even in cultural settings that generally prefer formal modes of interaction some individuals and even whole organizations will instead favour informal interactions.
Value judgements of the predominant culture in a given administrative unit may differ significantly from those of subcultures (e.g. immigrant communities) that live in the same geographic area. These communities can constitute cultural settings of their own, possibly (but not necessarily) using a different language. A cultural setting can thus be related to an administrative unit but be more specific (e.g. describing the culture of the Muslim community in the UK or that of the Christian community in Turkey). When the following sample tables list countries as representing cultural settings, this is thus an oversimplification that helps to stress the fact that the sample data in this CWA is for illustrative purposes only.
1.7.3. Collecting data
The most significant issues in bringing together this information are the scope of the content of the resource and the way in which it is structured. At a simplistic level, the scope of the resource is to include “significant” positive and negative associations of meaning for any cultural setting. The two major issues to be decided in delivering this scope are how an association can be judged to be “significant” and how cultural settings can be defined. The significance issue can be resolved by using the existing Unicode voting methods for agreeing that items should be included in the CLDR and / or by classifying existing information of this kind in the context of a larger network of eGovernment resources. The choice of how to define cultural settings is more complex and will require individual judgement. In some regions, quite a number of the value judgements are primarily related to a specific religion. Another set of associations may be specific to an ethnicity. It is possible, and in some eGovernment scenarios certainly advisable, to add religions and ethnicities to the linguistic, national and regional settings that are already identified in the CLDR. In particular, a government body may want to address specific religious or ethnic communities in its area with specifically targeted publications or services.
Adding this information to the CLDR itself might be inadvisable, however, for the following reasons:
- issues of ethnicity and religion are often associated with strong opinions that might make agreement on data related to these factors extremely difficult;
- when the ways in which this new CLDR resource is likely to be used are considered, it is difficult to imagine that many services will be localized to meet the needs of specific religions or ethnicities.
It is probable that services will primarily be localized for the same language, national and regional groupings that are already identified in the CLDR. The only practical way in which colour associations derived from ethnicity and religion can be addressed in system design is to take account of the religions and ethnicities that are prevalent in a particular country or region. Fortunately, the existing sources of information related to value judgements in regions and countries already reflect these cultural and religious influences.
The data that the sample tables contain is for illustrative purposes only and is neither complete nor necessarily correct.
1.7.4. Colour Conventions
In each culture, specific meanings are associated with colours, and these meanings frequently vary between cultures. A number of colours have common meanings across a wide range of cultures. For example, 96% of a diverse worldwide sample associated bright red with danger and 88% with war [1]. However, recording every possible association as part of the soft cultural registry would be an enormous task and one that may be subject to much dispute, as many of the possible meanings associated with colours may not be universally accepted. It would be more useful to record those colour associations that are strongly negative and, to a lesser extent, strongly positive. It would also be highly valuable to indicate the sometimes large cross-cultural differences in associated meaning.
A resource that captures significant associations of negative and positive meanings with colours and indicates how these associations differ across cultures would provide valuable support for the design of visually presented services. It could be used to avoid using colours in ways that accidentally give offence and, to a lesser extent, to help create positive emotions in service users. At one level, this resource could be used to automatically identify potential cultural mismatches between the design of a localized version of a service and the cultural expectations of the planned audience for that service. It could also be used in design support tools to provide real-time guidance to creators of eGovernment publications.
Having decided on the scope of this new colour related resource, the final issue is to decide on a way of representing this information in the CLDR and the context of the eGovernment ontology. The table below uses colour as the primary organising dimension. Such a structure would allow a designer or a software application to use the table to generate two lists of countries, one with negative associations and one with positive ones, for each colour that is used in the service. This list of countries could then be compared with those for which the service has been localized and any warnings of negative associations (or highlighting of positive associations) could be generated. This approach is paradigmatic for the general structuralist analysis and description of soft cultural elements:
Colour | Value judgement | Code | Country | Selected Interpretations
Red | - | RDN | Madagascar | Burial
Red | - | RDN | South Africa | Mourning
Red | - | RDN | Ghana | Mourning
Red | - | RDN | Egypt | Death
Red | - | RDN | China | Bloodshed - war
Red | + | RDP | China | Most popular colour
Red | + | RDP | India | Life, action, gaiety
Red | + | RDP | Indonesia | Luck
Pink | - | PNN | --- | ---
Pink | + | PNP | India | Happy, hopeful
Pink | + | PNP | Japan | Health, happiness
Pink | + | PNP | Singapore | Happy, feminine
Purple | - | PRN | Peru | Not favoured - avoid
Purple | + | PRP | United Kingdom | Prestige
Purple | + | PRP | United States | Creativity, exciting
The above is only a small subset of the possible set of colour associations. However, some examples have been included to illustrate that it is possible that the same colour can have both strong negative and strong positive associations for the same culture (e.g. see the colour red for China). It may also be the case that a colour may only have strong positive associations and no negative associations, which appears to be the case for the colour pink.
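The colour-first lookup described above can be sketched as follows. This is only a minimal illustration in Python, assuming the sample rows are held in a simple list; the names `ASSOCIATIONS` and `check_colours` are hypothetical and are not part of the CLDR or of this CWA. The data repeat the illustrative sample table and are neither complete nor necessarily correct.

```python
# Illustrative rows from the sample colour table:
# (colour, value judgement, code, country, selected interpretation).
ASSOCIATIONS = [
    ("Red", "-", "RDN", "Madagascar", "Burial"),
    ("Red", "-", "RDN", "South Africa", "Mourning"),
    ("Red", "-", "RDN", "Ghana", "Mourning"),
    ("Red", "-", "RDN", "Egypt", "Death"),
    ("Red", "-", "RDN", "China", "Bloodshed - war"),
    ("Red", "+", "RDP", "China", "Most popular colour"),
    ("Red", "+", "RDP", "India", "Life, action, gaiety"),
    ("Red", "+", "RDP", "Indonesia", "Luck"),
    ("Pink", "+", "PNP", "India", "Happy, hopeful"),
    ("Pink", "+", "PNP", "Japan", "Health, happiness"),
    ("Pink", "+", "PNP", "Singapore", "Happy, feminine"),
    ("Purple", "-", "PRN", "Peru", "Not favoured - avoid"),
    ("Purple", "+", "PRP", "United Kingdom", "Prestige"),
    ("Purple", "+", "PRP", "United States", "Creativity, exciting"),
]


def check_colours(colours_used, target_countries):
    """Split matching associations into warnings ('-') and highlights ('+').

    Only rows whose colour is used by the service and whose country is one
    of the localization targets are reported.
    """
    warnings, highlights = [], []
    for colour, judgement, code, country, meaning in ASSOCIATIONS:
        if colour in colours_used and country in target_countries:
            entry = (code, country, meaning)
            if judgement == "-":
                warnings.append(entry)
            else:
                highlights.append(entry)
    return warnings, highlights


# A service that uses red and is localized for China, Egypt and Germany.
warnings, highlights = check_colours({"Red"}, {"China", "Egypt", "Germany"})
for code, country, meaning in warnings:
    print(f"warning {code}: {country} ({meaning})")
for code, country, meaning in highlights:
    print(f"positive {code}: {country} ({meaning})")
```

Germany produces no output here because the sample table has no entry for it; an absent entry means "no data", not "no association".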
An alternative way of using this new resource would be for the service designer, or software tool, to specify the regions or countries for which the service has been localized and receive a list of colours to avoid (those with negative associations) and those with very positive associations. A table with the colour and culture columns reversed would meet this need. In the CLDR there is a precedent for representing the same resource in two alternate presentations - "Territory-Language Information" and "Language-Territory Information" and this application seems to be another good situation where this approach is justified.
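As a rough sketch of the reversed presentation, the same rows can be regrouped by country. The function name `by_country` and the reduced row layout are assumptions made for this illustration only; the rows are a subset of the illustrative sample data.

```python
from collections import defaultdict

# Subset of the sample data: (colour, value judgement, country).
ROWS = [
    ("Red", "-", "Egypt"),
    ("Red", "-", "China"),
    ("Red", "+", "China"),
    ("Red", "+", "India"),
    ("Pink", "+", "India"),
    ("Purple", "-", "Peru"),
    ("Purple", "+", "United Kingdom"),
]


def by_country(rows):
    """Regroup colour-first rows into a country-first index.

    Each country maps to the colours judged negatively ('-') and
    positively ('+') in that cultural setting.
    """
    index = defaultdict(lambda: {"-": [], "+": []})
    for colour, judgement, country in rows:
        index[country][judgement].append(colour)
    return dict(index)


index = by_country(ROWS)
print(index["China"])   # red appears under both judgements
print(index["India"])
```

Because both presentations are derived from one set of rows, they cannot drift apart, which may be preferable to maintaining two tables by hand.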
An issue that deserves significant debate is how important it is to reflect a few strong religious colour associations where failing to consider them carries a significant risk of offending potential users of a service, whatever geographic region it targets. The introduction of religious groupings would be a new departure for the CLDR and might, as suggested above, be an issue that would raise concern and controversy.
Another important factor that needs to be taken into account in populating a table of colour associations such as the one above is whether it is anticipated that the colour association data will mainly be used in the context of leisure related services or business related services. The reason for considering this factor relates to a study of the use of factory machines in mainland China [2] which indicated that in the working context of a factory environment the workers were perfectly familiar with the international standard (IEC73) meaning of "danger" for the colour red but, outside that context, they associated it with the traditional Chinese meaning of "Luck".
- [1] Chijiiwa, Professor Hideaki, Encyclopedia on Color Cognition of the World's Youth. Japan: Kawade Shoboh Shinsha. 1999.
- [2] Röse, K. (2005): Intercultural Human-Machine Systems: Empirical Study of User Requirements in Mainland China. In: Usability and Internationalization of Information Technology, Ed. Aykin, N: Lawrence Erlbaum Associates, London, 2005.
1.7.5. Propriety
Cultures differ widely in their concepts of propriety, especially with regard to decency and permissible behaviour. Differences can be very pronounced: in a culture such as the Tuareg it is unusual for a man to show his face to outsiders, and deviating behaviour is considered offensive. Partial or even full nudity and / or display of skin can be acceptable in some cultural settings but strongly condemned in others.
1.7.5.1. Nudity / Dressing of persons in images
In the eGovernment setting, a particularly sensitive subset of propriety is the reaction to images of persons on web pages, in documents or in relation to software. An image that may be entirely innocuous in one cultural setting (e.g. men and women displaying their torsos on the web page of a German local authority alongside the swimming pool's opening hours) may be deeply offensive in other cultural settings, possibly even within the same country. On the other hand, certain types of dress such as a head scarf may convey political and / or religious messages that carry positive connotations in some cultures and negative ones in others (potentially again within the boundaries of the same country).
Category of nudity / dressing | Value judgement | Context | Cultural Setting
Display of intimate parts | - | Organizational publications | Germany
Display of intimate parts | - | Organizational publications | UK
Display of intimate parts | - | eHealth publications | Yemen
Display of intimate parts | + | eHealth publications | Germany
Display of skin of the torso | - | Organizational publications | Saudi Arabia
Use of head scarf | - | Communication | UK
Use of head scarf | - | Organizational publications | UK
Use of head scarf | - | Organizational publications | Turkey
Use of head scarf | + | Communication | Yemen
Formal clothing | + | Organizational publications | Italy
1.7.5.2. Propriety in texts
Much of what has been said about propriety in images in eGovernment publications is also relevant to their textual contents. The textual motifs are more varied, though, and do not at present lend themselves to the same level of formalization.
1.7.6. Formality
Cultures differ markedly in the level of formality they use in various forms of communication in certain contexts. This is mirrored in many situations: the preferred choice of clothing in certain contexts, the way of greeting or taking leave of somebody, the preference for certain meals over others, etc. In many of these cases the preferences persist independently of the concrete way of expressing them.
1.7.6.1. Formality in addressing people
In this CWA we concentrate on one special case, the preference for formality used in written communication, in particular in letters and emails, when addressing a person. For expediency we postulate the existence of three different name forms for addressing a person:
- formal form of address (e.g. “The Right Honorable XY” or “Herr Prof. Dr. Z”)
- neutral form of address (e.g. “Mr. XY” or “Herr Z”)
- informal form of address (e.g. “John” or “Hans”)
The concrete format of the name, possibly involving the person's honorifics and styles, as well as the name ordering differ widely from culture to culture, as is discussed in more detail above.
We set these three forms of address against three contexts and list the preference for using one of them in that context:
- familiar: the addressee is a relative or close friend;
- normal: contact with the addressee is in a “normal” business scenario;
- formal: contact with the addressee is in a formal setting and / or the addressee is much more senior in age or rank.
NOTE: If the addressee is a child, the context is normally familiar by default.
The choice of name form in those contexts differs markedly across cultures. In the US, for example, it is quite common to send business letters to virtual strangers on a first name basis, i.e. using an informal version of their name. Doing the same in many other countries would be considered extremely offensive.
Version of Name | Context | Value judgement | Cultural Setting
Informal | Formal | - | Germany
Informal | Formal | - | Japan
Informal | Formal | + | US
Informal | Familiar | + | Germany
Neutral | Formal | - | Germany
Neutral | Formal | + | US
Neutral | Familiar | + | Germany
Neutral | Familiar | + | Italy
Neutral | Familiar | - | US
Formal | Familiar | - | US
NOTE: The tripartite taxonomy of names and contexts is based on European requirements. For other cultural environments it may be too coarse-grained.
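One possible way to query such a table is sketched below in Python. The dictionary layout and the names `JUDGEMENTS`, `judge` and `preferred_forms` are assumptions made for this illustration; the entries repeat the sample table, which is illustrative only.

```python
# (version of name, context, cultural setting) -> value judgement.
JUDGEMENTS = {
    ("informal", "formal", "Germany"): "-",
    ("informal", "formal", "Japan"): "-",
    ("informal", "formal", "US"): "+",
    ("informal", "familiar", "Germany"): "+",
    ("neutral", "formal", "Germany"): "-",
    ("neutral", "formal", "US"): "+",
    ("neutral", "familiar", "Germany"): "+",
    ("neutral", "familiar", "Italy"): "+",
    ("neutral", "familiar", "US"): "-",
    ("formal", "familiar", "US"): "-",
}

# Name forms ordered by increasing formality.
FORMS = ["informal", "neutral", "formal"]


def judge(form, context, setting):
    """Return '+', '-' or None when the table holds no entry."""
    return JUDGEMENTS.get((form, context, setting))


def preferred_forms(context, setting):
    """List the name forms explicitly judged '+' for a context and setting."""
    return [form for form in FORMS if judge(form, context, setting) == "+"]


print(preferred_forms("formal", "US"))
print(preferred_forms("formal", "Germany"))
```

The second call returns an empty list: the sample data rule out the informal and neutral forms for Germany in a formal context but do not state that the formal form is preferred. A consumer of the data therefore has to distinguish "no entry" from a negative judgement.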
1.7.6.2. T-V distinction
Related, but not identical, to the choice of the correct form of a name in a given context is the choice of second-person personal pronoun when interacting with somebody else. While some languages such as English have only one second-person personal pronoun (at least in their present state), namely “you”, many other languages have two or more pronouns from which a speaker chooses depending on his or her relationship to the addressee. This situation is called the T-V distinction, after the Latin pronouns tu and vos [1]. It is well known, e.g., in French (“tu” vs. “vous”), German (“du” vs. “Sie”) and Italian (“tu” vs. “lei”); in fact, most European languages use a variation of the T-V distinction, though in some countries such as Norway and Sweden its use has eroded over recent decades in all but the most formal settings. Several non-European languages such as Japanese have a much more elaborate system with many more gradations of formality.
In cultures that have the T-V distinction, the correct choice of pronoun is a matter of basic politeness in both oral and written communication. Not using the correct pronoun can be anything between inconsiderate and extremely rude. The choice is often, but by no means always, coupled to the chosen version of a name.
We apply the same tripartite taxonomy of contexts, but link it to a list of second-person personal pronouns, ordered by increasing formality. For our sample table, this CWA concentrates on systems with two second-person personal pronouns (the most frequent case in Europe), but the mechanism scales to more elaborate systems.
Second-person pronoun | Context | Value judgement | Cultural Setting | Pronoun (for information)
1 | Formal | - | Germany | du
1 | Formal | - | France | tu
1 | Formal | - | Italy | tu
1 | Formal | + | Norway | du
1 | Formal | + | Sweden | du
1 | Familiar | + | Germany | du
1 | Familiar | + | France | tu
1 | Familiar | + | Italy | tu
1 | Familiar | + | Norway | du
1 | Familiar | + | Sweden | du
2 | Formal | - | Norway | De
2 | Formal | - | Sweden | ni
2 | Formal | + | Germany | Sie
2 | Formal | + | France | vous
2 | Familiar | - | Norway | De
2 | Familiar | - | Sweden | ni
2 | Familiar | - | Germany | Sie
2 | Familiar | - | France | vous
[1] Brown, R. and A. Gilman (1960) “The Pronouns of Power and Solidarity” in American Anthropologist 4 (6): 24-39
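A minimal Python sketch of how the pronoun table could be consulted follows. The structures `PRONOUNS` and `POSITIVE` and the function `accepted_pronouns` are hypothetical names chosen for this example. The second Italian pronoun is taken from the running text rather than from the sample table, which has no entry for it.

```python
# Pronouns per cultural setting, ordered by increasing formality
# (position 1, position 2). "lei" for Italy comes from the running text.
PRONOUNS = {
    "Germany": ["du", "Sie"],
    "France": ["tu", "vous"],
    "Italy": ["tu", "lei"],
    "Norway": ["du", "De"],
    "Sweden": ["du", "ni"],
}

# (pronoun position, context) -> settings with a '+' judgement.
POSITIVE = {
    (1, "formal"): {"Norway", "Sweden"},
    (1, "familiar"): {"Germany", "France", "Italy", "Norway", "Sweden"},
    (2, "formal"): {"Germany", "France"},
    (2, "familiar"): set(),
}


def accepted_pronouns(setting, context):
    """Return the pronouns judged '+' for the setting in the given context."""
    return [
        pronoun
        for position, pronoun in enumerate(PRONOUNS[setting], start=1)
        if setting in POSITIVE.get((position, context), set())
    ]


print(accepted_pronouns("Germany", "formal"))
print(accepted_pronouns("Sweden", "formal"))
print(accepted_pronouns("Italy", "formal"))
```

Indexing pronouns by position rather than by spelling is what lets the same mechanism extend to systems with more than two levels: a longer list per setting is enough.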
1.7.6.3. Role of hierarchies
Along similar lines, the role of individuals in their institutional hierarchy differs markedly across cultures [1]. The focus can be on the role of the individual in the fabric of governmental organizations or on the organizations themselves. Again, such differences can be reflected in the correct form of address and permeate many aspects of communication.
Visibility of hierarchy | Value judgement | Cultural Setting
High | + | Japan
High | + | Korea
High | - | Sweden
[1] Hall, E.T. (1989) Beyond Culture. Anchor Books, New York 1989
1.7.7. Role of Privacy vs Transparency
Privacy and transparency are both valued positively in most cultures in government contexts, while often being antithetical to one another. Transparency is considered essential, among other things, to fight corruption and to ensure the fairness of governmental and political decision-making processes. Privacy, on the other hand, is valued to protect the individual sphere of citizens, politicians and government employees alike. The relative weighting of these values differs markedly between cultures.
In the context of this CWA we concentrate on the effect of this weighting on access to government resources. In some countries, notably Sweden, the great majority of government documents, including tax declarations and job applications for government posts, are disclosed to the public. In others, government documents are considered private by default, and the release of a citizen's tax declaration to the public would be a serious breach of the relevant privacy laws.
Arbitrarily, the following table uses “+” to mark a preference for privacy over transparency and “-” for the inverse situation.
Value judgement on privacy | Context | Cultural Setting
- | Documents originating from citizens | Sweden
- | Application process for government positions | Norway
- | Non-security related documents | Sweden
- | Non-security related documents | Germany
+ | Documents originating from citizens | Germany
+ | Security related documents | Sweden
+ | Security related documents | Germany
[1] Posner, R. A. (1981). “The economics of privacy”. The American Economic Review, 71(2), 405-409
1.8. Development of a formal structure
1.8.1. Introduction
The class of soft cultural elements is a subclass of the general resource class defined in the reference ontology (part 1a). It can be expressed in a number of different representations, including Topic Maps (XTM), OWL or a special extension to the CLDR's Locale Data Markup Language (LDML). Alongside the class of soft cultural elements proper we have a class of factual elements that complement current CLDR data and a class of document templates. All of these are also subclasses of the general resource class.
The following exposition of the classes relevant for expressing cultural elements extends the reference ontology. Only new classes, properties and associations are listed.
NOTE: As in the general eGovernment reference ontology, the prefix http://psi.egovpt.org/types/ is omitted from all identifiers to facilitate rendition. It must be prefixed to all unqualified IDs.
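The prefixing rule in the note can be illustrated with a short Python sketch. The function name `qualify` is hypothetical, and treating any identifier that contains "://" as already qualified is an assumption of this example, not a rule stated by the reference ontology.

```python
# Prefix given in the note above.
PSI_PREFIX = "http://psi.egovpt.org/types/"


def qualify(identifier):
    """Expand an unqualified ID to its full form.

    Assumption of this sketch: an identifier that already contains a
    scheme separator ("://") is taken to be fully qualified.
    """
    if "://" in identifier:
        return identifier
    return PSI_PREFIX + identifier


print(qualify("soft-cultural-element/colour-code"))
```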
1.8.2. Soft Cultural Element
Soft Cultural Element is the abstract base class for the soft cultural elements proper and a direct child of the general resource class.
Classname | ID | Subclass of
Soft cultural element | soft-cultural-element | resource
Property (in addition to the inherited ones):
Property name | Short description | ID | Alternate ID | Typical value domains
CLDR ID | Pointer to the Soft Cultural Element in the CLDR (if existing) | cldr-id | | xsd:anyURI
Relationship (in addition to the inherited ones):
Name | Short description | Type-ID | Player type 1 | Role type 1 | Player type 2 | Role type 2
codifies | The cultural element codifies a certain value judgement | codifies | soft-cultural-element | soft-cultural-element | soft-cultural-element/value-judgement | soft-cultural-element/value-judgement
1.8.3. Privacy
Preferences for certain privacy options such as strict privacy requirements vs. transparency
Classname | ID | Subclass of
Privacy | soft-cultural-element/privacy | soft-cultural-element
1.8.4. Formality
Preferences for a formal behaviour in certain circumstances
Classname | ID | Subclass of
Formality | soft-cultural-element/formality | soft-cultural-element
Property (in addition to the inherited ones):
Property name | Short description | ID | Alternate ID | Typical value domains
type | Type of formal behaviour that is seen as positive or negative | soft-cultural-element/type | | Description of the type of soft cultural elements according to a given taxonomy
1.8.5. Colour code
Preferences for certain colours or colour schemes in given circumstances
Classname | ID | Subclass of
Colour code | soft-cultural-element/colour-code | soft-cultural-element
Property (in addition to the inherited ones):
Property name | Short description | ID | Alternate ID | Typical value domains
Colour code | Code of the colour according to a given taxonomy | soft-cultural-element/colour-code | |
1.8.6. Propriety
Acceptability of certain behaviour in given circumstances, notably relating to standards of decency
Classname | ID | Subclass of
Propriety | soft-cultural-element/propriety | soft-cultural-element
Property (in addition to the inherited ones):
Property name | Short description | ID | Alternate ID | Typical value domains
Category | Categorization of the type of propriety (e.g. decency) according to a given taxonomy | soft-cultural-element/category | |
1.8.7. Factual Element
A class of elements that describe assertions on culturally determined facts, e.g. use of typographic conventions or rules for greeting people. In type, these elements are related to “traditional” CLDR data. In addition to the explicitly enumerated subclasses other existing CLDR locale data can be seen as subclasses of this type.
Classname | ID | Subclass of
Factual element | factual-element | resource
Properties (in addition to the inherited ones):
Property name | Short description | ID | Alternate ID | Typical value domains
Values | Values for that element according to a subclass-specific taxonomy | factual-element/value | |
Relationship (in addition to the inherited ones):
Name | Short description | Type-ID | Player type 1 | Role type 1 | Player type 2 | Role type 2
scoped-by | The factual element is scoped by / valid in a certain cultural setting | scoped-by | factual-element | factual-element | cultural-setting | cultural-setting
1.8.8. Typographic Convention
Typographic conventions that are being used in a given cultural setting
Classname | ID | Subclass of
Typographic convention | factual-element/typographic-convention | factual-element
1.8.9. Name ordering
Rules for the ordering of the components of a personal name (e.g. sequences of given name / family name or vice versa)
Classname | ID | Subclass of
Name ordering | factual-element/name-ordering | factual-element
1.8.10. Personal Name Structure
Structures for the handling of personal names
Classname | ID | Subclass of
Personal name structure | factual-element/personal-name-structure | factual-element
Property (in addition to the inherited ones):
Property name | Short description | ID | Alternate ID | Typical value domains
formality | Description of the formality level that this personal name structure is used in according to a taxonomy | factual-element/formality | |
1.8.11. Document Template
Instances of the class document template refer to machine-readable and machine-processable document templates that correspond to a set of typographic preferences (e.g. preferred fonts, margins etc.).
Classname | ID | Subclass of
Document Template | document-template | resource
Properties (in addition to the inherited ones):
Property name | Short description | ID | Alternate ID | Typical value domains
Document format | Document format according to a subclass-specific taxonomy | factual-element/value | | css, “ott” (the OpenDocument text template), OOXML text templates etc.
1.8.12. Value Judgement
A value judgement on given behaviours or facts in the scope of a specific cultural setting. The value judgement can be context-dependent.
Classname | ID | Subclass of
Value judgement | value-judgement |
Properties:
Property name | Short description | ID | Alternate ID | Typical value domains
Value judgement | Context-dependent value judgement of a certain fact judged by a certain standard of behaviour in a given cultural setting | value-judgement | | Often one of + or -
Relationship:
Name | Short description | Type-ID | Player type 1 | Role type 1 | Player type 2 | Role type 2
scoped-by | a value judgement scoped by a context | scoped-by | value-judgement | value-judgement | context | context
scoped-by | a value judgement in the scope of a given cultural setting | scoped-by | value-judgement | value-judgement | cultural-setting | cultural-setting
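To make the relationships concrete, the classes involved can be modelled roughly as below. This Python sketch is an informal reading of the class tables, not a normative binding: all class and attribute names are assumptions of this example, the context is treated as optional because a value judgement is only said to be possibly context-dependent, and the example instance is built from the illustrative sample data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CulturalSetting:
    name: str


@dataclass(frozen=True)
class Context:
    category: str


@dataclass(frozen=True)
class ValueJudgement:
    value: str                         # often one of "+" or "-"
    cultural_setting: CulturalSetting  # "scoped-by" a cultural setting
    context: Optional[Context] = None  # "scoped-by" a context, if any


@dataclass(frozen=True)
class SoftCulturalElement:
    class_id: str             # e.g. "soft-cultural-element/propriety"
    subject: str              # what is being judged (hypothetical field)
    codifies: ValueJudgement  # the "codifies" relationship


# Sample row: use of a head scarf, "+", Communication, Yemen.
head_scarf = SoftCulturalElement(
    class_id="soft-cultural-element/propriety",
    subject="Use of head scarf",
    codifies=ValueJudgement(
        value="+",
        cultural_setting=CulturalSetting("Yemen"),
        context=Context("Communication"),
    ),
)
print(head_scarf.codifies.value, head_scarf.codifies.cultural_setting.name)
```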
1.8.13. Cultural Setting
An instance of the cultural setting class represents a given cultural environment (e.g. the British culture or the culture of the Muslim community in France). Other cultural elements are related to this.
Classname | ID | Subclass of
Cultural setting | cultural-setting |
Properties:
Property name | Short description | ID | Alternate ID | Typical value domains
Name | Name of the cultural setting (possibly scoped by language) | name | |
Description | Free-text description of the cultural setting (possibly scoped by language) | description | |
Variant | Specific variant of a cultural setting (if closely related to another cultural setting) | description | |
Relationship:
Name | Short description | Type-ID | Player type 1 | Role type 1 | Player type 2 | Role type 2
uses | a cultural setting uses a language | uses | cultural-setting | cultural-setting | language | language
is-related-to | a cultural setting is related to one or more administrative unit(s) (not necessarily states) | is-related-to | cultural-setting | cultural-setting | administrative-unit | administrative-unit
is-related-to | a cultural setting is related to another cultural setting | is-related-to | cultural-setting | cultural-setting | cultural-setting | cultural-setting
1.8.14. Language
Classname | ID | Subclass of
Language | language |
Properties:
Property name | Short description | ID | Alternate ID | Typical value domains
Name | Name of the language (possibly scoped by language) | name | |
BCP47 tag | Tag of the language according to BCP47 | bcp47-tag | |
1.8.15. Context
Classname | ID | Subclass of
Context | context |
Properties:
Property name | Short description | ID | Alternate ID | Typical value domains
Category | Classification of the context according to a given taxonomy | category | |
Description | Free-text description of the context | description | |
1.8.16. Simplified Overview of the Ontology
1.9. References
SG: Interinstitutional style guide of the Publications office of the European Union, EU style guides
Unicode TR35: Unicode Technical Standard #35 Unicode Locale Data Markup Language (LDML)


