Factual Elements
Plan:
- Underline the need for context-specific factual information ("variations" in CLDR-terms, scope in TM terms)
- Categorize typographic conventions into:
  - Elements that exist in CLDR, but need eGov-specific variations (example: decimal separator for UK)
  - Elements that do not currently exist in the CLDR / in LDML, but should (with rationale)
  - Elements that are valuable resources for human users, but not suitable for the CLDR
- Integrate the issue of name ordering into those categories (three types of names: formal, normal and informal)
Rationale
The Unicode CLDR already contains a large number of elements (e.g. selected typographical conventions) that can be unambiguously described and used to enable localized applications to use versions of those elements that are appropriate for the chosen locale. All of these elements are intended to represent what a person from that locale would naturally expect to see. Care is taken to avoid a situation where the CLDR imposes a usage policy that differs from common usage.
However, there are instances where the content of localized services is expected to conform to a usage policy that differs from the one commonly understood in a locale. Such a policy is often referred to as a "House Style". Two common examples of such house styles are:
- multi-national corporations that wish to adopt a common house style that can be applied in every country in which they operate;
- official documents in public administrations that span a number of different locales.
It is this latter situation that is particularly relevant in the context of eGovernment. Many eGovernment administrative regions span several countries, and others cover a single country that has multiple languages and cultures. In such cases, a common house style can reduce or eliminate the confusion caused by applying the common usage conventions of the individual locales to documents and services that apply across this range of locales.
One major example of such a house style is the "Interinstitutional style guide" of the Publications Office of the European Union. It is written to help authors and translators of official documents use conventions that conform to the official house style policy rather than the common usage conventions with which they may be more familiar. In this Workshop it is proposed that the CLDR elements of house styles in general, and those that relate to the European eGovernment context in particular, be integrated into the CLDR hierarchy, either as variants of the relevant locale data for that culture or as specific locales applying to the region "European Union". Adopting this approach would have a number of benefits:
- these resources would be, for the first time, available in a form that could be used in a way identical to the well-supported ways in which other CLDR resources can be used (e.g., using software and tools that are designed around the CLDR);
- this would open up the market for the suppliers of eGovernment services by making readily usable resources available to companies from within and outside the relevant eGovernment administration area;
- these resources could be principally owned and maintained as a niche within the CLDR structure by the responsible administration that is setting the house policies, yet be linked to from the standard CLDR pages to maximise their availability. Although this approach does not lend itself to the normal use of the CLDR Survey Tool to supply and vet the data, it should have a minimal impact on those responsible for maintaining the CLDR.
These resources would be assigned either variant or region codes to ensure that they are clearly distinguished from the resources intended for general usage. This Workshop is the ideal vehicle to propose the introduction of such a new type of CLDR usage and to assist in the adoption of such a proposal by the Unicode Consortium.
House styles also formalize many typographic conventions that are not at present covered in the CLDR. Some of these conventions lend themselves to being added to the CLDR, whereas others, such as the proper use of highlighting or tables, are mainly or exclusively targeted at human users and cannot readily be covered in the CLDR context. This information is nevertheless a valuable resource in eGov-Share and is accordingly covered by the ontology of cultural elements.
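The idea of distinguishing house-style resources by a variant code, while still reusing the general-usage data, can be sketched as a lookup with fallback. This is a minimal illustration only: the variant subtag "eugov" and the contents of `LOCALE_DATA` are invented for the example, and real CLDR inheritance (parent locales, root) is more elaborate than the simple right-to-left truncation shown here.

```python
# Hypothetical locale data. "eugov" is an invented variant subtag used only
# for illustration; it is not a registered or proposed code.
LOCALE_DATA = {
    "en": {"decimal": ".", "group": ","},
    "en-GB": {},  # inherits everything from "en" in this sketch
    "en-GB-eugov": {"group": "\u00a0"},  # house style overrides one element
}


def lookup(locale_id: str, key: str) -> str:
    """Resolve `key` by truncating the locale identifier from the right."""
    parts = locale_id.split("-")
    while parts:
        data = LOCALE_DATA.get("-".join(parts), {})
        if key in data:
            return data[key]
        parts.pop()  # fall back to the next, more general identifier
    raise KeyError(key)


house_group = lookup("en-GB-eugov", "group")      # overridden by the variant
house_decimal = lookup("en-GB-eugov", "decimal")  # inherited from "en"
general_group = lookup("en-GB", "group")          # general usage is unchanged
```

Under this arrangement the house-style entry stores only the elements that differ, and applications that request the plain locale are unaffected.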
Typographic Conventions
Typographic conventions are a large class of soft cultural elements. They are closely related to current locale data in that many of them can usually be unambiguously described with strategies that are essentially the same as those used for traditional cultural elements. The CLDR contains the specification of various forms of typographic conventions that represent the normal convention in the appropriate locale. However, when multilingual information resources are published in a number of different locales, differences between the conventions used in each locale can cause confusion if someone attempts to read a resource that was designed for a different locale. For this reason, organizations may decide to prescribe conventions for the production of multilingual information resources that reduce such ambiguities. The Publications Office of the European Union maintains an "Interinstitutional style guide" that lists precisely this type of information for most major European languages. However, as this information is not in a machine-processable format, it cannot be used in the same way as the CLDR data.
Number formatting
One area where typographic conventions have great significance is the formatting of numbers. If the significance of the characters that separate the different elements of a number is misunderstood, the number can be seriously misinterpreted. If the separators are interpreted by a person familiar with the standard usage in the locale, the numbers should be interpreted correctly. All of this usage is already contained in the "numbers/symbols/group" and "numbers/symbols/decimal" elements for each locale of the CLDR.
In Europe, several countries use "." as the "group separator" (e.g. for separating the digits greater than a thousand from those less than a thousand) and several others use a space (preferably a no-break-space). The United Kingdom is exceptional in that it uses "," as the "group separator". This usage is potentially very confusing to those from other European countries, as this is the character that most other countries use as the "decimal separator". The United Kingdom is also unique in that it uses "." as the decimal separator, which is the "group separator" in many other countries. (There are also variations in other countries, e.g., ":" has defined usage as a decimal separator in Sweden.) In a multi-locale region such as Europe, these major differences in the normal usage of the separators can confuse a multilingual citizen of the European Union reading a text that is written according to the existing United Kingdom "English" conventions, and may lead to serious misinterpretations. For example, the number 123,456 can be interpreted either as a number slightly less than 124 or as a number greater than 123 thousand, depending on whether the "," is interpreted as a "decimal separator" or as a "group separator". As English is the most commonly spoken second language in Europe, there is a high risk that citizens of Europe will encounter significant difficulties if they interpret numbers according to their own locale conventions when reading English language documents that have been written using the standard English locale convention that is currently in the CLDR.
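The ambiguity of "123,456" can be made concrete with a short sketch. The helper `parse_number` is a hypothetical function written for this illustration (it is not part of the CLDR or of any library); it simply applies a given pair of separators to the same string.

```python
from decimal import Decimal


def parse_number(text: str, group: str, decimal: str) -> Decimal:
    """Parse `text` using the given group and decimal separators."""
    # Drop the group separator, then normalise the decimal separator to ".".
    normalised = text.replace(group, "").replace(decimal, ".")
    return Decimal(normalised)


# United Kingdom reading: "," groups thousands, "." marks decimals.
uk_value = parse_number("123,456", group=",", decimal=".")

# Reading in a locale where "." groups thousands and "," marks decimals.
other_value = parse_number("123,456", group=".", decimal=",")

print(uk_value)     # 123456
print(other_value)  # 123.456
```

The same six characters therefore yield two values that differ by a factor of one thousand, which is the risk the house-style conventions aim to remove.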
To avoid the mistakes that can arise from misinterpreting the "decimal separator" and "group separator" within its multi-locale region, the Publications Office of the European Union has defined conventions for the "group separator" and "decimal separator" that should be used for official documents produced in the different languages of the European Union. These conventions attempt to minimise the risk of confusion between the different national conventions in separator usage whilst at the same time minimising the deviations from the normal locale-specific conventions.
The Interinstitutional Style Guide defines the following variations to the standard conventions of the CLDR:
- The "numbers/symbols/group" separator is standardized as a space for all locales in Europe. This is a variation for those languages that have "." as the "group separator" and for the United Kingdom.
- The "numbers/symbols/decimal" separator defined in the CLDR is to be used for all locales except for the United Kingdom. For the United Kingdom there are two variations, depending on the context in which the numbers are used:
  - Variation 1: The character "," should be used for the "numbers/symbols/decimal" separator for "all English-language editions of the Official Journal" and its use is described as "acceptable for multilingual publications, statistical works and works where the tables are composed once for all language versions".
  - Variation 2: The standard CLDR convention "." should be used for all other situations.
To ensure that the correct conventions are used, the context of usage must be identified (e.g. is the number being used in an English-language edition of the Official Journal, or in a table that will be composed only once and used in versions of the document for several European locales?). Once the context is known, the special European conventions or the CLDR conventions are applied according to the above rules, with the regular CLDR conventions being used wherever the rules do not specify otherwise.
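The selection rules above can be sketched as a small function that takes the locale and the usage context and returns the separators to apply. This is an illustration under stated assumptions: the `UsageContext` names, the locale keys and the two entries in `CLDR_SYMBOLS` are chosen for the example and were typed in by hand rather than read from CLDR data files; a no-break space is used for the standardized space.

```python
from enum import Enum

NBSP = "\u00a0"  # no-break space, used here as the house-style group separator


class UsageContext(Enum):
    """Hypothetical context labels for this sketch."""
    GENERAL = "general"
    OFFICIAL_JOURNAL = "official-journal"  # English-language editions
    SHARED_TABLES = "shared-tables"        # tables composed once for all versions


# Illustrative excerpt of general-usage separators (assumed values).
CLDR_SYMBOLS = {
    "en-GB": {"group": ",", "decimal": "."},
    "de-DE": {"group": ".", "decimal": ","},
}


def house_style_symbols(locale: str, context: UsageContext) -> dict:
    """Apply the house-style rules on top of the general-usage decimal separator."""
    decimal = CLDR_SYMBOLS[locale]["decimal"]
    if locale == "en-GB" and context is not UsageContext.GENERAL:
        decimal = ","  # Variation 1
    return {"group": NBSP, "decimal": decimal}


def format_number(value: float, locale: str, context: UsageContext,
                  places: int = 2) -> str:
    symbols = house_style_symbols(locale, context)
    plain = f"{value:,.{places}f}"  # e.g. "1,234,567.89"
    integer, _, fraction = plain.partition(".")
    grouped = integer.replace(",", symbols["group"])
    return grouped + (symbols["decimal"] + fraction if fraction else "")


print(format_number(1234567.891, "en-GB", UsageContext.OFFICIAL_JOURNAL))
print(format_number(1234567.891, "en-GB", UsageContext.GENERAL))
```

The first call uses "," as the decimal separator and the second uses "."; both group the digits with a no-break space. The context has to be supplied by the caller, since it cannot be derived from the locale alone.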
Monetary formatting
Amounts follow the number formatting conventions described above.
In official use, the currency symbol €, when permitted, is floating in English, i.e., it immediately precedes the amount, as do the UK pound "£", US dollar "$" and Japanese yen "¥" signs. However, in general use in most countries other than the UK, the currency symbols in current use, even "€", follow the amount. In these cases, the currency symbol is separated from the amount by a space (preferably a no-break-space).
The currency code is normally used instead of the symbol in transactions and legal texts. In legal text, the code may precede (in English, Latvian and Maltese) or follow (in all other languages) the amount, with a separating space. In general use, however, the currency code often precedes the amount. The currency codes defined in ISO 4217 also cover value elements other than regular currencies.
If spelled out, the singular form "euro" is to be used with any amount in formal EU documents in most languages (certain countries have applied for an exception), but the form "euros" is recommended for more informal text.
Thus, variation will also occur in all of these CLDR definitions.
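The placement rules for currency symbols and codes described above can be sketched as follows. The function names and the two-letter language keys are assumptions made for this illustration, and treating every language other than English as symbol-last is a simplification of "most other countries".

```python
NBSP = "\u00a0"  # no-break space between amount and symbol or code

# Languages in which the currency code precedes the amount in legal text.
CODE_FIRST_LANGUAGES = {"en", "lv", "mt"}


def format_with_code(amount_text: str, code: str, language: str) -> str:
    """Place an ISO 4217 code before or after the amount, as in legal text."""
    if language in CODE_FIRST_LANGUAGES:
        return f"{code}{NBSP}{amount_text}"
    return f"{amount_text}{NBSP}{code}"


def format_with_symbol(amount_text: str, symbol: str, language: str) -> str:
    """Place the currency symbol; simplified to English versus all others."""
    if language == "en":
        return f"{symbol}{amount_text}"      # symbol immediately precedes
    return f"{amount_text}{NBSP}{symbol}"    # symbol follows after a space


english_legal = format_with_code("30", "EUR", "en")
finnish_legal = format_with_code("30", "EUR", "fi")
english_symbol = format_with_symbol("30", "€", "en")
german_symbol = format_with_symbol("30", "€", "de")
```

The amount is passed in as already formatted text so that the separator rules for numbers remain a separate concern.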
Other Style Guide related conventions covered in the CLDR or that could easily fit in that structure
In addition to the number formatting described above, definitions exist for, e.g., the placement of the minus sign for negative numbers and the formatting of scientific (exponential) numbers and of percent and per mille expressions.
Two levels of quotation mark pairs, outer and inner, are currently defined, and they are expected to be used alternately. List element separators, among others, have also been defined.
The CLDR contains only the most commonly used paper size for each locale. It contains only minimal information on the measurement system in use, but the expansion of this has already been discussed.
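The alternating use of the two quotation mark pairs can be sketched as below. The `DELIMITERS` table is a hand-typed illustrative excerpt for two languages (assumed values, not read from CLDR data files), and `quote` is a hypothetical helper.

```python
# (outer pair, inner pair) per language; illustrative values only.
DELIMITERS = {
    "en": (("\u201c", "\u201d"), ("\u2018", "\u2019")),
    "de": (("\u201e", "\u201c"), ("\u201a", "\u2018")),
}


def quote(text: str, language: str, depth: int = 0) -> str:
    """Wrap `text` in the marks for the given nesting depth (0 = outermost)."""
    start, end = DELIMITERS[language][depth % 2]  # alternate outer/inner
    return f"{start}{text}{end}"


inner = quote("yes", "en", depth=1)
outer = quote(f"She said {inner}", "en", depth=0)
print(outer)  # “She said ‘yes’”
```

Because the depth is taken modulo two, a third level of nesting returns to the outer pair.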
It should be noted that the CLDR contains the commonly used names of countries and regions instead of the official names (which could, in principle, be included using the chosen variant mechanism).
The spacing before and after punctuation marks is not currently defined in the CLDR, although it depends on language and culture.
Other Style Guide related conventions that would not readily fit in the CLDR structure
Many of the typing conventions, including the strategies of emphasis, could not easily be covered by locale definitions.
The way to use references, quotations, footnotes and tables is currently too dependent on the text processing system to be covered by locale data. These, however, need to be defined in text form for common understanding.
The current proof correction marks are for manual use only, although they, too, need to be unified, especially when centralized publication facilities are being used.
The list of popular fonts is subject to constant evolution.
Other Style Guide related conventions that would benefit from a repository other than CLDR
The Style Guide defines a preferred structure for publications with specific guidance for several topics, such as:
- Design of covers and the information items on them
- Design of title pages and the information items on them
- Preliminary pages and end-matter (SG, 5.3)
- Preferred design of tables of contents
- Preferred design of bibliographies
- Preferred design of the index
- Layout and presentation of main text (SG, 5.4)
- Preferred design of lists (numbered, unnumbered etc.)
This information would best be made available to all interested users in the form of templates. It is understood that at least initially they would have to be text processing system dependent, but progress is to be expected in this field. These localized templates could possibly also be included in the install packs, since they would have a significant market.
Personal names
ISO TC 37 "Terminology and other language and content resources" is in the process of starting a new work item on proper names that could conceivably touch all of the following areas related to personal names.
The use of personal names can be addressed in parallel from three points of view:
How to spell a person's name right
The right to the proper spelling of personal names has been addressed by a number of worldwide, multi-lateral and even bi-lateral treaties between European countries. The implementation of these treaties is, however, in its infancy, even for a treaty that was signed as early as 1973 (CIEC/ICCS convention No. 14). The legal implications of the resulting lack of interoperability between various registers, particularly the public registers, are significant. Since the Universal Character Set (ISO/IEC 10646 and The Unicode Standard) has removed the technological obstacles that were prevalent in the past, there is no justification for delaying the definition of the multilingual European Latin character repertoire for eGovernment and of the transliteration schemes for at least the scripts that are used for the official languages of the European Union. Attempts have been made to define European repertoires, notably in CWA 13873:2000, Multilingual European Subsets in ISO/IEC 10646-1, but that would clearly need to be updated and refined.
How to express the name properly in a given context
Names in Europe have different structures and often consist of multiple elements that are either required or can be left out in a given context. The elements include, in addition to or in lieu of the "ordinary" given and family names, patronymics and/or matronymics and various common name affixes. The ordering of these is often dependent on the usage environment: formal, normal or informal. Furthermore, capitalization of the first letter of a name component is culture dependent, and in certain cultures the name or the primary identifying element is routinely written in capital letters in both formal and normal usage.
Whether and how the name will change after marriage is defined in the national laws, and the rules are far from identical. It can only be stated that a person's name may change either at his or her own initiative or following formal regulations covering the changes in the person's family status.
An initial approach to name ordering has been under discussion in the CLDR technical committee, but no decisions have been made yet. Experimentation in this area would be most welcome, and it could contribute significantly to the ISO TC 37 work mentioned earlier.
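As one possible starting point for such experimentation, name ordering by usage level can be sketched with a pattern table. Everything here is hypothetical: the `PersonName` fields, the language keys and the patterns in `PATTERNS` are placeholders invented for the example and do not represent CLDR data or any agreed convention.

```python
from dataclasses import dataclass


@dataclass
class PersonName:
    given: str
    family: str
    title: str = ""


# Hypothetical patterns keyed by (language, usage level).
PATTERNS = {
    ("en", "formal"): "{title} {given} {family}",
    ("en", "normal"): "{given} {family}",
    ("en", "informal"): "{given}",
    ("hu", "formal"): "{title} {family} {given}",
    ("hu", "normal"): "{family} {given}",
    ("hu", "informal"): "{given}",
}


def format_name(name: PersonName, language: str, level: str) -> str:
    """Order the name elements according to language and usage level."""
    pattern = PATTERNS[(language, level)]
    text = pattern.format(title=name.title, given=name.given, family=name.family)
    return " ".join(text.split())  # collapse gaps left by empty elements


person = PersonName(given="Anna", family="Kovács", title="Dr.")
print(format_name(person, "hu", "normal"))  # Kovács Anna
print(format_name(person, "en", "formal"))  # Dr. Anna Kovács
```

A fuller model would also need optional elements such as patronymics and affixes, and culture-dependent capitalization, which this sketch leaves out.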
How to address a person properly in a given context
Addressing a person at the right level of formality in a given situation can be a delicate question. It is relatively easy to define the rules (CLDR-like) for addressing a person in a given language/country in situations categorized as formal, normal or informal, but the identification of these situations falls into the area of soft cultural elements. Both too much and too little formality are seen as offensive by many recipients. If the titles of the person are known, he or she expects them to be used properly.
The salutations used at the beginning and at the end of various types of letters also vary considerably from country to country and language to language. The underlying issues here are similar to, but somewhat less severe than, those in the case of addressing a person.
