The abstract question of different markets and profit margins is certainly interesting, but the real fun starts when specific elements requiring translation or modification to fit into a foreign culture are identified and isolated. Ranging from the obvious -- such as the language of the error messages -- to the subtle -- such as how many lines fit on a typical printout -- the specifics of international software ultimately represent slices of the culture for which they are targeted.
The general idea here is one that sociologists call cultural context. Cultural context refers to the realization that all elements of a particular culture or society must be viewed from the point of view of that culture, rather than that of the viewer. Examples of this abound, with some amusing ones being in the area of cuisine. How many times have you heard horrifying stories of the parts of animals that people eat in remote locales? Surely people in other parts of the world find it quite shocking to find that U.S. residents commonly eat products fried in lard, which is just boiled animal fats, for example?
Clearly, then, to fit into a culture, software must also be designed to fit into the cultural context of the user. A difficult task, this implies a comprehensive understanding of far more than simply the language and notational conventions. Indeed, because of this complication, many of the most successful international products end up being localized in the target country, rather than at a central facility in the United States or elsewhere.
This approach to international markets can be seen with the Pacific Rim consumer electronics products as well; while the units themselves are primarily designed in the home countries (Japan, Korea, Taiwan, etc.), they are custom fit for the cultural context of the target market. Japanese telephones, for example, have U.S. phone digit/letter equivalents shown, as well as packaging and documentation in English. Of course, therein lies a pitfall too: many of the documents for consumer products are written in English by native speakers of Japanese rather than of English, leading to obfuscated, awkward prose. It is easy to extrapolate and imagine someone in a non-Western nation looking at an error message from a program such as "line 40: cannot grok 'k++' here," without any clue whatsoever about the meaning of the message.
What then needs to be encapsulated in this concept of cultural context? Just about everything, from the basic language to the slang and colloquialisms, to various notational conventions, and more. Let us take a closer look.
When first learning how to write, I was puzzled over why certain words got the first letter capitalized while other words did not. If I talked about a specific person, for example, his Mother, the upper case is used, but the more vague someone's mother stayed as all lower case. Of course, what we are talking about are proper nouns, which now rarely present a problem in my writing.
There's a subtle skill that was taught along with proper nouns, however: how to capitalize a letter. Seems pretty simple, doesn't it? You just hold down the shift key on the keyboard and type, right? Well, not really. As with many other facets of international software, shifting case, or transliteration as it is more formally known, is fraught with pitfalls.
Many languages simply do not have the concept of upper and lower case letters at all, such as Hebrew and Arabic languages, as well as many of the Asian languages. Other languages have subtle constraints involved with transliteration, constraints that aren't likely to be known outside of that country.
An example of this is French versus Canadian French. If you have the word école and want to transliterate it to all upper case, is it the same in both languages? Logic says "yes." Canadian French is almost completely identical to the French spoken and written in France. But the languages are not identical. In France, the notational convention would be to retain the accent when the first letter is transliterated, leading to ÉCOLE, but in Canada, the French-speaking population simply drops diacritical marks, such as the acute accent, when transliterating, leading to the French Canadian ECOLE. Ironically, most French-speaking people drop the accent on capitalized words because accented capitals are not available on typewriters and computers, even those manufactured in France.
Another interesting example is the name of one of the largest cities in Switzerland: Zürich. In German, the word is correctly capitalized ZÜRICH, retaining the umlaut, but in Swiss German, they capitalize it as ZUERICH instead.
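The three upshifting conventions described above (retain the accent, drop it, or expand it into a two-letter sequence) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name upshift, the convention labels, and the small EXPANSIONS table are hypothetical, and a real product would need a table for each locale it supports.

```python
import unicodedata

# Hypothetical expansion table for the "expand" convention (umlaut -> vowel + e).
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def upshift(word, convention="keep"):
    """Convert `word` to upper case under one of three illustrative conventions.

    keep   -- retain diacritical marks
    drop   -- strip diacritical marks before upshifting
    expand -- replace umlauted vowels with a two-letter sequence
    """
    if convention == "expand":
        word = "".join(EXPANSIONS.get(ch, ch) for ch in word)
    elif convention == "drop":
        # Decompose each accented letter into base letter + combining mark,
        # then discard the combining marks.
        decomposed = unicodedata.normalize("NFD", word)
        word = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    elif convention != "keep":
        raise ValueError(f"unknown convention: {convention!r}")
    return word.upper()


print(upshift("école", "keep"))
print(upshift("école", "drop"))
print(upshift("zürich", "expand"))
```

The point of the sketch is that the convention has to be a parameter: the same input word yields different, equally "correct" capitalized forms depending on who is reading it.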
Although only used in a small subset of applications, justification of text is a feature that your product might well have. If so, how do you hyphenate words in a foreign language?
Originally hyphenation arose as a way to align the right columns of textual information, in an identical evolution to proportional spacing. The idea was that since words naturally broke at syllabic boundaries, they could then be split at those boundaries and shown on two lines. For example, a typesetter coming across a document that contained "antidisestablishmentarianism" might well break into a cold sweat thinking about how to crush that into a two-inch-wide column in a newspaper. Being able to split it into syllables, however, gets us to "an-ti-dis-es-tab-lish-men-tar-i-an-ism", which now has ten different possible spots to break the word gracefully. While many hyphenation programs use simpler rules to figure out where words can be broken (rules such as "double consonants can always be split; 'comment' to 'com-ment'"), most hyphenation-capable software inherently uses English-based hyphenation rules.
These rules do not work globally. An interesting example is in German, where the word for cuckoo, "Kuckuck," is traditionally hyphenated not as "Kuc-kuck," as one might expect, but rather as "Kuk-kuck." The actual spelling of the word changes because of the presence of the hyphen. Imagine how complex that algorithm needs to be.
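To make the spelling change concrete, here is a minimal sketch of a line breaker that rewrites "ck" as "k-k" when a German word is split between those two letters. The function name break_word and its language parameter are hypothetical, the example uses the standard spelling Kuckuck, and the sketch assumes the break position has already been chosen by some other routine.

```python
def break_word(word, index, language="en"):
    """Split `word` before position `index` and return (head, tail).

    The head carries the trailing hyphen. For language "de" the traditional
    rule is applied: a break between 'c' and 'k' turns the 'c' into 'k'.
    """
    head, tail = word[:index], word[index:]
    if language == "de" and head.endswith("c") and tail.startswith("k"):
        head = head[:-1] + "k"  # the spelling changes only when the word is broken
    return head + "-", tail


print(break_word("comment", 3))         # English double-consonant split
print(break_word("Kuckuck", 3, "de"))   # German ck rule changes the spelling
```

Note that the rule is not reversible by simply deleting the hyphen, so software that rejoins broken words has to know the language as well.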
Another area that is more difficult than it initially may seem is spelling. Most modern computer operating environments offer a wide variety of document spelling checkers, from those incorporated into sophisticated packages such as Microsoft Word to the Unix spell command.
Yet spelling is another cultural context-specific facet of internationalization. Even in English, spelling rules are more complex than simply: "is it in my dictionary of correctly spelt words?" For example, should that last sentence have the word "spelled" or "spelt"? It depends on which dictionary you use.
More subtle examples are when slang and colloquialisms creep into use, especially nonsense words such as "cowabunga" and speech-imitating phrases such as "fer shure," rather than the correctly spelt "for sure." The point is that "fer shure" might be correctly spelt, or spelled, for a possible cultural context.
Capturing this knowledge in a single spelling checker is almost impossible, as the proliferation of spelling packages shows. Options available in spelling checkers now include being able to ignore words in all uppercase and words that are abbreviations (a facet that is not going to be explored herein).
With other languages it is even more curious. For example, on December 18, 1990, Portugal, its former African colonies -- Cape Verde, Angola, Mozambique, Guinea-Bissau, and Sao Tome e Principe -- and Brazil agreed to adopt identical spelling for their local dialects of Portuguese. Their hope is that the accord will create a unified market for Portuguese-language books.
Sorting, or collation, is something that almost all software packages bump into at one point or another. This can show up in surprising places too. Would you have guessed that the Unix ls command, which lists files in a directory, requires a sophisticated sorting algorithm for it to work correctly?
Collation is a fascinating problem with many languages, ranging from the relatively straightforward addition of letters such as the ñ in Spanish (see note) to the formidable challenge of sorting proper names in Japanese.
What's even more confusing is that many languages actually view two-character pairs as a single character. In Spanish, the 'ch' in "chico" is viewed as a single letter, which should correctly collate between 'c' and 'd' in a Spanish program. Similarly, ñ should collate between 'n' and 'o'.
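One way to handle multi-character letters is to turn each word into a list of letter ranks before comparing. The sketch below does this for the traditional Spanish alphabet; the names ALPHABET and spanish_key are hypothetical, the inclusion of 'll' as a letter is an addition based on the same traditional alphabet, and accented vowels are not handled (they simply sort after 'z' here).

```python
# Traditional Spanish alphabet: 'ch' and 'll' count as single letters,
# and 'ñ' follows 'n'.
ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: position for position, letter in enumerate(ALPHABET)}


def spanish_key(word):
    """Return a sort key: the list of alphabet ranks of the word's letters."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in RANK:                 # two-character letter such as 'ch'
            key.append(RANK[pair])
            i += 2
        else:                            # unknown characters sort last
            key.append(RANK.get(word[i], len(RANK)))
            i += 1
    return key


words = ["dedo", "chico", "cuna", "nube", "ñandú", "oso", "nota"]
print(sorted(words))                     # code-point order
print(sorted(words, key=spanish_key))    # traditional Spanish order
```

With the key function, "cuna" sorts before "chico" and "ñandú" lands between "nube" and "oso", which plain code-point ordering does not produce.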
English isn't bereft of these curiosities either. Where do numbers sort to in a list? The top? How about upper versus lower case words? That is, where should "Smith" collate to; between "small" and "smythe" or before them both, since it is upper case? (In this particular example, numbers and upper case letters almost always sort before lower case letters due to ASCII character ordering. The simple rule used in most sorting software is that if the ASCII representation of a letter is a smaller number than another, the letter is bibliographically lower, or earlier, in the alphabet.)
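The ASCII ordering described in the parenthetical is easy to observe. In the sketch below, Python's default string sort compares character codes, while a case-folding key (one possible alternative, not the only one) ignores case; the word list is just an illustration.

```python
words = ["small", "smythe", "Smith", "42nd Street"]

# Default sort compares character codes: digits, then upper case, then lower case.
by_code = sorted(words)

# Folding case before comparing places "Smith" between "small" and "smythe".
ignoring_case = sorted(words, key=str.casefold)

print(by_code)
print(ignoring_case)
```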
The new lexical ambiguity of non-alphabetic characters has led to an interesting phenomenon where lists, such as the indices of books, are now sorted differently than they might have been a hundred years ago when the ordering would have been done by hand. The reason for this change is simply that sorting algorithms on computers have not adequately modeled the lexical ordering of earlier indices.
Cultural context jumps into the fray, with subtle requirements of international software. In Japan, tradition has it that lists of names are sorted by the rank or importance of each person, as well as alphabetically within each rank. To properly fit in, then, an electronic mail system for a Japanese firm should properly sort the list of names in the distribution list by their rank and importance in the company. An impossible task for a traditional collation algorithm.
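A string collation routine cannot infer rank from a name, but if rank is supplied as data the ordering itself is simple. The following sketch assumes a hypothetical Employee record in which a lower rank number means greater seniority; the names and ranks are invented for illustration.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    name: str
    rank: int  # assumption: 1 is the most senior rank


def distribution_order(people):
    """Sort by rank first, then alphabetically by name within each rank."""
    return sorted(people, key=lambda person: (person.rank, person.name))


staff = [
    Employee("Tanaka", 3),
    Employee("Sato", 1),
    Employee("Suzuki", 3),
    Employee("Ito", 2),
]
for person in distribution_order(staff):
    print(person.rank, person.name)
```

The hard part in practice is not the sort but obtaining and maintaining the rank information, which lives outside the mail system.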
Of all the different elements of internationalization, the most obviously different are notational conventions for date, time, numbers, and so on. Indeed, the simple move to metric can be quite jarring. When traveling throughout the world, notational conventions can prove confusing too. If you are in England and you see "11/10/90" written down, is it November 10th or October 11th?
At the same time, the difference in notational conventions is one of the most exciting facets of international software, where correctly matching a particular cultural context can reap immediate rewards regarding the international look and feel of a software package. Indeed, there are many programs that change the notational conventions for different countries, but ignore the more subtle (and more difficult) variations examined in this chapter.
4.1.5.1 Numbers
Almost all countries in the world use Arabic-based numbers, namely 1, 2, 3, 4, 5, 6, 7, 8, 9, and 0. Further, numeric values are "base 10" too; 124 is 1*10*10 + 2*10 + 4. Numeric notational differences come in when numbers are extended to fractional values or add break characters or punctuation to help understand very large numbers.
In the United States, the 'radix' point, or character between the whole part of a number and the fractional part, is by convention a dot. Seven-and-a-half can then be represented as "7.5" and understood. The radix changes in Europe, however, where the dot becomes a comma; "7,5" isn't a list of two numbers as you might expect in the United States, but rather is another way of noting seven-and-a-half.
In the U.S. the comma is used in numeric notation to break up very large numbers. For example, two million might well be represented as 2,000,000 in the U.S., with it commonly understood that the commas are there as a notational convenience and have no actual numeric value. In much of Europe, however, the comma serves as the radix point, so it clearly cannot also separate the digits of very large numbers because the result would be ambiguous: is 3,443 approximately three and a half, or over three thousand? Instead, the European notational convention is that dots are used in this context, resulting in a complete reversal of the notation. Occasionally, very large numbers might have quotes as separators instead: 2'000'000. The number 3,000.50 in the U.S. would thus be represented as either 3.000,50, 3'000,50, or 3000,50 in Europe.
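The reversal of separators can be expressed by treating the grouping character and the radix character as parameters. The sketch below is a minimal illustration, not a substitute for a real locale library; the function names format_number and parse_number are hypothetical.

```python
def format_number(value, group_sep=",", radix=".", places=2):
    """Format `value` with the given grouping separator and radix character."""
    text = f"{value:,.{places}f}"          # U.S.-style text such as '3,000.50'
    whole, _, fraction = text.partition(".")
    whole = whole.replace(",", group_sep)
    return f"{whole}{radix}{fraction}" if fraction else whole


def parse_number(text, group_sep=",", radix="."):
    """Interpret `text` under the given convention and return a float."""
    return float(text.replace(group_sep, "").replace(radix, "."))


print(format_number(3000.50))               # U.S. convention
print(format_number(3000.50, ".", ","))     # dot grouping, comma radix
print(format_number(3000.50, "'", ","))     # quote grouping, comma radix
print(format_number(3000.50, "", ","))      # no grouping, comma radix

# The same characters mean different quantities under different conventions.
print(parse_number("3,443"))                # U.S. reading
print(parse_number("3,443", ".", ","))      # European reading
```

The last two lines show why input routines cannot guess: without knowing the user's convention, "3,443" is either a little under three and a half or well over three thousand.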
Although this notational difference may seem straightforward, there are some subtle problems that can be caused here. For example, if you want to market a spreadsheet in France, not only do you need to take into account the differences in numeric notation, but you might well have to relabel some output features too (it doesn't make sense to talk about "lining up the decimal point" on a column of numbers if the decimal point isn't a point, does it? Interestingly, the French translation of 'decimal point' is virgule, which literally means 'comma.')
Another interesting variation between the U.S. and European conventions is what a "billion" represents, numerically. In the U.S., a billion is a thousand million, or 1,000,000,000. In many European countries, however, a billion is traditionally a million million, a significantly larger quantity: 1,000,000,000,000.
As with much of the varied cultural context involved with international software, numeric notation is straightforward to cope with, but the tendrils of the U.S. culture can be embedded deeply in the design of software, interfaces, and even documentation. A straight translation is rarely, if ever, an appropriate solution.
4.1.5.2 Currency
Just as numbers have different notation based on cultural and language context, the notation used to represent amounts of money varies throughout the world too. Even in the U.S., in fact, there is a fair amount of difference; consider 50¢ versus $0.50. Not only are you seeing a decimal point radix on the latter, but you are seeing examples of both a postfix and a prefix currency delimiter (after the value, as in 50¢, and before the value, as in $0.50).
Throughout the world there are a wide variety of different notations for currency, including prefix notation -- £5 representing five pounds British -- infix notation -- 5$50 representing five and fifty in Portugal -- and postfix notation -- 50¥ representing fifty yen in Japan. Confusingly, some countries try to adapt similar notation to others, with subtle differences. For example, Australian currency is denoted in Australia as 500$AU and Canadian money, in Canada, is referred to as CD$500.
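The prefix, infix, and postfix notations can be captured by making the position of the currency symbol a parameter. This is a simplified sketch: the function name format_currency and the position labels are hypothetical, and the radix character is fixed as a dot to keep the example short.

```python
def format_currency(units, fraction, symbol, position):
    """Format an amount given as whole units plus hundredths.

    position is "prefix", "infix", or "postfix"; with "infix" the currency
    symbol takes the place of the radix character.
    """
    if position == "infix":
        return f"{units}{symbol}{fraction:02d}"
    amount = str(units) if fraction == 0 else f"{units}.{fraction:02d}"
    if position == "prefix":
        return f"{symbol}{amount}"
    if position == "postfix":
        return f"{amount}{symbol}"
    raise ValueError(f"unknown position: {position!r}")


print(format_currency(5, 0, "£", "prefix"))
print(format_currency(5, 50, "$", "infix"))
print(format_currency(50, 0, "¥", "postfix"))
print(format_currency(500, 0, "$AU", "postfix"))
```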
4.1.5.3 Time and Date
One of the first sets of nouns learned in a new language seems to be the days of the week and months of the year. Indeed, this is very useful information: knowing, for instance, that en la casa el sábado refers to being at the house on Saturday. More than just the names of the days of the week and months being different, though, the actual notation representing dates can vary widely too.
You've already seen, for example, the difference in simple abbreviated month-day-year notation; again, does 11/12/90 represent November 12th, 1990, or December 11th? In fact, that varies based on interpretation, with the common U.S. notation being Month/Day/Year (leading to November 12) and common European notation as Day/Month/Year (December 11). Even with this simple numeric notation, however, there are further variations. For example, official U.S. Government documents typically are dated Year/Month/Day. For example: 90/12/11 (or was that 90/11/12?).
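The ambiguity can be demonstrated by asking which of the three orderings accept a given string. The sketch below uses Python's datetime.strptime; the function name interpretations and the labels are hypothetical, and two-digit years follow strptime's own rule (90 is read as 1990).

```python
from datetime import datetime

# Candidate orderings discussed in the text.
ORDERINGS = {
    "month/day/year": "%m/%d/%y",
    "day/month/year": "%d/%m/%y",
    "year/month/day": "%y/%m/%d",
}


def interpretations(text):
    """Return every ordering under which `text` is a valid date."""
    results = {}
    for label, pattern in ORDERINGS.items():
        try:
            results[label] = datetime.strptime(text, pattern).date()
        except ValueError:
            pass  # not a valid date under this ordering
    return results


for label, value in interpretations("11/12/90").items():
    print(label, value.isoformat())
```

For "11/12/90" two readings survive, so the program has to be told which convention the user follows; no amount of parsing cleverness resolves it.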
In the United States, there are a wide variety of date formats that could be needed in a software program, including: August 3, 1990; Aug 3, 1990; Aug 3 '90; 3 Aug 90; Friday, Aug 3; and so on.
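The formats listed above can all be generated from one date value. In this sketch the month and day names are spelled out in explicit tables (an assumption made for clarity, since those tables are exactly what a localized version would replace), and the function name us_date_formats is hypothetical.

```python
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday", "Sunday")  # date.weekday() returns 0 for Monday


def us_date_formats(d):
    """Return several common U.S. renderings of the date `d`."""
    month = MONTHS[d.month - 1]
    abbrev = month[:3]
    short_year = f"{d.year % 100:02d}"
    return [
        f"{month} {d.day}, {d.year}",
        f"{abbrev} {d.day}, {d.year}",
        f"{abbrev} {d.day} '{short_year}",
        f"{d.day} {abbrev} {short_year}",
        f"{DAYS[d.weekday()]}, {abbrev} {d.day}",
    ]


for rendering in us_date_formats(date(1990, 8, 3)):
    print(rendering)
```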
Even more confusing, some countries traditionally use non-Western (non-Gregorian) calendars. Japan, China, and Israel all have their own way of keeping track of the date. For example, Israeli dates might well refer to years in the 5750s (known in the U.S., Europe, and elsewhere as the 1990s).
Time notations are equally varied, with some cultures encouraging the inclusion of seconds -- 11:40:33 -- others preferring 24-hour, military time -- 21:30 -- and yet others using a different delimiter between times -- 11.30. Adding to the confusion, notation also includes 11:30.40 to represent seconds added to a time.
When combined, the date and time formats can result in quite a tremendous variety of forms, which is especially troubling for software that must read user input and extrapolate the specified date or time entered. Spreadsheets, for example, often are required to offer this feature, allowing users to add time/date information to their numeric information. Clearly, adding language and cultural support for Spanish in a spreadsheet is quite a bit more than simply changing the prompts.
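For the time notations mentioned earlier, a program that reads user input might simply try each known notation in turn. The sketch below does this with datetime.strptime; the function name parse_time and the list of patterns are assumptions covering only the four notations shown in the text.

```python
from datetime import datetime

# One pattern per notation from the text: 11:40:33, 11:30.40, 21:30, 11.30
TIME_PATTERNS = ("%H:%M:%S", "%H:%M.%S", "%H:%M", "%H.%M")


def parse_time(text):
    """Return a datetime.time for the first pattern that matches `text`."""
    for pattern in TIME_PATTERNS:
        try:
            return datetime.strptime(text.strip(), pattern).time()
        except ValueError:
            continue  # try the next notation
    raise ValueError(f"unrecognized time notation: {text!r}")


for sample in ("11:40:33", "21:30", "11.30", "11:30.40"):
    print(sample, "->", parse_time(sample).isoformat())
```

This works only because these four notations do not collide with one another. Once dates enter the picture, as the 11/12/90 example shows, trying patterns in turn silently picks one reading, so the user's convention must be known in advance.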
In addition to all the different notational elements, other features of software and hardware interfaces can require modification to fit in with a foreign culture too. For example, individual colors have widely different meanings; white represents purity and hope in the U.S., but in Japan, white is the color of death. Red, by contrast, represents danger in the U.S., but happiness in China.
Graphical elements are also subject to local interpretation. On the Apple Macintosh, the trashcan icon that we are familiar with would have a completely different meaning to people from an African or Middle Eastern culture, to the point where they might not understand the symbolism, making the interface considerably more difficult to use. By the same token, many consumer electronics from the Pacific Rim come with instruction booklets that feature illustrations Westerners find offensive or overly cute. For example, consider figure 4.1, where the manufacturer is warning the consumer not to plug the unit into an inappropriate power outlet.
While it might be an accepted and enjoyable method of conveying information to customers in the Japanese culture, cute cartoon-like illustrations are not as widely appreciated and accepted in the United States.
With internationalization, even the most subtle features can prove to be culturally sensitive. When looking at a printout, it is clear that the words and typeface will change to reflect the local culture (you wouldn't want to print Chinese using a Cyrillic font), but even where the page number is placed can vary. Indeed, I recall talking once with a purveyor of international software who boasted about page numbering from their product always being in English. When it was suggested that customers might want the word "page" in the appropriate language, the response was "I doubt it."
Even the size of the paper can vary, with 8.5x11 inches being the U.S. standard, and A4, roughly 8.27x11.69 inches, being the British/European standard. If your program must be able to add a page number (in the right language, please!) half an inch from the bottom then it clearly needs to know where the bottom of the page actually is.
Actually, while that is true, it really glosses over one of the most annoying features of most word processing and desktop publishing packages: everything is oriented and computed around the inch measure. Point sizes on fonts, for example, are computed in units of 1/72". Outside of the United States, paper size is rarely stated by measurements, so the standard sheet is simply referred to as A4, and if size is mentioned, it is in metric units (210x297 millimetres).
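As a small worked example, the position of a page number half an inch from the bottom depends directly on the sheet height. The sketch below stores paper sizes in millimetres and converts to points at 72 per inch; the table PAPER_MM and the function name page_number_offset are hypothetical, and the offset is measured from the top edge of the sheet.

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4

# Sheet sizes as (width, height) in millimetres.
PAPER_MM = {
    "letter": (215.9, 279.4),   # 8.5 x 11 inches
    "a4": (210.0, 297.0),
}


def page_number_offset(paper, margin_inches=0.5):
    """Distance in points from the top edge to the page-number line."""
    _, height_mm = PAPER_MM[paper]
    height_points = height_mm / MM_PER_INCH * POINTS_PER_INCH
    return height_points - margin_inches * POINTS_PER_INCH


print(round(page_number_offset("letter"), 2))
print(round(page_number_offset("a4"), 2))
```

A program that hard-codes the U.S. letter value would print the page number roughly 50 points too high on an A4 sheet.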
Finally, a challenging cultural variance is the order in which characters are displayed. In Western languages the standard method is lines of text read left to right, from the top of the page to the bottom. That is not consistently true for all written languages. Changing the order of text can prove tremendously difficult for software packages. Hebrew and the Arabic languages, for example, are line oriented similar to English, but read right to left (which has the interesting result that books are 'printed in reverse;' one starts by opening up what you would consider the 'back cover' and reading towards the 'front cover').
Chinese and Japanese, which share many written characters, can be written in almost any direction, but are most often written left to right in lines or top to bottom in columns. Imagine figuring out how to prompt a user for input with that notational convention. In fact, these cultures have moved towards a more tenable middle ground, with Japanese having a number of different written forms including Katakana, a phonetic syllabary that is readily written in lines rather than columns, and Romaji, in which Japanese words are phonetically spelt out in Roman (English) letters.
It should be clear that the task of successfully internationalizing software, hardware, or even documentation is challenging. With differences in language, notational conventions, word ordering, color cues, and even variation in icons, the amount of knowledge required is substantial.