Party Vibe

How to convert Nordic characters on various computers.

The university servers in Sweden were taken down long ago, as was the author's own personal website (I even tried to locate a newer version to credit him, but suspect his name may be widely duplicated). Although this FAQ is indexed in the old DejaNews archive via Google, it isn't easy to find, and it is still of use to anyone who experiments with emulators of European computers, older equipment using text-mode consoles, or even many newer specialist items, such as small thermal printers, that still use older character-encoding standards.

    From: j…@lysator.liu.se (SCN FAQ-robot)
    Subject: The Nordic graphemes FAQ (the s.c.nordic FAQ)
    Date: 1998/10/25
    Message-ID: <909306010@tingeltangel.lysator.liu.se>
    X-Deja-AN: 404817159
    Content-Transfer-Encoding: 8bit
    URL: http://www.lysator.liu.se/nordic/scn/faq18.html
    Organization: Linköping University, Sweden
    Content-Type: text/plain; charset=iso-8859-1
    Mime-Version: 1.0
    Newsgroups: soc.culture.nordic

    The article below belongs to the www-pages at the soc.culture.nordic FAQ
    web-site. Regarding its accuracy, it must be stated that even _if_ the
    text was initially quite up to date, this may have changed.

    The www-version at http://www.lysator.liu.se/nordic/scn/faq18.html
    may look slightly different due to sparsely added links and illustrations.
    For newer browsers (able to handle tables) the page is also available at a
    faster www-server: http://www2.lysator.liu.se/nordic/scn/faq18.html

    The www-version has been HTML-ized, but not “web-ized”: the texts are
    not edited to comply with web-readability findings. The pages are intended
    to be printed out by readers who find them interesting.

    Feel free to propose changed wordings for paragraphs and sections in need
    of that. Relevant links are also more than welcome, in particular links to
    serious www-pages which are not supposed to change address too often. :->>

    – – – – – – – – – – – – – – – – – – – – – – –

    Subject: 1.8 What are Nordic graphemes?

    (by Tor Slettnes)

    Nordic graphemes can in this context be described as:

    Graphical representations of the letters that exist in the various
    Nordic (i.e. Icelandic, Norwegian, Danish, Swedish and Finnish)
    alphabets, beyond those that exist in the English alphabet.

    Each of the Nordic written languages uses some additional letters
    compared to English. These are, in order of appearance in the
    alphabets:
    Letter:       Languages used:       Pronounced like:        Character:
    ______________________________________________________________________

    a acute       is                    ‘ou’ in “loud”          á
    eth           is                    ‘th’ in “there”         ð
    e acute       is (dk, no, se, fi)   ‘ea’ in “yeah”          é
    i acute       is                    ‘e’ in “he”             í
    o acute       is                    ‘o’ in “home”           ó
    u acute       is                    ‘ou’ in “you”           ú
    y acute       is                    ‘e’ in “he”             ý
    thorn         is                    ‘th’ in “thumb”         þ
    ae            is                    ‘i’ in “hi”             æ
                  dk, no                ‘a’ in “bad”            æ
    o-slash       dk, no                ‘i’ in “bird”           ø
    a-ring        dk, no, se (fi)       ‘o’ in “bored”          å
    a diaeresis   se, fi                ‘a’ in “bad”            ä
    o diaeresis   se, fi, is            ‘i’ in “bird”           ö
    u diaeresis   (se, fi, dk, no)      ‘ue’ in French “rue”    ü

    A set of parentheses around the country code indicates that the letter
    is rarely used in the corresponding language, typically only for loan
    words or names originating from another language. Other accents, such
    as the circumflex (^) and the grave accent, are now and then used in
    foreign names and words in all Nordic languages.

    In Denmark and Norway the alphabet is ordered:
    a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å

    For Finland and Sweden the order is:
    a b c d e f g h i j k l m n o p q r s t u v w x y z å ä ö
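    Because the two orders differ, a plain code-point sort gets neither of them right. A minimal Python 3 sketch of sorting by each alphabet above, assuming lowercase words and placing any character outside the alphabet (é, ü, digits, spaces) after the last letter by code point; the helper names are hypothetical and this is a simplification, not a full collation algorithm:

```python
# Alphabet orders exactly as listed in the FAQ (lowercase letters only).
DANISH_NORWEGIAN = "abcdefghijklmnopqrstuvwxyzæøå"
SWEDISH_FINNISH = "abcdefghijklmnopqrstuvwxyzåäö"


def collation_key(word, alphabet):
    """Give each letter its position in `alphabet`.

    Characters not in the alphabet (é, ü, digits, spaces) get a key past
    the last letter, ordered by code point: a simplification, not a real
    collation algorithm.
    """
    n = len(alphabet)
    return [alphabet.index(c) if c in alphabet else n + ord(c)
            for c in word.lower()]


def nordic_sort(words, alphabet):
    """Sort words according to the given alphabet order."""
    return sorted(words, key=lambda w: collation_key(w, alphabet))


if __name__ == "__main__":
    words = ["øl", "ål", "æble", "zebra"]
    print(sorted(words))                          # ['zebra', 'ål', 'æble', 'øl']
    print(nordic_sort(words, DANISH_NORWEGIAN))   # ['zebra', 'æble', 'øl', 'ål']
```

    Plain code-point order puts å (229) before æ (230) and ø (248), which is wrong for Danish and Norwegian; for Swedish it puts ä before å.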

    If your curiosity isn’t satisfied by the pronunciation guide above,
    there are more extensive comments in the various language sections of
    this FAQ.

    1.8.1 How are these represented in Usenet postings and E-mail?

    The “mother” of all modern character sets for computers is the
    original ASCII character set, now renamed to US-ASCII. (ASCII =
    “American Standard Code for Information Interchange”). This is a 7-bit
    set containing the characters needed to write American English without
    accents or special letters, and little more. No “foreign” letters are
    included.
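    What "7-bit" means in practice: every US-ASCII character has a code below 128, so any text containing a Nordic letter cannot be encoded as US-ASCII at all. A minimal Python 3 sketch (the helper name `is_us_ascii` is hypothetical):

```python
def is_us_ascii(text):
    """True if every character has a code point below 128 (fits in 7 bits)."""
    return all(ord(c) < 128 for c in text)


for word in ("smorgasbord", "smörgåsbord"):
    try:
        print(word, "->", word.encode("ascii"))
    except UnicodeEncodeError:
        # ö and å have code points above 127, so the ascii codec refuses them.
        print(word, "-> cannot be encoded as US-ASCII")
```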

    Various standards exist for representing extra characters, some of
    which are: Digraph, LaTeX, ISO-646, ISO-8859-1, and the IBM codepages
    437, 850, and 865. All of these sets, except the IBM codepages, are
    usually considered acceptable on soc.culture.nordic, e-mail, and the
    internet in general.

    Digraphs are two-character combinations used for simplicity, and are
    often the most universally understood notation on soc.culture.nordic.
    However, when using these with non-Nordic readers, one should be
    careful to explain that these are digraphs, not two separate
    characters. Also, some information may get lost by using digraphs,
    since a filtering program will not be able to determine whether it is
    really a digraph or two separate characters.
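    The information loss is easy to demonstrate. A minimal Python 3 sketch, assuming a small hypothetical digraph table for the Danish/Norwegian letters only (taken from the Digraph column of the notation table further down): converting to digraphs works, but a naive reverse conversion cannot tell "oe" meaning o-slash from an ordinary "o" followed by "e".

```python
# Hypothetical digraph table for the Danish/Norwegian letters only,
# following the Digraph column of the FAQ's notation table.
TO_DIGRAPH = {"æ": "ae", "ø": "oe", "å": "aa",
              "Æ": "AE", "Ø": "OE", "Å": "AA"}
FROM_DIGRAPH = {digraph: letter for letter, digraph in TO_DIGRAPH.items()}


def to_digraphs(text):
    """Replace each national letter by its two-letter digraph."""
    return "".join(TO_DIGRAPH.get(c, c) for c in text)


def naive_from_digraphs(text):
    """Turn every digraph back into a letter.

    This cannot know whether "oe" stood for o-slash or simply for an
    'o' followed by an 'e', so ordinary words get corrupted.
    """
    for digraph, letter in FROM_DIGRAPH.items():
        text = text.replace(digraph, letter)
    return text


if __name__ == "__main__":
    print(to_digraphs("Møller"))            # Moeller
    print(naive_from_digraphs("Moeller"))   # Møller (correct)
    print(naive_from_digraphs("Boeing"))    # Bøing (wrong)
```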

    LaTeX notation comes from the typesetting program of the same name,
    where a sequence starting with a backslash (\) may be substituted with
    a given character. For instance, the a-ring is written as "\aa" or
    "{\aa}" in LaTeX.
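    By contrast, LaTeX-style sequences can be converted back without guesswork, because the backslash marks exactly where a special letter begins. A minimal Python 3 sketch, assuming a hypothetical mapping for a handful of letters and always using the braced form (so that, for example, "{\aa}r" is not read by LaTeX as an unknown command "\aar"):

```python
# Hypothetical mapping for a few letters; braces are always used so the
# sequence cannot run into the following letter.
TO_LATEX = {
    "å": r"{\aa}", "Å": r"{\AA}",
    "æ": r"{\ae}", "Æ": r"{\AE}",
    "ø": r"{\o}", "Ø": r"{\O}",
    "ä": r'{\"a}', "Ä": r'{\"A}',
    "ö": r'{\"o}', "Ö": r'{\"O}',
}
FROM_LATEX = {seq: letter for letter, seq in TO_LATEX.items()}


def to_latex(text):
    """Replace each national letter by its braced LaTeX sequence."""
    return "".join(TO_LATEX.get(c, c) for c in text)


def from_latex(text):
    """Undo to_latex. Only the exact braced sequences are recognized,
    so plain letter pairs such as "oe" or "aa" are never touched."""
    for seq, letter in FROM_LATEX.items():
        text = text.replace(seq, letter)
    return text


if __name__ == "__main__":
    encoded = to_latex("Ærøskøbing and Boeing")
    print(encoded)               # {\AE}r{\o}sk{\o}bing and Boeing
    print(from_latex(encoded))   # Ærøskøbing and Boeing
```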

    ISO-646 (really ISO-646-NO and ISO-646-SE) are 7-bit sets similar to
    US-ASCII, but with national characters substituted in place of the
    following characters: {, |, }, [, \, ]. This is the oldest of the
    “true representation” standards mentioned here; it was used in e.g.
    the Nordic versions of the CP/M operating system, prior to MS-DOS.
    Today, it is mostly used in Sweden and Finland (although the ordering
    of the letters, for the sake of compatibility with the Danish/Norwegian/
    German equivalents, is not correct for these languages).

    ISO-8859-1, also called ISO Latin-1, is the first of several 8-bit
    character sets described in the International Organization for
    Standardization’s standard 8859 <http://czyborra.com/charsets/iso8859.html>.
    (ISO should not be confused with the BIPM, which maintains the meter,
    the kilogram, etcetera.) This set includes all the characters needed
    for all Western European languages except Sámi and Esperanto. Latin-1
    is a superset of US-ASCII, hence all ASCII characters keep their
    original positions in this set. Rather than trying to accommodate the
    ordering of any specific language, the letters in ISO-8859-1 are
    ordered according to the alphabetical position of their US-ASCII
    lookalikes. Latin-1 is supported through modern standards such as
    MIME (RFC 1521).
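    Two of these properties can be checked directly. A minimal Python 3 sketch (the helper `latin1_byte` is hypothetical) confirming that bytes 0-127 decode identically as US-ASCII and Latin-1, and that each Latin-1 byte value equals the character's Unicode code point:

```python
def latin1_byte(ch):
    """Return the ISO-8859-1 byte value of a single character.

    Raises UnicodeEncodeError if the character is not in Latin-1.
    """
    return ch.encode("latin-1")[0]


# Latin-1 is a superset of US-ASCII: bytes 0-127 decode identically.
low = bytes(range(128))
assert low.decode("ascii") == low.decode("latin-1")

# Each Latin-1 byte value equals the character's Unicode code point.
assert all(latin1_byte(chr(i)) == i for i in range(256))

for ch in "ÆØÅæøå":
    print(ch, latin1_byte(ch), oct(latin1_byte(ch)))   # e.g. "Ø 216 0o330"
```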

    The IBM codepages 437, 850, 861 and 865 are used on Personal Computers
    in “text” mode, and are also the default sets in many MS-Windows ®
    communication programs. Out of the Big Blue, they were created to
    provide text-based PC programs with a means to create low-cost
    graphics, and the addition of extra characters came as a nice side
    effect. (Certain Nordic characters were not represented in the
    original codepage 437, with the consequence that in Iceland, Denmark
    and Norway, computers would occasionally be sold with cp 861 or 865 in
    the hardware. Today, alternative codepages can be downloaded to the
    video card via software.) The Danish/Norwegian character o-slash is
    not represented in cp 437, and in 850/861/865 it is positioned at
    the dangerous code 155 (9B hex), the “Upper Escape”. Certain terminal
    types will interpret this code as the initial character of an escape
    command, and may e.g. clear the screen depending on the next letter.
    Further, it is incompatible with the established 8-bit standard
    Latin-1, and should be avoided.
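    The o-slash problem shows up as soon as the letter is encoded in each code page. A minimal Python 3 sketch using the standard codecs `cp437`, `cp850`, `cp861` and `cp865`; the helper names are hypothetical. The range 0x80-0x9F is the C1 control range of ISO 6429, where 0x9B (decimal 155) is CSI, the 8-bit form of "ESC [" that some terminals act on:

```python
def encode_or_none(ch, codec):
    """Return the encoded bytes for `ch`, or None if `codec` lacks it."""
    try:
        return ch.encode(codec)
    except UnicodeEncodeError:
        return None


def is_c1_control(value):
    """True for 0x80-0x9F, the C1 control range of ISO 6429.

    0x9B (decimal 155) in this range is CSI, the 8-bit form of ESC [.
    """
    return 0x80 <= value <= 0x9F


for codec in ("latin-1", "cp437", "cp850", "cp861", "cp865"):
    data = encode_or_none("ø", codec)
    if data is None:
        print(f"{codec:8} no o-slash")
    else:
        print(f"{codec:8} 0x{data[0]:02X}  C1 control: {is_c1_control(data[0])}")

# Expected output:
# latin-1  0xF8  C1 control: False
# cp437    no o-slash
# cp850    0x9B  C1 control: True
# cp861    0x9B  C1 control: True
# cp865    0x9B  C1 control: True
```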

    The various notations of the Nordic graphemes follow:
    Letter        Digraph  LaTeX    ISO-646  ISO-8859-1  HTML       Octal  Char
    ___________________________________________________________________________

    a acute       A'       \'{A}    -        alt-0193    &Aacute;   301    Á
                  a'       \'{a}    -        alt-0225    &aacute;   341    á
    eth           TH       -        -        alt-0208    &ETH;      320    Ð
                  th       -        -        alt-0240    &eth;      360    ð
    e acute       E'       \'{E}    -        alt-0201    &Eacute;   311    É
                  e'       \'{e}    -        alt-0233    &eacute;   351    é
    i acute       I'       \'{I}    -        alt-0205    &Iacute;   315    Í
                  i'       \'{i}    -        alt-0237    &iacute;   355    í
    o acute       O'       \'{O}    -        alt-0211    &Oacute;   323    Ó
                  o'       \'{o}    -        alt-0243    &oacute;   363    ó
    u acute       U'       \'{U}    -        alt-0218    &Uacute;   332    Ú
                  u'       \'{u}    -        alt-0250    &uacute;   372    ú
    y acute       Y'       \'{Y}    -        alt-0221    &Yacute;   335    Ý
                  y'       \'{y}    -        alt-0253    &yacute;   375    ý
    thorn         TH       -        -        alt-0222    &THORN;    336    Þ
                  th       -        -        alt-0254    &thorn;    376    þ

    u diaeresis   U"       \"{U}    ^        alt-0220    &Uuml;     334    Ü
                  u"       \"{u}    ~        alt-0252    &uuml;     374    ü
    ae            AE       {\AE}    [        alt-0198    &AElig;    306    Æ
                  ae       {\ae}    {        alt-0230    &aelig;    346    æ
    o-slash       OE       {\O}     \        alt-0216    &Oslash;   330    Ø
                  oe       {\o}     |        alt-0248    &oslash;   370    ø
    a-ring        AA       {\AA}    ]        alt-0197    &Aring;    305    Å
                  aa       {\aa}    }        alt-0229    &aring;    345    å
    a diaeresis   A"       \"{A}    [        alt-0196    &Auml;     304    Ä
                  a"       \"{a}    {        alt-0228    &auml;     344    ä
    o diaeresis   O"       \"{O}    \        alt-0214    &Ouml;     326    Ö
                  o"       \"{o}    |        alt-0246    &ouml;     366    ö
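    The ISO-8859-1, HTML and Octal columns can be regenerated mechanically. A minimal Python 3 sketch using the standard `html.entities.codepoint2name` mapping; the `iso_columns` helper is hypothetical:

```python
from html.entities import codepoint2name


def iso_columns(ch):
    """Return (alt code, HTML entity, octal) for a Latin-1 letter."""
    code = ch.encode("latin-1")[0]          # ISO-8859-1 byte value
    entity = "&" + codepoint2name[ord(ch)] + ";"
    return f"alt-{code:04d}", entity, f"{code:o}"


if __name__ == "__main__":
    for ch in "ÁáÐðÉéÍíÓóÚúÝýÞþÜüÆæØøÅåÄäÖö":
        alt, entity, octal = iso_columns(ch)
        print(f"{alt:10} {entity:10} {octal:5} {ch}")
```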

    The ISO-646 charsets for Denmark/Norway
    <http://www.kostis.net/charsets/iso646.no.html> [iso-646-NO] and
    Finland/Sweden <http://www.kostis.net/charsets/iso646.se.html>
    [iso-646-SE] are in practice obsolete, and there never existed one
    for Icelandic, but you may run into older 7-bit text files using
    them. Note that ‘Ü’ is not represented in iso-646-NO for
    Denmark/Norway.

    1.8.2 Pros and cons of the different representations

    If you have been a reader of this group for a while, you may have
    noticed that discussion about characters and their representations
    occasionally accounts for quite a bit of bandwidth. It often takes no
    more than a question about the issue from a new reader, or someone
    posting an article with an IBM character set, to get a new thread
    going. Some want to keep 7-bit ISO-646 (be aware that they may call it
    “true ASCII”, although strictly speaking it is not), since 7-bit codes
    will always get through with any setup; others want ISO-Latin-1 since
    it is more universal; and yet others promote digraphs as the greatest
    common denominator between the two.

    Some pros and cons for each set:
    __________________________________________________________________

    Digraphs
      Advantages:     * Requires 7-bit only
      Disadvantages:  * Ambiguous ("oe" or "o-slash"?)
                      * Non-optimal compromise

    LaTeX
      Advantages:     * Non-ambiguous 7-bit representation
      Disadvantages:  * Made for typesetting; somewhat cryptic for regular text
                      * Non-optimal compromise

    ISO-646-SE, ISO-646-DK <[\]{|}>
      Advantages:     * Only 7-bit "true" representation
                      * No data loss even with old hardware/software/setup
      Disadvantages:  * Different standards for each language
                      * Getting harder to find font support (dying out)
                      * Shadows the brace, square bracket, pipe and backslash
                        characters

    ISO Latin 1 (ISO-8859-1) <ÐÞÆØÅÄÖðþæøåäö..>
      Advantages:     * Utilizes all 8 bits in a byte, yet avoids dangerous codes
                      * Universal for all Western European languages
                      * Supported by ISO and MIME; true subset of Unicode
      Disadvantages:  * Requires an 8-bit clean connection; older systems may
                        cause data loss
                      * May require some setup
                      * If bit 8 is stripped, ÆØÅÄÖæøåäö becomes "FXEDVfxedv";
                        difficult to read

    IBM code pages, Macintosh set
      Advantages:     * Uses all 256 codes; more characters
                      * Often used in PC environments such as BBSes
      Disadvantages:  * Uses all 256 codes, including dangerous ones
                      * Incompatible with the "de-facto" 8-bit standard
                        ISO-8859-1
    __________________________________________________________________
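    The "FXEDVfxedv" effect comes from a link clearing the 8th bit of every byte, which moves each Latin-1 letter 128 positions down into ASCII (Æ is 198, and 198 - 128 = 70 is "F"). A minimal Python 3 sketch simulating such a link (the helper name is hypothetical):

```python
def strip_high_bit(text):
    """Simulate a 7-bit link that clears bit 8 of every Latin-1 byte."""
    raw = text.encode("latin-1")
    return bytes(b & 0x7F for b in raw).decode("ascii")


if __name__ == "__main__":
    print(strip_high_bit("ÆØÅÄÖæøåäö"))   # FXEDVfxedv
    print(strip_high_bit("Smørrebrød"))   # Smxrrebrxd
```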

    1.8.3 How do I set up support for 7-bit ISO-646 representation?
    ({|}, [\])

    The ISO-646 sets are still supported via various fonts and translation
    filters. Possible measures to set up support for them are:
    * For the “terminal” program shipped with Windows 3.x, simply select
    “Denmark/Norway”, “Sweden” or “Finland” from the Translations item
    in the “Terminal Preferences” dialogue box.
    * For MS-Kermit, use the command “set term character-set language”,
    where “language” is one of “Finnish”, “Swedish”, or “Norwegian”.
    * For other DOS and Windows communication programs, visit their local
    translation tables and insert appropriate translations for '[',
    '\', ']', '{', '|', '}'.
    * For Unix-based newsreaders, either find an ISO-646 font, or pipe
    your newsreader through one of the following commands (provided
    the font you use is ISO-8859-1):

    Denmark/Norway: tr '\\]{|}' '\330\305\346\370\345'
    Sweden/Finland: tr '\\]{|}' '\326\305\344\366\345'

    For instance, in your .cshrc file, insert the following line:

    alias rn "rn | tr '\\]{|}' '\330\305\346\370\345'"

    The character '[' should not be translated, because it is used in ANSI
    escape sequences.

    Note that if you use this kind of translation, you will no longer see
    any of the characters '\]{|}'; in most cases this outweighs the
    benefits from seeing the national letters.
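    The same translation as the tr commands can be sketched in Python 3 with `str.maketrans`. The '\', ']', '{', '|' and '}' entries follow the octal codes in those commands, and the '[' entries (Æ and Ä) come from the notation table; the function name and the `translate_bracket` flag are hypothetical, with '[' left alone by default for the ANSI-escape reason given above:

```python
# Bracket positions in the two ISO-646 variants. The '\', ']', '{', '|'
# and '}' entries match the octal codes in the tr commands; the '['
# entries (Æ and Ä) come from the notation table.
ISO646_NO = {"[": "Æ", "\\": "Ø", "]": "Å", "{": "æ", "|": "ø", "}": "å"}
ISO646_SE = {"[": "Ä", "\\": "Ö", "]": "Å", "{": "ä", "|": "ö", "}": "å"}


def decode_iso646(text, variant, translate_bracket=False):
    """Translate 7-bit ISO-646 national text to Unicode.

    '[' is left alone unless translate_bracket is True, mirroring the tr
    commands, because '[' also appears in ANSI escape sequences (ESC [).
    """
    table = dict(variant)
    if not translate_bracket:
        del table["["]
    return text.translate(str.maketrans(table))


if __name__ == "__main__":
    print(decode_iso646("bl}b{r", ISO646_NO))       # blåbær
    print(decode_iso646("sm|rg}sbord", ISO646_SE))  # smörgåsbord
```

    As the note above warns, any genuine braces, pipes or brackets in the text are converted too.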

    1.8.4 How do I set up support for 8-bit ISO-8859-1 representation?
    (æøåäö, ÆØÅÄÖ)

    The ISO-8859-1 (Latin 1) set is currently the most common character
    representation standard on soc.culture.nordic, and is also quite
    frequent in e.g. soc.culture.german, personal e-mail etc. However, on
    many systems, the ability to view these characters is not provided as
    “default”, so you may need to configure some things on your own.
    * If you are reading news through a modem, you need to make sure
    that your modem connection is 8 data bits. (The most common
    parameters are “8N1” – 8 data bits, no parity bits, and one stop
    bit).
    * For DOS text mode communication programs, you need an ISO->IBM
    translation table. Tables for Telemate, Telix and Procomm Plus can
    be found in the file “xlate.zip”, available at various FTP sites.
    * For MS Windows ® communication programs, select an ANSI or
    ISO-Latin-1 font. For MS-Kermit, use “set term char latin”. For
    Procomm Plus for Windows, select vt220 or vt320 emulation. Be sure
    that bit 8 is not stripped.
    * For MS Windows ® you can also generate 8-bit characters globally
    by choosing the “US-International” keyboard layout via the
    “International” dialogue box in the Control Panel. For instance,
    'ä' (a diaeresis) is generated by pressing "a, i.e. a double quote
    followed by a lowercase a.
    A note to Windows programmers: Let the underlying keyboard
    drivers, run-time libraries etc. take care of keyboard input.
    Just make sure that the 8th bit is not stripped/masked away.
    * If your newsreader is UNIX-based, insert the following command in
    your .login or .profile file:

    stty -istrip pass8

    * If your modem connection is 7 bits (and cannot be changed to 8
    bits), you can have ISO-Latin-1 characters translated to “[]{|}”
    before they are sent over the modem. Pipe your reader through the
    “tr” command, similar to above:

    tr '\306\330\305\304\326\346\370\345\344\366' '[\\][\\{|}{|'

    * If you use the “emacs” editor, version 19.x, and have an
    ISO-Latin-1 display font, insert the following line in your .emacs
    file:

    (standard-display-european t)

    Also, if you have a keyboard with international characters that
    you want to be able to use directly, or if you are otherwise able
    to generate 8-bit codes directly from your keyboard, insert the
    following line:

    (set-input-mode (car (current-input-mode))
                    (nth 1 (current-input-mode))
                    0)

    Note that in cases where the Meta key is represented by setting
    the 8th (high) bit (i.e. if you are not using X Windows), this
    line will disable the Meta key, so you will subsequently have to
    use “ESC x” to generate “M-x”.
    Otherwise, insert the following line:

    (load-library "iso-insert")

    A new keymap, 8859-1, has now been assigned to the key sequence
    “C-x 8”. You can assign this to another sequence, e.g. C-t, by
    inserting:

    (global-set-key "\C-t" 8859-1-map)

    Some strokes from this map:
    C-x 8 d gives ð (eth)
    C-x 8 t gives þ (thorn)
    C-x 8 a e gives æ (ae)
    C-x 8 / o gives ø (o-slash)
    C-x 8 a a gives å (a-ring)
    C-x 8 " a gives ä (a diaeresis)
    C-x 8 " o gives ö (o diaeresis)
    C-x 8 ' a gives á (a acute)
    C-x 8 ' i gives í (i acute)
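    The 7-bit fallback tr command in the list above (Latin-1 letters to "[\]{|}") has a direct Python 3 equivalent using `str.maketrans`. A minimal sketch with a hypothetical function name; note that it is lossy, since Æ and Ä both become '[', Ø and Ö both become '\', and so on:

```python
# Same pairs as the tr command: Æ/Ä -> '[', Ø/Ö -> '\', Å -> ']',
# æ/ä -> '{', ø/ö -> '|', å -> '}'.
TO_7BIT = str.maketrans("ÆØÅÄÖæøåäö", "[\\][\\{|}{|")


def to_7bit_brackets(text):
    """Replace Nordic Latin-1 letters with their ISO-646 bracket codes."""
    return text.translate(TO_7BIT)


if __name__ == "__main__":
    print(to_7bit_brackets("Ærø Åbo"))    # [r| ]bo
    print(to_7bit_brackets("Göteborg"))   # G|teborg
```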

    1.8.5 References

    For an index to other literature on internationalization, try:
    <http://www.vlsivie.tuwien.ac.at/mike/i18n.html>

    I am: Tor Slettnes.
    ___________________________________
    _________________________________________________________________

    – Is the text above really reliable?
    – See the discussion in section 1.2.2!
    _________________________________________________________________

    © Copyright 1996-98 by Tor Slettnes.

    You are free to quote this page as long as you mention the URL.
    This page was last updated September 28th.

    e-mail: j…@lysator.liu.se
    s-mail: Majeldsvägen 8a, 587 31 LINKÖPING, Sweden

    And in layman's terms, just wtf is all that about then, buddy?

    Until the mid-2000s, many computer systems (even those sold in Europe) struggled to process any language other than American English. In many countries the communications ministry insisted that US vendors provide full character sets for the local language, and shortly afterwards the Eurozone, the EU and its tech industries became more important.

    æ and ø were originally used in English and exported to Denmark from Northumbria around the time of King Edwin 🙂

    Until Unicode UTF-8 (which is what is used today on sites like this) became widespread, there were lots of different “standards” for encoding these extra characters, which often caused problems when sending documents across borders, even between countries that use the same letters. Until broadband was widespread, some computers were accessed via often noisy phone lines, so modem links were configured for 7-bit text, as the extra bit within an octet could then be used for error checking.
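    To make both points concrete, here is a minimal Python 3 sketch (the helper name is hypothetical). The first part packs a 7-bit character into an octet with an even-parity top bit, the kind of check the spare bit was used for; the second shows the same ø as different bytes in Latin-1, code page 865 and UTF-8, and what UTF-8 looks like when misread as Latin-1:

```python
def even_parity_byte(ch):
    """Pack a 7-bit character into an octet with an even-parity top bit.

    The top bit is set when needed so the octet has an even number of
    1 bits; a receiver seeing an odd count knows a bit was corrupted.
    """
    code = ord(ch)
    if code > 0x7F:
        raise ValueError(f"{ch!r} does not fit in 7 bits")
    ones = bin(code).count("1")
    return (code | 0x80) if ones % 2 else code


if __name__ == "__main__":
    print(hex(even_parity_byte("A")), hex(even_parity_byte("C")))  # 0x41 0xc3

    # The same letter as bytes in three different encodings.
    for codec in ("latin-1", "cp865", "utf-8"):
        print(codec, "ø".encode(codec))

    # UTF-8 bytes misread as Latin-1 produce the familiar garbage.
    print("smørrebrød".encode("utf-8").decode("latin-1"))  # smÃ¸rrebrÃ¸d
```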

    On some computers ø was allocated to a code that would (depending on what came next) clear the screen on remote terminals, make text change colour or appear in odd places, which is a nuisance and on safety-critical systems could be dangerous.

    There are still many systems in use (especially embedded industrial control systems) that use older encodings, lots of emulators for 1980s European computers, and even some webpages I found online from Denmark that use the 1980s encoding (they appear to have been left there from the very first internet service available in Copenhagen; I wasn’t even sure if I should post the links, as it looked like some of the system may still be live!)

    Þ and þ are the same as Ð and ð, but each can only be used in specific ways: Þ and þ are only used at the beginning of a word, whereas Ð and ð are only used in the second, middle or last position of a word. Both represent ‘th’.

    Sometimes I wish I just never asked lol.

    lol
