• Character Maps + Unicode (97+)

    #371194

    The recent post on inserting the small, raised o (degree symbol) reminded me of a question I’ve wanted to ask (perhaps this is more of an Office question but I’ll ask it here since the other post was on the Word lounge).

    Is there some material that clearly explains the organization of the material in the Insert Symbol | Symbols dialog with the Font and Subset drop down boxes? How was it known where the degree symbol was? (I would have done it the hard way.) When I scroll the character map far enough, I get a different subset. Why is it that the Font drop down only has a few “fonts” whereas I have many fonts installed, as witnessed by the font drop down in Word itself?

    Is there some relationship between the information in this character map and Unicode coding? I remember Klaus Linke would always have the answer to what the coding was for a Unicode symbol. Where did this info come from?

    confused

    TIA

    Fred

    • #589407

      Hi Fred,

      That’s a lot of questions…

      First off, there are two kinds of fonts: regular fonts and decorative fonts.

      All regular fonts use the same codes for the same characters (Unicode).
      The characters are organized in subsets (code pages) for easier reference.
      Since the designers of Unicode wanted to keep upgrading to Unicode easy, they kept the old ASCII codes (1 to 127) and Latin-1 codes (160-255). Therefore you find the degree sign in the subset “Latin-1”, and not in a subset with a more descriptive name (BTW, there are special Unicode characters for

      • #589414

        Hi Klaus,

        Long time no hear. Hope all is well.

        Let’s omit decorative fonts from further discussion. I’m not overly concerned with them bcs, IMHO:
        – I’m not sure why one would use these much. Maybe put in a blank box on a paper form for a checkbox? Maybe a special bullet character when trying to be fancy? That’s about all I’ve used these for.

        SO for regular fonts: what is the purpose of Word showing a font called “(normal text)” w/o the quotes with a number of subsets and then having additional fonts like “Arial”, “TNR”, etc? They all have the same subsets.

        Still not sure why the font box shows only a subset of installed fonts. Is the answer in your next to last para? (the performance issues). But why bother at all in Word 2002 if the subsets are all the same? Why can’t I select a symbol and have its appearance in the form of the current font?

        >The characters are organized in subsets (code pages) for easier reference.
        certainly one might find the subset names descriptive as a guide in searching. However, I wouldn’t know the diff between Latin Extended-A and …-B. I never (hardly ever?) use the subset drop down.
        What is a WGL4 font? never heard the term.

        Fred

        • #589443

          > Long time no hear. Hope all is well.

          Yes, thank you. Too much work and no play makes Klaus a dull boy.

          > re “(normal text)”

          I never understood why MS didn’t call that “(current font)”. It’s the font at the start of the current selection (that is, the font that the text would get if you typed something).

          The characters you see in the “Insert > Symbol” dialog, and the subsets, depend on the font you choose. The WGL4 fonts (see below) all contain more or less the same characters, so you won’t see many differences between “Arial”, “Times New Roman”, …

          BTW, you can type in any font name of a font that is installed into the font box of the dialog, and it will display the available characters (not just those listed in the dropdown).

          > re: why are not all fonts listed? / performance issues

          When you open the “Insert > Symbol” dialog for the first time, Word has to make a list of all installed fonts, look up each single character in every font, and create the table of available characters and subsets. So it often takes a few seconds.

          >re: subsets/code pages

          Some additional information on the subsets is available, for example, from the http://www.unicode.org website, or in the printed version. Latin Extended-A and -B, for example, contain additional Latin characters used in different languages, with characters for European languages in Latin Extended-A, and for African and more exotic languages in -B.
          Somewhere on the Unicode web site, there is a text file (NamesList.txt, about 600 kB) for download, which lists the codes and names of the characters, alternative names, some indication of usage, and similar characters. It’s nice to have if you often need special characters and have an idea of what they are called (for example, you could have searched this file for “degree” to get the code of the degree sign). Once you know the code, characters are pretty easy to locate in the “Insert > Symbol” dialog, because they appear in sorted order, and if you select a character in the dialog, its code is shown in the status bar of the document window (great idea, isn’t it?).
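
If you have Python at hand, the same kind of name search can be sketched without downloading the file, since Python's standard `unicodedata` module carries the character names. This is only a rough illustration of the lookup described above, not how Word does it; the helper name `find_characters` and the cut-off at U+3000 are my own assumptions to keep the search short.

```python
import unicodedata


def find_characters(keyword, limit=0x3000):
    """Return (hex code, character, name) for names containing keyword.

    `limit` is an arbitrary cut-off (an assumption for this sketch) so the
    loop only scans the lower part of the code space.
    """
    keyword = keyword.upper()
    matches = []
    for code_point in range(limit):
        # unicodedata.name() returns the default "" for unnamed code points.
        name = unicodedata.name(chr(code_point), "")
        if keyword in name:
            matches.append((f"{code_point:04X}", chr(code_point), name))
    return matches


# List every match for "degree"; the degree sign (00B0) is among them.
for code, char, name in find_characters("degree"):
    print(code, char, name)
```

The four-digit hex code in the first column is the same kind of code you would look for in the sorted “Insert > Symbol” table.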

          > re: Why can’t I select a symbol and have its appearance in the form of the current font?

          Because most fonts only contain 200-600 of the 38,885 characters in Unicode Version 2 (and even more in Version 3). That is why I suggested choosing “Arial Unicode MS” from the dropdown if you have it installed, to see a full list of all Unicode characters.

          > re: WGL4 fonts
          MS (or rather Monotype, who supplies the fonts) used to have different fonts for the Western, Russian, Greek, Turkish … versions of Windows. They were combined into single fonts with about 600 characters pretty early on. Earlier versions of Windows/Word like Win 3.11/Word6 would just take the characters needed for the current language out of the font file, and give this “imaginary” font a new name. So in Word6 or Word95, you used to see fonts “Arial Cyr” or “Arial Baltic” or “Times New Roman Tur” … (at least this is how I remember it).
          MS calls them WGL4 fonts (Windows Glyph List 4). The link to Alan Wood’s Unicode website I gave in my last post has more information on those.

          Have to get back to work,
          cheers, Klaus

          • #589501

            Hi Klaus,

            Thks for the insights. You certainly are a fountain of knowledge on fonts. I had checked the unicode site about a month or 2 ago but couldn’t get very far.

            If I understood correctly, it sounds like what I would call “normal fonts” (eg, Arial, TNR) could actually have a few different characters in their maps (“more or less the same characters, so you won’t see many differences”). Hence the justification for listing them? If this is so, why have “(normal text)”? There may be nothing normal.

            I’m working on my 97 machine at home right now. I did type in a font name and it did show me the map for it. Also, “normal text” seems to be whatever font was used at the insertion point (is that what you said?). Strangely enough, the set of fonts listed in Insert Symbol did not include Arial or TNR here but they were included in the list at work on a 2000 machine. This seems odd.

            Fred

      • #589549

        Thanks for the link to Alan Wood’s page.

        It was news to me that Word 2002 had the ALT-X feature that toggles between displaying the character and displaying its Unicode value.

        I notice the feature can be fooled though — if I type “23” then press ALT-X, I don’t see the Unicode value for “3”, I see the pound sign (Unicode 0023).
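
The conversion behind this is easy to mimic. The sketch below is not Word's actual implementation, just the two directions of the toggle under the assumption that the typed digits are read as a hexadecimal code; `code_to_char` and `char_to_code` are hypothetical helper names.

```python
def code_to_char(hex_digits):
    """Interpret a string of hex digits as a Unicode code and return the character."""
    return chr(int(hex_digits, 16))


def char_to_code(char):
    """Return the four-digit hexadecimal Unicode code of a single character."""
    return f"{ord(char):04X}"


print(code_to_char("23"))    # the digits "23" read as code 0023, i.e. "#"
print(char_to_code("3"))     # the code of the digit "3" itself is 0033
print(code_to_char("00B0"))  # the degree sign
```

So “23” is a perfectly valid code on its own, which is why the toggle takes both digits rather than converting only the “3”.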

    • #589544

      A slight tangent… There are alternatives to the Character Map that appears with “Insert”…”Symbol”. One is the Extended Character Map, which shows all fonts available, and is bigger (more readable) on screen. It is not a Word add-in, but can be used with Word. Freeware. http://aritechdev.hypermart.net/ecm.htm

      • #589692

        > character map/Extended Character Map

        With Word2000, I use Microsoft Visual Keyboard, a utility that shows you the current keyboard layout.

        Its window is resizable, and you can send characters directly into your application.

        But you have to be aware that both this and Character Map only show you a minuscule part of all the characters that are available. Still, if you do multilingual documents, it’s nice to have: if you type Russian, it shows you the keys for the Cyrillic letters, and the same for Greek…

        Microsoft has licensed this utility from another firm. If you use Word2002, you can try the “On Screen Keyboard” from the same programmers, from the Windows XP Accessibility menu.

        cheers, Klaus
