Character Encoding Systems
I’ve long known about so-called “extended ASCII”, where you can type many characters not on the keyboard by holding down the ALT key and typing numbers on the numeric keypad.
Yet I would often see different lists, some with different glyphs in different places, and some even with “box drawing” elements. (You can see all of these by running charmap.) Usually, you type a four-digit code beginning with “0” to get the character. HTML For Dummies had the list for this set, which is what I was using. It’s also the number you type in HTML using the &#…; code.
But if you leave off the 0, you get other characters, including the box drawings.
This is actually a different set. Within it, I encountered two slightly different versions: someone had said on a page that 233 was “Ú” (capital U with acute accent), yet I kept getting “Θ” (Greek capital theta).
It was hard to find info on what was what, or a page comparing all of these sets. I found out that each of these sets is called a “code page”. The main three are Windows CP 1252 (the one with the four-digit ALT code), DOS CP 437 (the other one I was getting without the “0”), and DOS CP 850 (this one is apparently meant for Western Europe).
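To see how the same number lands on a different character in each code page, here is a minimal sketch in Python. It assumes Python's built-in codecs named `cp1252`, `cp437` and `cp850` match the three code pages above; the function name `decode_in_code_pages` is just made up for illustration.

```python
def decode_in_code_pages(code):
    """Decode one byte value (0-255) under each of the three code pages."""
    raw = bytes([code])
    return {name: raw.decode(name) for name in ("cp1252", "cp437", "cp850")}


# The 233 puzzle from above: three code pages, three different characters.
print(decode_in_code_pages(233))
# {'cp1252': 'é', 'cp437': 'Θ', 'cp850': 'Ú'}
```

This is only a lookup through Python's own tables, not a reproduction of what the ALT key does inside Windows.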
There’s also Unicode, which assigns each character a code point written in hexadecimal, and it has grown to use five digits (the full range runs up to 10FFFF). I liked this, because it includes enclosed (bulleted) alphanumerics that I can use in subway discussions to denote route bullets. (They can be typed using the HTML code with the decimal or hex Unicode number.) However, not all systems support it yet.
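As a rough illustration of the HTML codes mentioned here, this Python sketch builds the decimal and hex forms for one character and checks that both decode back to it. The helper name `numeric_references` is hypothetical; `html.unescape` is from Python's standard library.

```python
import html


def numeric_references(ch):
    """Return the decimal and hex HTML numeric references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


dec_ref, hex_ref = numeric_references("Ⓐ")
print(dec_ref, hex_ref)  # &#9398; &#x24B6;

# Both forms decode back to the same character.
print(html.unescape(dec_ref) == html.unescape(hex_ref) == "Ⓐ")  # True
```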
“Circled” numbers and “negative circled” numbers (white character on a solid circle, like the actual route bullets) were already included in the “Wingdings” font, but circled letters have been added in Unicode’s “Enclosed Alphanumerics” block (2460–24FF). The “Enclosed Alphanumeric Supplement” block (1F100–1F1FF) includes the negative (solid bullet) letters, but these don’t even show up in any of my browsers.
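For the route bullets, a small sketch along these lines could turn a plain capital letter into its circled form. It relies on the circled capitals A to Z sitting in a row starting at 24B6, and the negative circled capitals starting at 1F150; the helper `route_bullet` is a made-up name, not anything standard.

```python
import unicodedata

CIRCLED_A = 0x24B6             # CIRCLED LATIN CAPITAL LETTER A
NEGATIVE_CIRCLED_A = 0x1F150   # NEGATIVE CIRCLED LATIN CAPITAL LETTER A


def route_bullet(letter, solid=False):
    """Hypothetical helper: map 'A'..'Z' to a circled (or solid circled) letter."""
    if len(letter) != 1 or not "A" <= letter <= "Z":
        raise ValueError("expected one capital letter A-Z")
    base = NEGATIVE_CIRCLED_A if solid else CIRCLED_A
    return chr(base + ord(letter) - ord("A"))


for bullet in (route_bullet("Q"), route_bullet("F", solid=True)):
    # Print the code point and official name, since the glyph may not render.
    print(f"U+{ord(bullet):04X}", unicodedata.name(bullet))
```

Whether the solid versions actually display still depends on the fonts installed, which is the problem described above.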
Past 255, the code pages simply repeat every 256 codes. So CP 437 #257 is ☺, just like #1, as is #513. Since CP 1252 starts with control characters, 256-287 will be the invisible control characters, #288 is the space, and #289 starts over with !, like #33.
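The repetition is just a remainder after dividing by 256, which can be sketched as below. This assumes the wraparound really is a plain modulo, as the examples above suggest. Python's `cp437` codec returns control characters for 1-31 rather than the DOS glyphs, so the sketch carries its own string of those glyphs; the names `wrap_alt_code` and `cp437_glyph` are invented for the example.

```python
# Glyphs that CP 437 displays for codes 1-31 (Python's codec gives control codes here).
CP437_LOW = "☺☻♥♦♣♠•◘○◙♂♀♪♫☼►◄↕‼¶§▬↨↑↓→←∟↔▲▼"


def wrap_alt_code(number):
    """Reduce any number to the 0-255 range by discarding multiples of 256."""
    return number % 256


def cp437_glyph(number):
    """Look up the CP 437 glyph for a number, after wrapping it."""
    code = wrap_alt_code(number)
    if 1 <= code <= 31:
        return CP437_LOW[code - 1]
    return bytes([code]).decode("cp437")


print(wrap_alt_code(257), wrap_alt_code(513), wrap_alt_code(289))  # 1 1 33
print(cp437_glyph(257), cp437_glyph(513))                          # ☺ ☺
print(bytes([wrap_alt_code(289)]).decode("cp1252"))                # !
```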
Unicode for the first 256 is the same as CP 1252, except for 128-159 (hex 80-9F), which in Unicode are more control characters. In CP 1252, these positions hold mostly punctuation marks and a few extra letters, which are located elsewhere in Unicode.
Unicode begins its additional foreign and other characters at #256 (hex 100: “Ā”) and continues upward.
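The overlap between Unicode and CP 1252 can be checked with a short loop. This sketch compares each byte's CP 1252 character (using Python's `cp1252` codec, with unused positions turned into the replacement character) against the Unicode character with the same number; `differing_codes` is a made-up name.

```python
def differing_codes():
    """List the codes 0-255 where CP 1252 and Unicode disagree."""
    diffs = []
    for code in range(256):
        unicode_char = chr(code)  # Unicode code point with the same number
        # Unused CP 1252 positions become U+FFFD instead of raising an error.
        cp1252_char = bytes([code]).decode("cp1252", errors="replace")
        if unicode_char != cp1252_char:
            diffs.append(code)
    return diffs


diffs = differing_codes()
print(f"{diffs[0]}-{diffs[-1]}", len(diffs))  # 128-159 32
```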
So here are the three systems, with the decimal and hexadecimal numbers:
Dec. | Hex. | CP 1252 (ALT + 0dec) | CP 437 (ALT + dec) | CP 850 (ALT + dec) |
0 | 0000 | NULL | ||
1 | 0001 | START OF HEADING | ☺ | |
2 | 0002 | START OF TEXT | ☻ | |
3 | 0003 | END OF TEXT | ♥ | |
4 | 0004 | END OF TRANSMISSION | ♦ | |
5 | 0005 | ENQUIRY | ♣ | |
6 | 0006 | ACKNOWLEDGE | ♠ | |
7 | 0007 | BELL | • | |
8 | 0008 | BACKSPACE | ◘ | |
9 | 0009 | HORIZONTAL TAB | ○ | |
10 | 000A | LINE FEED | ◙ | |
11 | 000B | VERTICAL TAB | ♂ | |
12 | 000C | FORM FEED | ♀ | |
13 | 000D | CARRIAGE RETURN | ♪ | |
14 | 000E | SHIFT OUT | ♫ | |
15 | 000F | SHIFT IN | ☼ | |
16 | 0010 | DATA LINK ESCAPE | ► | |
17 | 0011 | DEVICE CONTROL 1 | ◄ | |
18 | 0012 | DEVICE CONTROL 2 | ↕ | |
19 | 0013 | DEVICE CONTROL 3 | ‼ | |
20 | 0014 | DEVICE CONTROL 4 | ¶ | |
21 | 0015 | NEGATIVE ACKNOWL | § | |
22 | 0016 | SYNCHRONOUS IDLE | ▬ | |
23 | 0017 | END OF TRANSM. BLOCK | ↨ | |
24 | 0018 | CANCEL | ↑ | |
25 | 0019 | END OF MEDIUM | ↓ | |
26 | 001A | SUBSTITUTE | → | |
27 | 001B | ESCAPE | ← | |
28 | 001C | FILE SEPARATOR | ∟ | |
29 | 001D | GROUP SEPARATOR | ↔ | |
30 | 001E | RECORD SEPARATOR | ▲ | |
31 | 001F | UNIT SEPARATOR | ▼ | |
32 | 0020 | SPACE | ||
33 | 0021 | ! | ! | ! |
34 | 0022 | " | " | " |
35 | 0023 | # | # | # |
36 | 0024 | $ | $ | $ |
37 | 0025 | % | % | % |
38 | 0026 | & | & | & |
39 | 0027 | ' | ' | ' |
40 | 0028 | ( | ( | ( |
41 | 0029 | ) | ) | ) |
42 | 002A | * | * | * |
43 | 002B | + | + | + |
44 | 002C | , | , | , |
45 | 002D | - | - | - |
46 | 002E | . | . | . |
47 | 002F | / | / | / |
48 | 0030 | 0 | 0 | 0 |
49 | 0031 | 1 | 1 | 1 |
50 | 0032 | 2 | 2 | 2 |
51 | 0033 | 3 | 3 | 3 |
52 | 0034 | 4 | 4 | 4 |
53 | 0035 | 5 | 5 | 5 |
54 | 0036 | 6 | 6 | 6 |
55 | 0037 | 7 | 7 | 7 |
56 | 0038 | 8 | 8 | 8 |
57 | 0039 | 9 | 9 | 9 |
58 | 003A | : | : | : |
59 | 003B | ; | ; | ; |
60 | 003C | < | < | < |
61 | 003D | = | = | = |
62 | 003E | > | > | > |
63 | 003F | ? | ? | ? |
64 | 0040 | @ | @ | @ |
65 | 0041 | A | A | A |
66 | 0042 | B | B | B |
67 | 0043 | C | C | C |
68 | 0044 | D | D | D |
69 | 0045 | E | E | E |
70 | 0046 | F | F | F |
71 | 0047 | G | G | G |
72 | 0048 | H | H | H |
73 | 0049 | I | I | I |
74 | 004A | J | J | J |
75 | 004B | K | K | K |
76 | 004C | L | L | L |
77 | 004D | M | M | M |
78 | 004E | N | N | N |
79 | 004F | O | O | O |
80 | 0050 | P | P | P |
81 | 0051 | Q | Q | Q |
82 | 0052 | R | R | R |
83 | 0053 | S | S | S |
84 | 0054 | T | T | T |
85 | 0055 | U | U | U |
86 | 0056 | V | V | V |
87 | 0057 | W | W | W |
88 | 0058 | X | X | X |
89 | 0059 | Y | Y | Y |
90 | 005A | Z | Z | Z |
91 | 005B | [ | [ | [ |
92 | 005C | \ | \ | \ |
93 | 005D | ] | ] | ] |
94 | 005E | ^ | ^ | ^ |
95 | 005F | _ | _ | _ |
96 | 0060 | ` | ` | ` |
97 | 0061 | a | a | a |
98 | 0062 | b | b | b |
99 | 0063 | c | c | c |
100 | 0064 | d | d | d |
101 | 0065 | e | e | e |
102 | 0066 | f | f | f |
103 | 0067 | g | g | g |
104 | 0068 | h | h | h |
105 | 0069 | i | i | i |
106 | 006A | j | j | j |
107 | 006B | k | k | k |
108 | 006C | l | l | l |
109 | 006D | m | m | m |
110 | 006E | n | n | n |
111 | 006F | o | o | o |
112 | 0070 | p | p | p |
113 | 0071 | q | q | q |
114 | 0072 | r | r | r |
115 | 0073 | s | s | s |
116 | 0074 | t | t | t |
117 | 0075 | u | u | u |
118 | 0076 | v | v | v |
119 | 0077 | w | w | w |
120 | 0078 | x | x | x |
121 | 0079 | y | y | y |
122 | 007A | z | z | z |
123 | 007B | { | { | { |
124 | 007C | | | | | | |
125 | 007D | } | } | } |
126 | 007E | ~ | ~ | ~ |
127 | 007F | DELETE | DEL or ⌂ | DEL |
128 | 0080 | € | Ç | Ç |
129 | 0081 | unused | ü | ü |
130 | 0082 | ‚ | é | é |
131 | 0083 | ƒ | â | â |
132 | 0084 | „ | ä | ä |
133 | 0085 | … | à | à |
134 | 0086 | † | å | å |
135 | 0087 | ‡ | ç | ç |
136 | 0088 | ˆ | ê | ê |
137 | 0089 | ‰ | ë | ë |
138 | 008A | Š | è | è |
139 | 008B | ‹ | ï | ï |
140 | 008C | Œ | î | î |
141 | 008D | unused | ì | ì |
142 | 008E | Ž | Ä | Ä |
143 | 008F | unused | Å | Å |
144 | 0090 | unused | É | É |
145 | 0091 | ‘ | æ | æ |
146 | 0092 | ’ | Æ | Æ |
147 | 0093 | “ | ô | ô |
148 | 0094 | ” | ö | ö |
149 | 0095 | • | ò | ò |
150 | 0096 | – | û | û |
151 | 0097 | — | ù | ù |
152 | 0098 | ˜ | ÿ | ÿ |
153 | 0099 | ™ | Ö | Ö |
154 | 009A | š | Ü | Ü |
155 | 009B | › | ¢ | ø |
156 | 009C | œ | £ | £ |
157 | 009D | unused | ¥ | Ø |
158 | 009E | ž | ₧ | × |
159 | 009F | Ÿ | ƒ | ƒ |
160 | 00A0 | NO-BREAK SP | á | á |
161 | 00A1 | ¡ | í | í |
162 | 00A2 | ¢ | ó | ó |
163 | 00A3 | £ | ú | ú |
164 | 00A4 | ¤ | ñ | ñ |
165 | 00A5 | ¥ | Ñ | Ñ |
166 | 00A6 | ¦ | ª | ª |
167 | 00A7 | § | º | º |
168 | 00A8 | ¨ | ¿ | ¿ |
169 | 00A9 | © | ⌐ | ® |
170 | 00AA | ª | ¬ | ¬ |
171 | 00AB | « | ½ | ½ |
172 | 00AC | ¬ NOT SIGN | ¼ | ¼ |
173 | 00AD | SOFT HYPHEN | ¡ | ¡ |
174 | 00AE | ® | « | « |
175 | 00AF | ¯ | » | » |
176 | 00B0 | ° | ░ | ░ |
177 | 00B1 | ± | ▒ | ▒ |
178 | 00B2 | ² | ▓ | ▓ |
179 | 00B3 | ³ | │ | │ |
180 | 00B4 | ´ | ┤ | ┤ |
181 | 00B5 | µ | ╡ | Á |
182 | 00B6 | ¶ | ╢ | Â |
183 | 00B7 | · | ╖ | À |
184 | 00B8 | ¸ | ╕ | © |
185 | 00B9 | ¹ | ╣ | ╣ |
186 | 00BA | º | ║ | ║ |
187 | 00BB | » | ╗ | ╗ |
188 | 00BC | ¼ | ╝ | ╝ |
189 | 00BD | ½ | ╜ | ¢ |
190 | 00BE | ¾ | ╛ | ¥ |
191 | 00BF | ¿ | ┐ | ┐ |
192 | 00C0 | À | └ | └ |
193 | 00C1 | Á | ┴ | ┴ |
194 | 00C2 | Â | ┬ | ┬ |
195 | 00C3 | Ã | ├ | ├ |
196 | 00C4 | Ä | ─ | ─ |
197 | 00C5 | Å | ┼ | ┼ |
198 | 00C6 | Æ | ╞ | ã |
199 | 00C7 | Ç | ╟ | Ã |
200 | 00C8 | È | ╚ | ╚ |
201 | 00C9 | É | ╔ | ╔ |
202 | 00CA | Ê | ╩ | ╩ |
203 | 00CB | Ë | ╦ | ╦ |
204 | 00CC | Ì | ╠ | ╠ |
205 | 00CD | Í | ═ | ═ |
206 | 00CE | Î | ╬ | ╬ |
207 | 00CF | Ï | ╧ | ¤ |
208 | 00D0 | Ð | ╨ | ð |
209 | 00D1 | Ñ | ╤ | Ð |
210 | 00D2 | Ò | ╥ | Ê |
211 | 00D3 | Ó | ╙ | Ë |
212 | 00D4 | Ô | ╘ | È |
213 | 00D5 | Õ | ╒ | ı / € (modified) |
214 | 00D6 | Ö | ╓ | Í |
215 | 00D7 | × | ╫ | Î |
216 | 00D8 | Ø | ╪ | Ï |
217 | 00D9 | Ù | ┘ | ┘ |
218 | 00DA | Ú | ┌ | ┌ |
219 | 00DB | Û | █ | █ |
220 | 00DC | Ü | ▄ | ▄ |
221 | 00DD | Ý | ▌ | ¦ |
222 | 00DE | Þ | ▐ | Ì |
223 | 00DF | ß | ▀ | ▀ |
224 | 00E0 | à | α | Ó |
225 | 00E1 | á | ß | ß |
226 | 00E2 | â | Γ | Ô |
227 | 00E3 | ã | π | Ò |
228 | 00E4 | ä | Σ | õ |
229 | 00E5 | å | σ | Õ |
230 | 00E6 | æ | µ | µ |
231 | 00E7 | ç | τ | þ |
232 | 00E8 | è | Φ | Þ |
233 | 00E9 | é | Θ | Ú |
234 | 00EA | ê | Ω | Û |
235 | 00EB | ë | δ | Ù |
236 | 00EC | ì | ∞ | ý |
237 | 00ED | í | φ | Ý |
238 | 00EE | î | ε | ¯ |
239 | 00EF | ï | ∩ | ´ |
240 | 00F0 | ð | ≡ | soft hyphen |
241 | 00F1 | ñ | ± | ± |
242 | 00F2 | ò | ≥ | ‗ |
243 | 00F3 | ó | ≤ | ¾ |
244 | 00F4 | ô | ⌠ | ¶ |
245 | 00F5 | õ | ⌡ | § |
246 | 00F6 | ö | ÷ | ÷ |
247 | 00F7 | ÷ | ≈ | ¸ |
248 | 00F8 | ø | ° | ° |
249 | 00F9 | ù | ∙ (larger) | ¨ |
250 | 00FA | ú | · (small) | · |
251 | 00FB | û | √ | ¹ |
252 | 00FC | ü | ⁿ | ³ |
253 | 00FD | ý | ² | ² |
254 | 00FE | þ | ■ | ■ |
255 | 00FF | ÿ | NBSP | No-break Space |
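Rows of the upper half of this table can be regenerated (or double-checked) with a few lines of Python. This is only a sketch that leans on Python's built-in `cp1252`, `cp437` and `cp850` codecs, so it shows whatever those codecs contain; the function name `table_row` is hypothetical, and for codes below 32 the DOS codecs return control characters rather than the glyphs listed above.

```python
def table_row(code):
    """Format one row: decimal, hex, then the character in each code page."""
    cells = []
    for codec in ("cp1252", "cp437", "cp850"):
        try:
            cells.append(bytes([code]).decode(codec))
        except UnicodeDecodeError:
            # Python's cp1252 codec has no character at a few positions.
            cells.append("unused")
    return f"{code} | {code:04X} | " + " | ".join(cells) + " |"


print(table_row(129))  # 129 | 0081 | unused | ü | ü |
print(table_row(233))  # 233 | 00E9 | é | Θ | Ú |
```

Looping `table_row` over `range(128, 255)` would print rows for the rest of the upper half, stopping short of the no-break space at 255.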