
Character Encoding Systems

April 19, 2014

I’ve long known about so-called “extended ASCII”, where you can type many characters that aren’t on the keyboard by holding down the ALT key and typing a number on the numeric keypad.

Yet I would often see different lists, some with different glyphs in different places, and some even with “box drawing” elements (you can see all of this in the Windows Character Map, charmap). Usually, you type a four-digit code beginning with “0” to get the character. HTML For Dummies had the list for this set, which is the one I was using. The same number is also what you type in HTML as a numeric character reference, &#nnn; (for example, &#233; for é).
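To check that the ALT + 0 number and the HTML number really are the same, here is a small Python sketch. It assumes Python’s built-in `cp1252` codec stands in for what Windows types, and uses `html.unescape`, which follows the HTML5 rule of reading &#128; through &#159; as CP 1252 characters. The helper name `alt0_char` is just made up for illustration.

```python
import html


def alt0_char(code: int) -> str:
    """Hypothetical helper: the character ALT + 0<code> gives on a CP 1252 system."""
    return bytes([code]).decode("cp1252")


# Compare the typed character with what the same number means as an HTML reference.
for code in (65, 150, 169, 233):
    typed = alt0_char(code)
    from_html = html.unescape(f"&#{code};")
    print(code, typed, from_html, typed == from_html)
```

Codes 129, 141, 143, 144 and 157 are unassigned in CP 1252, so Python’s codec raises `UnicodeDecodeError` for them instead of returning a character.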

But if you leave off the 0, you get other characters, including the box drawings.
This is actually a different set. Among those, I encountered two slightly different versions: someone had said on a page that 233 was “Ú” (capital U with acute accent), yet I kept getting Θ (Greek capital theta).

It was hard to find information on what was what, or a page comparing all of these sets. I found out that each of these sets is called a “code page”. The main three are Windows CP 1252 (the one with the four-digit ALT + 0 codes), DOS CP 437 (the other one I was getting without the “0”), and DOS CP 850 (which apparently is only for Western Europe, and is the one where 233 is “Ú”).
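Python happens to ship codecs for all three of these code pages, so the 233 mystery can be reproduced in a few lines. This is only a sketch: the `CODE_PAGES` table and the `glyphs_for` function are names I made up, and it just decodes one byte value with each codec.

```python
from __future__ import annotations

# Python's codec names for the three code pages discussed in this post.
CODE_PAGES = {"CP 1252": "cp1252", "CP 437": "cp437", "CP 850": "cp850"}


def glyphs_for(code: int) -> dict[str, str]:
    """Return the character stored at byte value `code` (0-255) in each code page."""
    return {name: bytes([code]).decode(codec) for name, codec in CODE_PAGES.items()}


print(glyphs_for(233))  # {'CP 1252': 'é', 'CP 437': 'Θ', 'CP 850': 'Ú'}
print(glyphs_for(201))  # {'CP 1252': 'É', 'CP 437': '╔', 'CP 850': '╔'}
```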

There’s also Unicode, which assigns a number (a “code point”, normally written in hexadecimal) to practically every character in use, and it has grown past four hex digits to five (the highest code point is 10FFFF). I liked this, because it includes enclosed (bulleted) alphanumerics that I can use in subway discussions to denote route bullets. (They can be typed in HTML as &#nnnn; with the decimal Unicode number, or &#xhhhh; with the hex number.) However, not every system or font supports all of it yet.
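Turning a character into those two HTML forms is just a matter of printing its code point in decimal and in hex. A minimal Python sketch (the `html_refs` name is hypothetical), with `html.unescape` used to confirm the reference decodes back to the same character:

```python
from __future__ import annotations

import html


def html_refs(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal HTML character references for one character."""
    cp = ord(ch)  # the Unicode code point as an integer
    return f"&#{cp};", f"&#x{cp:X};"


print(html_refs("\u24b6"))      # ('&#9398;', '&#x24B6;')     circled capital A
print(html_refs("\U0001F150"))  # ('&#127312;', '&#x1F150;')  negative circled capital A
print(html.unescape("&#x1F150;") == "\U0001F150")  # True
```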

“Circled” numbers and “negative circled” numbers (white character on a solid circle, like the actual route bullets) were already included in the “Wingdings” font, but circled letters have been added to Unicode in the Enclosed Alphanumerics block (U+2460 to U+24FF). The “Enclosed Alphanumeric Supplement” block (U+1F100 to U+1F1FF) includes the negative circled (solid bullet) letters, but these don’t even show up in any of my browsers.
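Since the letters in each of these ranges are consecutive, a route label can be turned into a bullet with simple arithmetic. This is a sketch under my own assumptions: `route_bullet` is a made-up helper, it only handles single letters A to Z and digits 1 to 9, and for solid numbers it uses the DINGBAT NEGATIVE CIRCLED DIGIT characters (U+2776 and up, in the Dingbats block) rather than anything from Wingdings.

```python
import unicodedata


def route_bullet(route: str, negative: bool = False) -> str:
    """Hypothetical helper: enclosed-alphanumeric character for a one-character route label.

    Letters use CIRCLED LATIN CAPITAL LETTER (from U+24B6) or
    NEGATIVE CIRCLED LATIN CAPITAL LETTER (from U+1F150); digits 1-9 use
    CIRCLED DIGIT (from U+2460) or DINGBAT NEGATIVE CIRCLED DIGIT (from U+2776).
    """
    if len(route) != 1:
        raise ValueError("expected a single letter or digit")
    if "A" <= route <= "Z":
        base = 0x1F150 if negative else 0x24B6
        return chr(base + ord(route) - ord("A"))
    if "1" <= route <= "9":
        base = 0x2776 if negative else 0x2460
        return chr(base + int(route) - 1)
    raise ValueError(f"no enclosed form for {route!r}")


# Print each bullet with its code point and official Unicode name.
for r in ("A", "7"):
    for neg in (False, True):
        ch = route_bullet(r, neg)
        print(r, neg, ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Whether the result actually displays still depends on the fonts installed, which is exactly the browser problem mentioned above.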

Past 255, the ALT codes just repeat the code page endlessly, wrapping around every 256. So CP 437 #257 is ☺, just like #1, as is #513. Since CP 1252 starts with 32 control characters (0 to 31), 256 to 287 will be the invisible control characters, and #289 starts over with !, like #33.
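The wrap-around is just “take the number modulo 256”. A small Python sketch of that rule (`alt_code_char` is a hypothetical name, and the modulo behavior is the one described above, not something Python itself knows about ALT codes). One caveat: Python’s `cp437` codec decodes 1 to 31 as control characters, so it returns `'\x01'` where the DOS screen drew ☺.

```python
def alt_code_char(number: int, codec: str = "cp437") -> str:
    """Character for an ALT code, modeling the wrap-around as number mod 256."""
    return bytes([number % 256]).decode(codec)


print(alt_code_char(489))            # Θ  (489 - 256 = 233)
print(alt_code_char(289, "cp1252"))  # !  (289 - 256 = 33)
```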
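The first 256 Unicode code points (the same set as ISO 8859-1, “Latin-1”) match CP 1252, except for 128-159 (hex 80-9F), which in Unicode are more control characters. In CP 1252, those positions hold mostly typographic punctuation (curly quotes, dashes, the euro sign, ™) plus a few letters such as Š and Œ, all of which are located elsewhere in Unicode (the euro sign, for example, is U+20AC).
Beyond that, Unicode continues with its foreign and other characters from #256 (hex 100: “Ā”) up.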
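That overlap can be checked by comparing each CP 1252 byte with the Unicode character of the same number. A minimal sketch using Python’s `cp1252` codec (the function name `cp1252_differences` is my own):

```python
from __future__ import annotations


def cp1252_differences() -> dict[int, str]:
    """Byte values whose CP 1252 character is not the Unicode code point of the same number."""
    diffs = {}
    for b in range(256):
        try:
            ch = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            continue  # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in CP 1252
        if ord(ch) != b:
            diffs[b] = ch
    return diffs


diffs = cp1252_differences()
print(min(diffs), max(diffs), len(diffs))  # 128 159 27
print(f"€ is CP 1252 #128 but Unicode U+{ord(diffs[0x80]):04X}")  # ... U+20AC
```

Every difference falls inside 128-159; everything else is identical to Latin-1.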

So here are the three systems, with the decimal and hexadecimal numbers (NBSP = no-break space):

Dec. Hex. CP 1252 (ALT + 0dec) CP 437 (ALT + dec) CP 850 (ALT + dec)
0 0000 NULL
1 0001 START OF HEADING
2 0002 START OF TEXT
3 0003 END OF TEXT
4 0004 END OF TRANSMISSION
5 0005 ENQUIRY
6 0006 ACKNOWLEDGE
7 0007 BELL
8 0008 BACKSPACE
9 0009 HORIZONTAL TAB
10 000A LINE FEED
11 000B VERTICAL TAB
12 000C FORM FEED
13 000D CARRIAGE RETURN
14 000E SHIFT OUT
15 000F SHIFT IN
16 0010 DATA LINK ESCAPE
17 0011 DEVICE CONTROL 1
18 0012 DEVICE CONTROL 2
19 0013 DEVICE CONTROL 3
20 0014 DEVICE CONTROL 4
21 0015 NEGATIVE ACKNOWL §
22 0016 SYNCHRONOUS IDLE
23 0017 END OF TRANSM. BLOCK
24 0018 CANCEL
25 0019 END OF MEDIUM
26 001A SUBSTITUTE
27 001B ESCAPE
28 001C FILE SEPARATOR
29 001D GROUP SEPARATOR
30 001E RECORD SEPARATOR
31 001F UNIT SEPARATOR
32 0020 SPACE
33 0021 ! ! !
34 0022 " " "
35 0023 # # #
36 0024 $ $ $
37 0025 % % %
38 0026 & & &
39 0027 ' ' '
40 0028 ( ( (
41 0029 ) ) )
42 002A * * *
43 002B + + +
44 002C , , ,
45 002D - - -
46 002E . . .
47 002F / / /
48 0030 0 0 0
49 0031 1 1 1
50 0032 2 2 2
51 0033 3 3 3
52 0034 4 4 4
53 0035 5 5 5
54 0036 6 6 6
55 0037 7 7 7
56 0038 8 8 8
57 0039 9 9 9
58 003A : : :
59 003B ; ; ;
60 003C < < <
61 003D = = =
62 003E > > >
63 003F ? ? ?
64 0040 @ @ @
65 0041 A A A
66 0042 B B B
67 0043 C C C
68 0044 D D D
69 0045 E E E
70 0046 F F F
71 0047 G G G
72 0048 H H H
73 0049 I I I
74 004A J J J
75 004B K K K
76 004C L L L
77 004D M M M
78 004E N N N
79 004F O O O
80 0050 P P P
81 0051 Q Q Q
82 0052 R R R
83 0053 S S S
84 0054 T T T
85 0055 U U U
86 0056 V V V
87 0057 W W W
88 0058 X X X
89 0059 Y Y Y
90 005A Z Z Z
91 005B [ [ [
92 005C \ \ \
93 005D ] ] ]
94 005E ^ ^ ^
95 005F _ _ _
96 0060 ` ` `
97 0061 a a a
98 0062 b b b
99 0063 c c c
100 0064 d d d
101 0065 e e e
102 0066 f f f
103 0067 g g g
104 0068 h h h
105 0069 i i i
106 006A j j j
107 006B k k k
108 006C l l l
109 006D m m m
110 006E n n n
111 006F o o o
112 0070 p p p
113 0071 q q q
114 0072 r r r
115 0073 s s s
116 0074 t t t
117 0075 u u u
118 0076 v v v
119 0077 w w w
120 0078 x x x
121 0079 y y y
122 007A z z z
123 007B { { {
124 007C | | |
125 007D } } }
126 007E ~ ~ ~
127 007F DEL DEL or ⌂ DEL
128 0080 € Ç Ç
129 0081 (unused) ü ü
130 0082 ‚ é é
131 0083 ƒ â â
132 0084 „ ä ä
133 0085 … à à
134 0086 † å å
135 0087 ‡ ç ç
136 0088 ˆ ê ê
137 0089 ‰ ë ë
138 008A Š è è
139 008B ‹ ï ï
140 008C Œ î î
141 008D (unused) ì ì
142 008E Ž Ä Ä
143 008F (unused) Å Å
144 0090 (unused) É É
145 0091 ‘ æ æ
146 0092 ’ Æ Æ
147 0093 “ ô ô
148 0094 ” ö ö
149 0095 • ò ò
150 0096 – û û
151 0097 EM DASH ù ù
152 0098 ˜ ÿ ÿ
153 0099 ™ Ö Ö
154 009A š Ü Ü
155 009B › ¢ ø
156 009C œ £ £
157 009D (unused) ¥ Ø
158 009E ž ₧ ×
159 009F Ÿ ƒ ƒ
160 00A0 NBSP á á
161 00A1 ¡ í í
162 00A2 ¢ ó ó
163 00A3 £ ú ú
164 00A4 ¤ ñ ñ
165 00A5 ¥ Ñ Ñ
166 00A6 ¦ ª ª
167 00A7 § º º
168 00A8 ¨ ¿ ¿
169 00A9 © ⌐ ®
170 00AA ª ¬ ¬
171 00AB « ½ ½
172 00AC ¬ ¼ ¼
173 00AD SOFT HYPHEN ¡ ¡
174 00AE ® « «
175 00AF ¯ » »
176 00B0 ° ░ ░
177 00B1 ± ▒ ▒
178 00B2 ² ▓ ▓
179 00B3 ³ │ │
180 00B4 ´ ┤ ┤
181 00B5 µ ╡ Á
182 00B6 ¶ ╢ Â
183 00B7 · ╖ À
184 00B8 ¸ ╕ ©
185 00B9 ¹ ╣ ╣
186 00BA º ║ ║
187 00BB » ╗ ╗
188 00BC ¼ ╝ ╝
189 00BD ½ ╜ ¢
190 00BE ¾ ╛ ¥
191 00BF ¿ ┐ ┐
192 00C0 À └ └
193 00C1 Á ┴ ┴
194 00C2 Â ┬ ┬
195 00C3 Ã ├ ├
196 00C4 Ä ─ ─
197 00C5 Å ┼ ┼
198 00C6 Æ ╞ ã
199 00C7 Ç ╟ Ã
200 00C8 È ╚ ╚
201 00C9 É ╔ ╔
202 00CA Ê ╩ ╩
203 00CB Ë ╦ ╦
204 00CC Ì ╠ ╠
205 00CD Í ═ ═
206 00CE Î ╬ ╬
207 00CF Ï ╧ ¤
208 00D0 Ð ╨ ð
209 00D1 Ñ ╤ Ð
210 00D2 Ò ╥ Ê
211 00D3 Ó ╙ Ë
212 00D4 Ô ╘ È
213 00D5 Õ ╒ ı (€ in CP 858, a modified CP 850)
214 00D6 Ö ╓ Í
215 00D7 × ╫ Î
216 00D8 Ø ╪ Ï
217 00D9 Ù ┘ ┘
218 00DA Ú ┌ ┌
219 00DB Û █ █
220 00DC Ü ▄ ▄
221 00DD Ý ▌ ¦
222 00DE Þ ▐ Ì
223 00DF ß ▀ ▀
224 00E0 à α Ó
225 00E1 á ß ß
226 00E2 â Γ Ô
227 00E3 ã π Ò
228 00E4 ä Σ õ
229 00E5 å σ Õ
230 00E6 æ µ µ
231 00E7 ç τ þ
232 00E8 è Φ Þ
233 00E9 é Θ Ú
234 00EA ê Ω Û
235 00EB ë δ Ù
236 00EC ì ∞ ý
237 00ED í φ Ý
238 00EE î ε ¯
239 00EF ï ∩ ´
240 00F0 ð ≡ SOFT HYPHEN
241 00F1 ñ ± ±
242 00F2 ò ≥ ‗
243 00F3 ó ≤ ¾
244 00F4 ô ⌠ ¶
245 00F5 õ ⌡ §
246 00F6 ö ÷ ÷
247 00F7 ÷ ≈ ¸
248 00F8 ø ° °
249 00F9 ù ∙ (larger) ¨
250 00FA ú · (small) ·
251 00FB û √ ¹
252 00FC ü ⁿ ³
253 00FD ý ² ²
254 00FE þ ■ ■
255 00FF ÿ NBSP NBSP
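If you want to double-check a row, or print the whole upper half yourself, Python’s codecs can regenerate it. This is a rough sketch with made-up helper names (`cell`, `table_rows`); it shows invisible characters such as NBSP and the soft hyphen as U+ numbers instead of names, and unassigned CP 1252 slots as “(unused)”.

```python
from __future__ import annotations

from collections.abc import Iterator


def cell(code: int, codec: str) -> str:
    """One table cell: the character at `code` in `codec`, or a placeholder."""
    try:
        ch = bytes([code]).decode(codec)
    except UnicodeDecodeError:
        return "(unused)"
    # Show invisible characters by their code point instead of printing them raw.
    return ch if ch.isprintable() else f"U+{ord(ch):04X}"


def table_rows(start: int = 128, stop: int = 256) -> Iterator[str]:
    """Yield rows shaped like the table above: decimal, hex, CP 1252, CP 437, CP 850."""
    for code in range(start, stop):
        cells = " ".join(cell(code, c) for c in ("cp1252", "cp437", "cp850"))
        yield f"{code} {code:04X} {cells}"


for row in table_rows():
    print(row)
```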