The Linux Console Tools
Yann Dirson, dirson@debian.org
11 August 1998
____________________________________________________________
Table of Contents
1. Status of this document
1.1 Other documents
2. What the Linux Console Tools are
3. Understanding the big picture of the console
4. What is Unicode
5. Understanding and setting up the keyboard driver
5.1 How it works
5.2 See also
6. Understanding and setting up the screen driver
6.1 Unicode is everywhere
6.1.1 Screen Font Maps
6.1.2 SFM Fallback tables
6.2 The unicode screen-mode
6.3 The byte screen-mode
6.3.1 Charset slots
6.4 Special UCS2 codes
6.5 About the old 8-bit ``screen maps''
6.6 See also
7. The libraries
7.1 libconsole, <lct/console.h> + <lct/ksyms.h>
7.2 libcfont, <lct/font.h>
7.3 libctutils, <lct/utils.h> + <lct/unicode.h>
8. The future of the console driver and of the Linux Console Tools
______________________________________________________________________
1. Status of this document
This is an introduction to the Linux Console Tools package. You
should refer to the manpages for more details.
1.1. Other documents
Files in the doc/contrib/ directory are unsupported, and may be
obsolete, but are provided just in case someone needs them.
README.{acm,sfm,keytables} give some info on the respective included
data files.
kbd.FAQ.* is the Console and Keyboard HOWTO by Andries Brouwer, as
included in kbd 0.96a. It would need some corrections, though.
2. What the Linux Console Tools are
The Linux Console Tools are a set of programs allowing the user to
set up and customize the console (in a restricted meaning: screen and
keyboard only). The package is derived from version 0.94 of the kbd
package, and has benefited from most (hopefully all) features
introduced in kbd until version 0.96a.
The Linux Console Tools are still under development, but using them
just as a replacement for kbd should be quite safe, as they fix many
bugs kbd has.
3. Understanding the big picture of the console
The console driver is currently made of 2 sub-drivers: the keyboard
driver, and the screen driver. Basically, the keyboard driver sends
characters to your application, then the application does its own job,
and sends to the screen driver the characters to be displayed.
4. What is Unicode
Traditionally, character encodings use 8 bits, and thus are limited to
256 characters. This causes problems because:
1. it's not enough for some languages;
2. people speaking languages using different encodings have to choose
which one they use, and have to switch the system's state when
changing the language, which makes it difficult to mix several
languages in the same file;
3. etc...
Thus the UCS (Universal Character Set), also known as Unicode, was
created to handle and mix all of our world's scripts. This is a
32-bit (4 bytes) encoding, otherwise known as UCS4 because of the size
of its characters, which is standardised by ISO as the 10646-1
standard.
The most widely used characters from UCS are contained in the UCS2
16-bit subset of UCS; this is the subset used by the Linux console.
For convenience, the UTF8 encoding was designed as a variable-length
encoding (with a maximum length of 6 bytes) with ASCII compatibility;
all chars that have a UCS4 encoding can be expressed as a UTF8
sequence, and vice-versa.
The Unicode consortium <http://unicode.org> defines additional
properties for UCS2 characters.
See: unicode(7), utf-8(7).
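As a small illustration of the relation between UCS codes and UTF8
sequences, here is a sketch in Python. Python is not part of the Linux
Console Tools and is used here only for illustration; the function
names are made up for this example, and the conversion relies on
Python's built-in UTF-8 codec.

```python
# Illustration only: how UCS codes relate to UTF8 byte sequences.
# ucs_to_utf8() and utf8_to_ucs() are names invented for this sketch.

def ucs_to_utf8(code):
    """Return the UTF8 byte sequence encoding one UCS code (an integer)."""
    return chr(code).encode("utf-8")


def utf8_to_ucs(sequence):
    """Return the list of UCS codes encoded by a UTF8 byte sequence."""
    return [ord(char) for char in sequence.decode("utf-8")]


if __name__ == "__main__":
    # An ASCII letter, a Latin-1 letter, and a code above U+07FF.
    for code in (0x0041, 0x00E9, 0x20AC):
        print("U+%04X -> %s" % (code, ucs_to_utf8(code).hex()))
```

ASCII characters keep their single-byte form, which is what the ASCII
compatibility mentioned above means in practice.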
5. Understanding and setting up the keyboard driver
5.1. How it works
The keyboard driver is made up of several levels:
o the keyboard hardware, which turns the user's finger movements into
so-called scancodes (Disclaimer: this is not really part of the
software driver itself; no support is provided for bugs in this
domain ;-). An event (key pressed or released) generates from 1 to
6 scancodes.
o a mechanism turning scancodes into keycodes using a translation
table which you can access with the getkeycodes(8) and
setkeycodes(8) utilities. You will only need to look at that if
you have some sort of non-standard (or programmable ?) keys on your
keyboard. AFAIK, these keycodes are the same among a set of
keyboards sharing the same hardware, but differing in the symbols
drawn on the keys.
o a mechanism turning keycodes into characters using a keymap. You
can access this keymap using the loadkeys(1) and dumpkeys(1)
utilities.
The keyboard driver can be in one of 4 modes (which you can access
using kbd_mode(1)), which will influence what type of data
applications will get as keyboard input:
o the scancode (K_RAW) mode, in which the application gets scancodes
for input. It is used by applications that implement their own
keyboard driver. For example, X11 does that.
o the keycode (K_MEDIUMRAW) mode, in which the application gets
information on which keys (identified by their keycodes) get
pressed and released. AFAIK, no real-life application uses this
mode.
o the ASCII (K_XLATE) mode, in which the application effectively gets
the characters as defined by the keymap, using an 8-bit encoding.
In this mode, the Ascii_0 to Ascii_9 keymap symbols let you
compose characters by giving their decimal 8-bit code, and Hex_0 to
Hex_F do the same with (2-digit) hexadecimal codes.
o the Unicode (K_UNICODE) mode, which at this time only differs from
the ASCII mode by allowing the user to compose UTF8 Unicode
characters by their decimal value, using Ascii_0 to Ascii_9 (who
needs that ?), or their hexadecimal (4-digit) value, using Hex_0 to
Hex_F. A keymap can be set up to produce UTF8 sequences (with a
U+XXXX pseudo-symbol, where each X is a hexadecimal digit), but be
warned that these UTF8 sequences will also be produced even in
ASCII mode. I think this is a bug in the kernel.
BE WARNED that putting the keyboard in RAW or MEDIUMRAW mode will make
it unusable for most applications. Use showkey(1) to get a demo of
these special modes, or to find out what scancodes/keycodes are
produced by a specific key.
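To make the difference between composing in ASCII mode and in Unicode
mode more concrete, here is a rough Python sketch. It only models the
result of a finished composition (the digits typed with the Ascii_N or
Hex_N keymap symbols), not the way the kernel collects the keystrokes;
the function names are invented for this example.

```python
# Illustration only: what the application receives once a character
# has been composed from digits.  Both function names are hypothetical.

def compose_xlate(digits, base):
    """ASCII (K_XLATE) mode: the digits give one 8-bit character code.

    base is 10 for Ascii_0..Ascii_9 and 16 for Hex_0..Hex_F.
    """
    value = int(digits, base)
    if not 0 <= value <= 0xFF:
        raise ValueError("an 8-bit code must lie between 0 and 255")
    return bytes([value])


def compose_unicode(digits, base):
    """Unicode (K_UNICODE) mode: the digits give a UCS2 code, sent as UTF8."""
    value = int(digits, base)
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a UCS2 code must fit in 16 bits")
    if 0xD800 <= value <= 0xDFFF:
        # Python's codec refuses these codes, so the sketch does too.
        raise ValueError("surrogate codes cannot be encoded as UTF8")
    return chr(value).encode("utf-8")


if __name__ == "__main__":
    print(compose_xlate("233", 10))      # one byte
    print(compose_unicode("00E9", 16))   # a two-byte UTF8 sequence
```

The same character (e-acute, code 233 or 0xE9) thus reaches the
application as a single byte in ASCII mode, and as a 2-byte UTF8
sequence in Unicode mode.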
5.2. See also
keytables(5), setleds(1), setmetamode(1).
6. Understanding and setting up the screen driver
6.1. Unicode is everywhere
6.1.1. Screen Font Maps
In recent (as of 1998/08/11) kernels, the screen driver is based on
16-bit Unicode (UCS2) encoding, which means that every console-font
loaded should be defined using a Unicode Screen Font Map (SFM for
short), which tells, for each character in the font, the list of UCS2
characters it will render. (-- SFM's were formerly called ``Unicode
Map'', or ``unimap'' for short, but this term should be dropped, as
now what they called ``screen maps'' uses Unicode as well: it probably
confuses many people--)
6.1.2. SFM Fallback tables
Starting with release 1997.11.13 of the Linux Console Tools,
consolechars(8) understands SFM fallback tables. Before that, an
SFM had to contain at the same time the Unicode of the characters it
was primarily meant to render, as well as any approximations the user
would like to use. These fallback tables make it possible to put only
the primary mappings in the SFM provided with the font-file, and to
separately keep a list telling ``if no glyph for that character is
available in the current font, then try to display it with the glyph
for this one, or else the one for that one, or ...''. This allows all
possible fallbacks to be kept in a single place, and everyone will be
able to choose which fallback tables (s)he wants. Have a look at
data/consoletrans/*.fallback for examples.
A fallback-table file is made of fallback entries, each entry being on
its own line. Empty lines, and lines beginning with the # comment
character are ignored.
A fallback entry is a series of 2 or more UCS2 codes. The first one is
the character for which we want a glyph; the following ones are those
whose glyph we want to use when no glyph designed specially for our
character is available. The order of the codes defines a priority
order (own glyph if available, then second char's, then the third's,
etc.)
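The file layout just described is simple enough to be read with a few
lines of code. The following Python sketch is only an illustration and
is not taken from consolechars; in particular, it ASSUMES that the
UCS2 codes are written in U+XXXX notation and separated by
whitespace. Check the files in data/consoletrans/ for the notation
actually used. All names are invented for this example.

```python
# Illustration only: read fallback entries from the text of a
# fallback-table file.  The U+XXXX notation is an assumption.

def parse_code(token):
    """Turn a 'U+XXXX' token (assumed notation) into an integer."""
    if not token.upper().startswith("U+"):
        raise ValueError("not a U+XXXX code: %r" % token)
    return int(token[2:], 16)


def parse_fallback_table(text):
    """Return a list of (wanted_code, [fallback_codes]) tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        # Skip empty lines and comments (leading blanks are tolerated
        # here, which is slightly more lenient than the description).
        if not line or line.startswith("#"):
            continue
        codes = [parse_code(token) for token in line.split()]
        if len(codes) < 2:
            raise ValueError("an entry needs at least 2 codes: %r" % line)
        # The first code is the character we want a glyph for; the
        # others are tried in order.
        entries.append((codes[0], codes[1:]))
    return entries


SAMPLE = """\
# hypothetical sample: A with breve, then two approximations
U+0102 U+00C3 U+0041

U+2018 U+0027
"""

if __name__ == "__main__":
    for wanted, fallbacks in parse_fallback_table(SAMPLE):
        print("U+%04X:" % wanted, ["U+%04X" % code for code in fallbacks])
```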
If an SFM is to be loaded, fallback mappings are added to this map
before it is loaded. If none is to be loaded (ie. a font without SFM
was loaded and no --sfm option was given to consolechars, or the
--force-no-sfm option was given), then the current SFM is requested
from the kernel, the fallback mappings are added, and the resulting
SFM is loaded back into the kernel.
Note that each fallback entry is checked against the original SFM, not
against the SFM we get by adding former fallback entries to the
original SFM (the one read from a file, or given by the kernel); this
applies even to entries in different files, and thus the order of -k
options has no effect. If you want some entries to be influenced by
previous ones, you will have to use different fallback files, and to
load them with several consecutive invocations of consolechars -k.
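The rule that entries are checked against the original SFM only can be
sketched as follows in Python. This is a model of the behaviour
described above, not the code used by consolechars: the SFM is
represented as a plain dictionary from UCS2 code to font position, and
the function name is invented for this example.

```python
# Illustration only: add fallback mappings to an SFM, checking every
# entry against the ORIGINAL map, never against earlier additions.

def add_fallbacks(sfm, entries):
    """Return a new SFM (dict: UCS2 code -> font position).

    entries is a list of (wanted_code, [fallback_codes]) tuples.
    """
    result = dict(sfm)
    for wanted, fallbacks in entries:
        if wanted in sfm:
            continue              # the character has its own glyph
        for candidate in fallbacks:
            if candidate in sfm:  # looked up in the original SFM only
                result[wanted] = sfm[candidate]
                break
    return result


if __name__ == "__main__":
    original = {0x0041: 65}                 # the font only has 'A'
    entries = [
        (0x00C0, [0x0041]),                 # A-grave falls back to A
        (0x0102, [0x00C0]),                 # A-breve falls back to A-grave
    ]
    once = add_fallbacks(original, entries)
    twice = add_fallbacks(once, entries)    # like a second invocation
    print(sorted(once), sorted(twice))
```

In the first pass U+0102 gets no glyph, because U+00C0 is not in the
original map; it only gets one in the second pass, which plays the
role of a second, consecutive invocation.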
6.2. The unicode screen-mode
There are basically 2 screen-modes (byte mode and UTF mode). The
simpler to explain is the UTF mode, in which the bytes received from
the application (ie. written to the console screen) are interpreted as
UTF8 sequences, which are converted into the ``equivalent UCS2 codes'',
and then looked-up in the SFM to determine the glyphs used to display
each character.
Switching to and from UTF mode is done by sending to the screen the
escape sequences <ESC>%G and <ESC>%@ respectively. You may use the
unicode_start(1) and unicode_stop(1) scripts instead, as they also
change the keyboard mode, and let you optionally change the screen-
font.
Use vt-is-UTF8(1) to find out whether the active VT is in UTF mode.
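For programs that want to send the escape sequences themselves, a
minimal Python sketch could look like this. The function names are
invented for this example. Note that, unlike unicode_start(1) and
unicode_stop(1), this only touches the screen mode and leaves the
keyboard mode alone.

```python
# Illustration only: switch the screen to and from UTF mode by
# writing the escape sequences given above.
import sys

UTF_MODE_ON = "\x1b%G"    # <ESC>%G
UTF_MODE_OFF = "\x1b%@"   # <ESC>%@


def utf_mode_sequence(enabled):
    """Return the escape sequence that enters or leaves UTF mode."""
    return UTF_MODE_ON if enabled else UTF_MODE_OFF


def set_utf_mode(enabled, stream=None):
    """Write the sequence to stream (standard output by default)."""
    if stream is None:
        stream = sys.stdout
    stream.write(utf_mode_sequence(enabled))
    stream.flush()


if __name__ == "__main__":
    # Only meaningful when standard output is a Linux console VT.
    set_utf_mode(True)
```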
6.3. The byte screen-mode
The byte mode is a bit more complicated, as it uses an additional map
to transform the byte-characters sent by the application into UCS2
characters, which are then treated as described above. I call this map
the Application Charset Map (ACM), because it defines the encoding the
application uses, but it used to be called a ``screen map'', or
``console map'' (this comes from the time when the screen driver
didn't use Unicode, and there was only one map down there).
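The extra step performed in byte mode can be pictured with the
following Python sketch. It is a model only: an ACM is represented
here as a list of 256 UCS2 codes indexed by byte value, and all names
are invented for this example. The second table is a made-up
user-defined ACM that differs from latin1 in one position.

```python
# Illustration only: how bytes written to the screen become UCS2
# codes in byte mode (through an ACM) and in UTF mode (no ACM).

# ISO latin1 maps every byte to the UCS2 code of the same value.
LATIN1_ACM = list(range(256))

# Hypothetical user-defined ACM: latin1, except that byte 0xA4 is
# the euro sign (as it is in ISO 8859-15).
USER_ACM = list(LATIN1_ACM)
USER_ACM[0xA4] = 0x20AC


def byte_mode_to_ucs2(data, acm):
    """Byte mode: each byte is translated through the ACM."""
    return [acm[byte] for byte in data]


def utf_mode_to_ucs2(data):
    """UTF mode: the bytes are read as UTF8 sequences."""
    return [ord(char) for char in data.decode("utf-8")]


if __name__ == "__main__":
    print(byte_mode_to_ucs2(b"\xe9\xa4", LATIN1_ACM))
    print(byte_mode_to_ucs2(b"\xe9\xa4", USER_ACM))
    print(utf_mode_to_ucs2(b"\xc3\xa9"))
```

Whatever the mode, the resulting UCS2 codes are then looked up in the
SFM in the same way.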
Although there is only one ACM active at a given time, there are 4 of
them at any time in the kernel; 3 of them are built-in and never
change, and they define the IBM codepage 437 (the i386's default, and
thus the kernel's default even on other archs), the DEC VT100 charset,
and the ISO latin1 charset; the 4th is user-definable, and defaults on
boot to the ``straight to font'' mapping, described below under
``Special UCS2 codes''.
The consolechars(1) command can be used to change the ACM, as well as
the font and its associated SFM.
6.3.1. Charset slots
The Linux Console Driver has 2 slots for charsets, labeled G0 and G1.
Each of these slots contains a reference to one of the 4 kernel ACMs,
3 of which are predefined to provide the cp437, iso01, and vt100
graphics charsets. The 4th one is user-definable; this is the one you
can set with consolechars --acm and get with consolechars --old-acm.
Versions of the Linux Console Tools prior to 1998.08.11, as well as
all versions of kbd at least until 0.96a, were always assuming you
wanted to use the G0 slot, pointing to the user-defined ACM. You can
now use the charset utility to tune your charset slots.
You will note that, although each VT has its own slot settings, there
is only one user-defined ACM for use by all the VTs. That is, whereas
you can have tty1 using G0=cp437 and G1=vt100, at the same time as
tty2 using G0=iso01 and G1=iso02 (user-defined), you cannot have at
the same time tty1 using iso02 and tty2 using iso03. This is a
limitation of the Linux kernel.
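This limitation may be easier to see with a toy model. The Python
sketch below is in no way the kernel's data structure; the class and
attribute names are invented for this example. Each VT keeps its own
two slot settings, but the user-defined ACM lives in one place shared
by all VTs.

```python
# Illustration only: per-VT charset slots sharing one user-defined ACM.

BUILTIN_ACMS = ("cp437", "iso01", "vt100")
USER = "user"   # marks a slot that points to the user-defined ACM


class Kernel:
    """Holds the single user-defined ACM (identified here by a name)."""

    def __init__(self):
        self.user_acm = None

    def load_user_acm(self, name):
        self.user_acm = name


class VT:
    """A virtual terminal with its own G0 and G1 slot settings."""

    def __init__(self, kernel, g0, g1):
        for slot in (g0, g1):
            if slot != USER and slot not in BUILTIN_ACMS:
                raise ValueError("unknown ACM: %r" % slot)
        self.kernel = kernel
        self.g0 = g0
        self.g1 = g1

    def charsets(self):
        """Return the charsets G0 and G1 currently refer to."""
        def resolve(slot):
            return self.kernel.user_acm if slot == USER else slot
        return resolve(self.g0), resolve(self.g1)


if __name__ == "__main__":
    kernel = Kernel()
    tty1 = VT(kernel, "cp437", USER)
    tty2 = VT(kernel, "iso01", USER)
    kernel.load_user_acm("iso02")
    print(tty1.charsets(), tty2.charsets())
    kernel.load_user_acm("iso03")   # changes what BOTH VTs see
    print(tty1.charsets(), tty2.charsets())
```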
Note that you can emulate such a setting using the filterm utility,
with your console in UTF8-mode, by telling filterm to translate screen
output on-the-fly to UTF8.
You'll find filterm in the konwert package, by Marcin Kowalczyk, which
is available from his home page <http://qrczak.home.ml.org/>.
6.4. Special UCS2 codes
There are special UCS2 values you should care about, but the present
list is probably not exhaustive:
o codes C from U+F000 to U+F1FF are not looked up in the SFM, and
directly access the character in font-position C & 0x01FF (yes, a
font can be 512-chars on many hardware platforms, like VGA). This
is referred to as the straight to font zone.
o code U+FFFD is the replacement character, usually at font-position
0 in a font. It is displayed by the kernel each time the
application requests a Unicode character that is not present in
the SFM. This not only allows the driver to be safe in Unicode
mode, but also prevents displaying invalid characters when the ACM
on a particular VT contains characters not in the current font!
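Put together, the two special cases above and the ordinary SFM lookup
give the following decision, sketched here in Python. This is a model
of the description, not kernel code: the SFM is a dictionary from UCS2
code to font position, the names are invented for this example, and
falling back to position 0 when the SFM does not list U+FFFD is an
assumption based on the ``usually at font-position 0'' remark.

```python
# Illustration only: choose the font position used to display one
# UCS2 code.

STRAIGHT_TO_FONT_FIRST = 0xF000
STRAIGHT_TO_FONT_LAST = 0xF1FF
REPLACEMENT_CHARACTER = 0xFFFD


def font_position(code, sfm):
    """Return the font position for a UCS2 code, given an SFM dict."""
    if STRAIGHT_TO_FONT_FIRST <= code <= STRAIGHT_TO_FONT_LAST:
        # Straight to font zone: the SFM is not consulted at all.
        return code & 0x01FF
    if code in sfm:
        return sfm[code]
    # Not in the SFM: show the replacement character instead
    # (position 0 is assumed when the SFM does not list U+FFFD).
    return sfm.get(REPLACEMENT_CHARACTER, 0)


if __name__ == "__main__":
    sfm = {0x0041: 65, REPLACEMENT_CHARACTER: 0}
    for code in (0x0041, 0xF041, 0xF1FF, 0x20AC):
        print("U+%04X -> position %d" % (code, font_position(code, sfm)))
```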
6.5. About the old 8-bit ``screen maps''
There was a time when the kernel didn't know anything about Unicode.
In this ancient time, Application Charset Maps were called ``screen
maps'', and just mapped the application's characters into font
positions. The file format used for these 8bit ACM's is still
supported for backward compatibility, but should not be used any more.
The old way of using custom ACM's didn't know about unicode, so the
ACM had to depend on the font. Now, as each VT chooses its own ACM
(from the 4 ones in the kernel at a given time), and as the console-
font is common to all VT's, we can use a charset even if the font
can't display all of its characters; it will then display the
replacement character (U+FFFD).
6.6. See also
psfaddtable(1), psfgettable(1), psfstriptable(1), showfont(1).
7. The libraries
There are several shared libraries installed by the Linux Console
Tools. They were at first meant just to share code between the
various utilities (kbd has lots of duplicated code), but they could be
used as a base to build new tools.
However, they are not yet ready for production use (hence the version
number 0.0.0), and are absolutely not complete nor coherent at this
time.
Here is a summary of what they are meant to become:
7.1. libconsole, <lct/console.h> + <lct/ksyms.h>
is meant to be a collection of:
o wrappers around the kernel-level functionalities, which should be
as kernel-version-independent as reasonable;
o higher-level interfaces to these functionalities.
Maybe this goal overlaps with some part of libggi (see ``The
future''), but I didn't investigate that for now.
7.2. libcfont, <lct/font.h>
is meant to provide a high-level interface to console-font file-
handling. It also exports the lower-level functions used to construct
higher-level ones.
It only supports for now some low- to medium-level functions that ease
writing programs, but I hope to make it a lot more than that,
especially with the coming of the XPSF file-format (see doc/font-
formats/xpsf.draft for details).
As of release 1998.08.11, implementation of higher-level interface has
just started.
7.3. libctutils, <lct/utils.h> + <lct/unicode.h>
is a collection of misc utility functions for use by the 2 other libs
and by the tools. I hope most of this stuff will one day make its way to
an existing general purpose utility-library. Any offers welcomed.
8. The future of the console driver and of the Linux Console Tools
The Linux Console Tools were derived from kbd. However, it is not a
good thing to have two distinct distributions for these tools. I hope
we'll manage to finally merge the two packages back, together with
Andries Brouwer, who still maintains kbd.
There is an ongoing project, known as GGI (for General Graphical
Interface), which is in the process of, among other things,
revolutionizing the way the console is handled. Have a look at their
WWW site <http://synergy.foo.net/~ggi> for details.
Thus I don't think it is a good idea to hack the official console
driver at this time. As there are many improvements to do, they will
probably have to be done in the GGI drivers.
Well, as of 2.1.1xx kernels, the console driver has been thoroughly
modified. I hope those guys know what they're doing ;)
As far as possible, I will try to keep the Linux Console Tools in sync
with what is developed for the kernel, and with what gets added to new
releases of kbd, but I have to take a better look at the current state
of the GGI project before I give any more info.