  The Linux Console Tools
  Yann Dirson, dirson@debian.org
  11 August 1998
  ____________________________________________________________

  Table of Contents


  1. Status of this document

     1.1 Other documents

  2. What the Linux Console Tools are

  3. Understanding the big picture of the console

  4. What is Unicode

  5. Understanding and setting up the keyboard driver

     5.1 How it works
     5.2 See also

  6. Understanding and setting up the screen driver

     6.1 Unicode is everywhere
        6.1.1 Screen Font Maps
        6.1.2 SFM Fallback tables
     6.2 The unicode screen-mode
     6.3 The byte screen-mode
        6.3.1 Charset slots
     6.4 Special UCS2 codes
     6.5 About the old 8-bit ``screen maps''
     6.6 See also

  7. The libraries

     7.1 libconsole, <lct/console.h> + <lct/ksyms.h>
     7.2 libcfont, <lct/font.h>
     7.3 libctutils, <lct/utils.h> + <lct/unicode.h>

  8. The future of the console driver and of the Linux Console Tools



  ______________________________________________________________________

  1.  Status of this document

  This is an introduction to the Linux Console Tools package.  You
  should refer to the manpages for more details.


  1.1.  Other documents

  Files in the doc/contrib/ directory are unsupported, and may be
  obsolete, but are provided just in case someone needs them.

  README.{acm,sfm,keytables} give some info on the respective included
  data files.

  kbd.FAQ.* is the Console and Keyboard HOWTO by Andries Brouwer, as
  included in kbd 0.96a.  It would need some corrections, though.



  2.  What the Linux Console Tools are

  The Linux Console Tools are a set of programs that let you set up and
  customize your console (in the restricted sense of screen + keyboard
  only).  The package is derived from version 0.94 of the kbd package,
  and has benefited from most (hopefully all) features introduced in kbd
  up to version 0.96a.

  The Linux Console Tools are still under development, but using them
  just as a replacement for kbd should be quite safe, as they fix many
  bugs that kbd has.


  3.  Understanding the big picture of the console

  The console driver is currently made of 2 sub-drivers: the keyboard
  driver, and the screen driver.  Basically, the keyboard driver sends
  characters to your application, then the application does its own job,
  and sends to the screen driver the characters to be displayed.


  4.  What is Unicode

  Traditionally, character encodings use 8 bits, and thus are limited to
  256 characters.  This causes problems because:

  1. it's not enough for some languages;

  2. people speaking languages using different encodings have to choose
     which one they use, and have to switch the system's state when
     changing the language, which makes it difficult to mix several
     languages in the same file;

  3. etc...

  Thus the UCS (Universal Character Set), also known as Unicode, was
  created to handle and mix all of our world's scripts.  This is a
  32-bit (4 bytes) encoding, otherwise known as UCS4 because of the size
  of its characters, which is standardised by ISO as the 10646-1
  standard.  The most widely used characters from UCS are contained in
  the UCS2 16-bit subset of UCS; this is the subset used by the Linux
  console.

  For convenience, the UTF8 encoding was designed as a variable-length
  encoding (with a maximum length of 6 bytes per character) with ASCII
  compatibility; all chars that have a UCS4 encoding can be expressed as
  a UTF8 sequence, and vice-versa.
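
  To make the relation between a UCS code and its UTF8 sequence
  concrete, here is a minimal Python sketch of an encoder.  It follows
  the original UTF8 scheme, in which a 31-bit value takes at most 6
  bytes; the function name ucs_to_utf8 is made up for this example, and
  the sketch does no validation beyond a range check.

```python
def ucs_to_utf8(code: int) -> bytes:
    """Encode one UCS code as a UTF8 byte sequence (1 to 6 bytes)."""
    if code < 0:
        raise ValueError("negative code")
    if code < 0x80:
        # ASCII compatibility: 7-bit codes are encoded as themselves.
        return bytes([code])
    # (exclusive upper limit, sequence length, marker bits of first byte)
    for limit, length, lead in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if code < limit:
            break
    else:
        raise ValueError("code does not fit in 31 bits")
    # Continuation bytes carry 6 bits each, lowest bits last.
    tail = []
    for _ in range(length - 1):
        tail.append(0x80 | (code & 0x3F))
        code >>= 6
    return bytes([lead | code] + tail[::-1])


if __name__ == "__main__":
    for c in (0x41, 0xE9, 0x20AC):
        print("U+%04X -> %s" % (c, ucs_to_utf8(c).hex()))
```

  All codes in the UCS2 subset used by the console fit in 3 bytes or
  fewer with this scheme.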

  The Unicode consortium <http://unicode.org> defines additional
  properties for UCS2 characters.

  See: unicode(7), utf-8(7).


  5.  Understanding and setting up the keyboard driver

  5.1.  How it works

  The keyboard driver is made up of several levels:


  o  the keyboard hardware, which turns the user's finger moves into
     so-called scancodes (Disclaimer: this is not really part of the
     software driver itself; no support is provided for bugs in this
     domain ;-).  An event (key pressed or released) generates from 1 to
     6 scancodes.

  o  a mechanism turning scancodes into keycodes using a
     translation-table which you can access with the getkeycodes(8) and
     setkeycodes(8) utilities.  You will only need to look at that if
     you have some sort of non-standard (or programmable?) keys on your
     keyboard.  AFAIK, these keycodes are the same among a set of
     keyboards sharing the same hardware, but differing in the symbols
     drawn on the keys.

  o  a mechanism turning keycodes into characters using a keymap.  You
     can access this keymap using the loadkeys(1) and dumpkeys(1)
     utilities.
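
  As a rough illustration of the levels listed above, here is a toy
  Python model of the scancode to keycode to character chain.  It is
  only a sketch: the two table entries are assumptions modelled on a
  common PC keyboard, the rule that bit 7 of a one-byte scancode marks a
  key release is also an assumption (it does not cover multi-byte
  scancodes), and the real keymap has many more columns than the single
  unshifted one shown here.

```python
# Hypothetical, heavily truncated tables (assumptions, see above).
SCANCODE_TO_KEYCODE = {0x1E: 30, 0x10: 16}   # the scancode -> keycode level
KEYMAP = {30: "a", 16: "q"}                  # the keycode -> character level


def translate(scancode: int):
    """Return (keycode, pressed, char) for a single one-byte scancode."""
    pressed = not (scancode & 0x80)          # assumed: high bit = release
    keycode = SCANCODE_TO_KEYCODE[scancode & 0x7F]
    # Only a key press produces a character; a release produces none.
    char = KEYMAP[keycode] if pressed else None
    return keycode, pressed, char


if __name__ == "__main__":
    for sc in (0x1E, 0x9E):
        print(hex(sc), translate(sc))
```

  Swapping the entries of KEYMAP while leaving SCANCODE_TO_KEYCODE
  untouched mirrors what loading another keymap does for keyboards that
  share the same hardware.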

  The keyboard driver can be in one of 4 modes (which you can access
  using kbd_mode(1)), which will influence what type of data
  applications will get as keyboard input:


  o  the scancode (K_RAW) mode, in which the application gets scancodes
     for input.  It is used by applications that implement their own
     keyboard driver.  For example, X11 does that.

  o  the keycode (K_MEDIUMRAW) mode, in which the application gets
     information on which keys (identified by their keycodes) get
     pressed and released.  AFAIK, no real-life application uses this
     mode.

  o  the ASCII (K_XLATE) mode, in which the application effectively gets
     the characters as defined by the keymap, using an 8-bit encoding.
     In this mode, the Ascii_0 to Ascii_9 keymap symbols allow you to
     compose characters by giving their decimal 8-bit code, and Hex_0 to
     Hex_F do the same with (2-digit) hexadecimal codes.

  o  the Unicode (K_UNICODE) mode, which at this time only differs from
     the ASCII mode by allowing the user to compose UTF8 unicode
     characters by their decimal value, using Ascii_0 to Ascii_9 (who
     needs that?), or their hexadecimal (4-digit) value, using Hex_0 to
     Hex_F.  A keymap can be set up to produce UTF8 sequences (with a
     U+XXXX pseudo-symbol, where each X is a hexadecimal digit), but be
     warned that these UTF8 sequences will also be produced even in
     ASCII mode.  I think this is a bug in the kernel.

  BE WARNED that putting the keyboard in RAW or MEDIUMRAW mode will make
  it unusable for most applications.  Use showkey(1) to get a demo of
  these special modes, or to find out what scancodes/keycodes are
  produced by a specific key.
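
  For readers who want to see which of the 4 modes a console is in
  without going through kbd_mode(1), here is a hedged Python sketch
  that only reads the mode and never changes it.  The request number
  and mode values are those of <linux/kd.h> as far as I know them; the
  default device path /dev/tty0 and the function names are assumptions,
  and the call will fail unless it is run on a Linux virtual console
  with sufficient permissions.

```python
import fcntl
import struct

KDGKBMODE = 0x4B44  # "get keyboard mode" request, from <linux/kd.h>
MODE_NAMES = {0: "K_RAW", 1: "K_XLATE", 2: "K_MEDIUMRAW", 3: "K_UNICODE"}


def mode_name(value: int) -> str:
    """Map the integer returned by the kernel to its symbolic name."""
    return MODE_NAMES.get(value, "unknown (%d)" % value)


def keyboard_mode(tty_path: str = "/dev/tty0") -> str:
    """Ask the kernel for the keyboard mode of the given console."""
    with open(tty_path, "rb") as tty:
        buf = bytearray(struct.calcsize("i"))
        fcntl.ioctl(tty, KDGKBMODE, buf)   # kernel fills buf with an int
        return mode_name(struct.unpack("i", buf)[0])


if __name__ == "__main__":
    print(keyboard_mode())
```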


  5.2.  See also

  keytables(5), setleds(1), setmetamode(1).



  6.  Understanding and setting up the screen driver

  6.1.  Unicode is everywhere

  6.1.1.  Screen Font Maps

  In recent (as of 1998/08/11) kernels, the screen driver is based on
  16-bit unicode (UCS2) encoding, which means that every console-font
  loaded should be defined using a unicode Screen Font Map (SFM for
  short), which tells, for each character in the font, the list of UCS2
  characters it will render.  (SFMs were formerly called ``Unicode
  Maps'', or ``unimaps'' for short, but this term should be dropped:
  what used to be called ``screen maps'' now use Unicode as well, so
  the old name probably confuses many people.)


  6.1.2.  SFM Fallback tables

  Starting with release 1997.11.13 of the Linux Console Tools,
  consolechars(8) understands SFM fallback tables.  Before that, an SFM
  had to contain both the Unicode values of the characters the font was
  primarily meant to render and any approximations the user wanted.
  Fallback tables allow the SFM provided with the font-file to contain
  only the primary mappings, and to separately keep a list telling ``if
  no glyph for that character is available in the current font, then try
  to display it with the glyph for this one, or else the one for that
  one, or ...''.  This keeps all possible fallbacks in one place, and
  everyone is able to choose which fallback tables (s)he wants.  Have a
  look at data/consoletrans/*.fallback for examples.

  A fallback-table file is made of fallback entries, each entry being on
  its own line. Empty lines, and lines beginning with the # comment
  character are ignored.

  A fallback entry is a series of 2 or more UCS2 codes. The first one is
  the character for which we want a glyph; the following ones are those
  whose glyph we want to use when no glyph designed specially for our
  character is available. The order of the codes defines a priority
  order (own glyph if available, then second char's, then the third's,
  etc.)

  If an SFM is to be loaded, fallback mappings are added to this map
  before it is loaded.  If none is (i.e. a font without SFM was loaded
  and no --sfm option was given to consolechars, or the --force-no-sfm
  option was given), then the current SFM is requested from the kernel,
  the fallback mappings are added, and the resulting SFM is loaded back
  into the kernel.

  Note that each fallback entry is checked against the original SFM
  (the one read from a file, or given by the kernel), not against the
  SFM we get by adding earlier fallback entries to it; this applies even
  to entries in different files, and thus the order of -k options has no
  effect.  If you want some entries to be influenced by previous ones,
  you will have to use different fallback files, and to load them with
  several consecutive invocations of consolechars -k.
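
  The rules of this section can be sketched in a few lines of Python.
  This is not the code used by consolechars, only an illustration under
  stated assumptions: codes are assumed to be written as U+XXXX or bare
  hexadecimal and separated by whitespace, the SFM is modelled as a
  dictionary from UCS2 code to font position, and several entries for
  the same character are not treated specially.  The function names are
  made up for the example.

```python
def parse_code(token: str) -> int:
    # Assumption: a code is written as U+XXXX or as bare hexadecimal.
    if token.upper().startswith("U+"):
        token = token[2:]
    return int(token, 16)


def parse_fallback_table(text: str):
    """Return a list of entries, each a list of 2 or more UCS2 codes."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # empty line or comment
        codes = [parse_code(tok) for tok in line.split()]
        if len(codes) < 2:
            raise ValueError("entry needs 2 or more codes: %r" % line)
        entries.append(codes)
    return entries


def apply_fallbacks(sfm: dict, entries) -> dict:
    """Return a copy of sfm (UCS2 code -> font position) with fallbacks.

    Every entry is resolved against the original sfm only, never
    against mappings added by earlier entries.
    """
    result = dict(sfm)
    for target, *candidates in entries:
        if target in sfm:
            continue                      # the character has its own glyph
        for candidate in candidates:      # priority order
            if candidate in sfm:
                result[target] = sfm[candidate]
                break
    return result
```

  With an SFM that only knows U+0027 and U+0060, the entries ``U+2018
  U+0060 U+0027'' and ``U+2019 U+2018 U+0027'' map U+2018 to the glyph
  of U+0060, but U+2019 to the glyph of U+0027: the mapping just added
  for U+2018 is not visible to the second entry.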


  6.2.  The unicode screen-mode

  There are basically 2 screen-modes (byte mode and UTF mode).  The
  simpler to explain is the UTF mode, in which the bytes received from
  the application (i.e. written to the console screen) are interpreted
  as UTF8 sequences, which are converted into the ``equivalent UCS2
  codes'', and then looked up in the SFM to determine the glyphs used to
  display each character.

  Switching to and from UTF mode is done by sending to the screen the
  escape sequences <ESC>%G and <ESC>%@ respectively.  You may use the
  unicode_start(1) and unicode_stop(1) scripts instead, as they also
  change the keyboard mode, and let you optionally change the screen-
  font.

  Use vt-is-UTF8(1) to find out whether the active VT is in UTF mode.
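
  The two escape sequences mentioned above are short enough to be sent
  by hand.  The following Python sketch writes them to a stream; the
  function name set_utf8_mode is made up for the example.  Unlike
  unicode_start(1) and unicode_stop(1), it touches only the screen mode
  and leaves the keyboard mode and the font alone.

```python
import sys

ESC = b"\x1b"
UTF8_MODE_ON = ESC + b"%G"    # switch the screen to UTF mode
UTF8_MODE_OFF = ESC + b"%@"   # switch the screen back to byte mode


def set_utf8_mode(enabled: bool, out=None) -> None:
    """Write the mode-switching escape sequence to a binary stream."""
    if out is None:
        out = sys.stdout.buffer   # the console, when run on a VT
    out.write(UTF8_MODE_ON if enabled else UTF8_MODE_OFF)
    out.flush()


if __name__ == "__main__":
    set_utf8_mode(True)
```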





  6.3.  The byte screen-mode

  The byte mode is a bit more complicated, as it uses an additional map
  to transform the byte-characters sent by the application into UCS2
  characters, which are then handled as described above.  I call this
  map the Application Charset Map (ACM), because it defines the encoding
  the application uses, but it used to be called a ``screen map'', or
  ``console map'' (this comes from the time when the screen driver
  didn't use Unicode, and there was only one map down there).

  Although there is only one ACM active at a given time, there are 4 of
  them at any time in the kernel; 3 of them are built-in and never
  change, and they define the IBM codepage 437 (the i386's default, and
  thus the kernel's default even on other archs), the DEC VT100 charset,
  and the ISO latin1 charset; the 4th is user-definable, and defaults on
  boot to the ``straight to font'' mapping, described below under
  ``Special UCS2 codes''.

  The consolechars(1) command can be used to change the ACM, as well as
  the font and its associated SFM.
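
  To illustrate what an ACM does, the sketch below builds two 256-entry
  tables from Python's own codecs and uses them to turn application
  bytes into UCS2 codes.  The tables are an assumption standing in for
  the kernel's built-in ones: Python's cp437 codec maps the control
  range to control characters, which may not match what the kernel does
  there.  The names make_acm, ACMS and to_ucs2 are made up for the
  example.

```python
def make_acm(codec: str):
    """Build a 256-entry byte -> UCS2 table from a Python codec."""
    return [ord(bytes([b]).decode(codec)) for b in range(256)]


# Stand-ins for two of the built-in kernel ACMs (assumption, see above).
ACMS = {
    "cp437": make_acm("cp437"),
    "iso01": make_acm("latin-1"),
}


def to_ucs2(data: bytes, acm):
    """Translate application bytes to UCS2 codes using one ACM."""
    return [acm[b] for b in data]


if __name__ == "__main__":
    for name in ("iso01", "cp437"):
        codes = to_ucs2(b"\xe9", ACMS[name])
        print(name, ["U+%04X" % c for c in codes])
```

  The same byte 0xE9 becomes U+00E9 (e with acute accent) under iso01
  and U+0398 (Greek capital theta) under cp437, which is why the
  application and the active ACM have to agree on the encoding.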


  6.3.1.  Charset slots

  The Linux Console Driver has 2 slots for charsets, labeled G0 and G1.
  Each of these slots contains a reference to one of the 4 kernel ACMs,
  3 of which are predefined to provide the cp437, iso01, and vt100
  graphics charsets.  The 4th one is user-definable; this is the one you
  can set with consolechars --acm and get with consolechars --old-acm.

  Versions of the Linux Console Tools prior to 1998.08.11, as well as
  all versions of kbd at least until 0.96a, always assumed you wanted to
  use the G0 slot, pointing to the user-defined ACM.  You can now use
  the charset utility to tune your charset slots.
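
  For reference, the slots can also be driven with the escape sequences
  described in console_codes(4); this is information from that manpage,
  not from the charset utility itself, whose inner workings are not
  covered here.  The Python sketch below only builds the sequences; the
  table and function names are made up for the example.

```python
ESC = "\x1b"
SLOT_INTRODUCER = {"G0": "(", "G1": ")"}
# Final character selecting which of the 4 kernel ACMs a slot points to.
ACM_SELECTOR = {"iso01": "B", "vt100": "0", "cp437": "U", "user": "K"}
# Control characters choosing which slot is currently in use.
ACTIVATE = {"G0": "\x0f", "G1": "\x0e"}   # SI and SO


def designate(slot: str, acm: str) -> str:
    """Build the sequence making a slot point to one of the ACMs."""
    return ESC + SLOT_INTRODUCER[slot] + ACM_SELECTOR[acm]


if __name__ == "__main__":
    # Point G1 at the VT100 graphics charset, then switch to G1.
    print(repr(designate("G1", "vt100") + ACTIVATE["G1"]))
```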

  You will note that, although each VT has its own slot settings, there
  is only one user-defined ACM for use by all the VTs.  That is, whereas
  you can have tty1 using G0=cp437 and G1=vt100, at the same time as
  tty2 using G0=iso01 and G1=iso02 (user-defined), you cannot have at
  the same time tty1 using iso02 and tty2 using iso03.  This is a
  limitation of the Linux kernel.

  Note that you can emulate such a setting using the filterm utility,
  with your console in UTF8-mode, by telling filterm to translate screen
  output on-the-fly to UTF8.

  You'll find filterm in the konwert package, by Marcin Kowalczyk, which
  is available from his home page <http://qrczak.home.ml.org/>.



  6.4.  Special UCS2 codes

  There are special UCS2 values you should care about, but the present
  list is probably not exhaustive:


  o  codes C from U+F000 to U+F1FF are not looked up in the SFM, and
     directly access the character at font-position C & 0x01FF (yes, a
     font can be 512 chars on many hardware platforms, like VGA).  This
     is referred to as the straight to font zone.


  o  code U+FFFD is the replacement character, usually at font-position
     0 in a font.  It is displayed by the kernel each time the
     application requests a unicode character that is not present in
     the SFM.  This not only keeps the driver safe in Unicode mode, but
     also prevents displaying invalid characters when the ACM on a
     particular VT contains characters not in the current font!
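
  The two special cases above amount to the following lookup, written
  here as a Python sketch rather than as a description of the kernel
  code.  The SFM is modelled as a dictionary from UCS2 code to font
  position; falling back to position 0 when the SFM has no entry for
  U+FFFD is an assumption based on the ``usually at font-position 0''
  remark, and the name glyph_for is made up.

```python
REPLACEMENT = 0xFFFD


def glyph_for(code: int, sfm: dict) -> int:
    """Return the font position used to display one UCS2 code."""
    if 0xF000 <= code <= 0xF1FF:
        # Straight to font zone: the SFM is bypassed entirely.
        return code & 0x01FF
    if code in sfm:
        return sfm[code]
    # No glyph for this character: show the replacement character.
    return sfm.get(REPLACEMENT, 0)   # position 0 is an assumed default


if __name__ == "__main__":
    sfm = {0x0041: 65, REPLACEMENT: 0}
    for c in (0x0041, 0xF141, 0x4E2D):
        print("U+%04X -> position %d" % (c, glyph_for(c, sfm)))
```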


  6.5.  About the old 8-bit ``screen maps''

  There was a time when the kernel didn't know anything about Unicode.
  In this ancient time, Application Charset Maps were called ``screen
  maps'', and just mapped the application's characters into font
  positions.  The file format used for these 8-bit ACMs is still
  supported for backward compatibility, but should not be used any more.

  The old way of using custom ACMs didn't know about unicode, so the
  ACM had to depend on the font.  Now, as each VT chooses its own ACM
  (from the 4 in the kernel at a given time), and as the console-font
  is common to all VTs, we can use a charset even if the font can't
  display all of its characters; it will then display the replacement
  character (U+FFFD).


  6.6.  See also

  psfaddtable(1), psfgettable(1), psfstriptable(1), showfont(1).



  7.  The libraries

  There are several shared libraries installed by the Linux Console
  Tools.  They were at first meant just to share code between the
  various utilities (kbd has lots of duplicated code), but they could be
  used as a base to build new tools.

  However, they are not yet ready for production use (hence the version
  number 0.0.0), and are neither complete nor coherent at this time.

  Here is a summary of what they are meant to become:


  7.1.  libconsole, <lct/console.h> + <lct/ksyms.h>

  is meant to be a collection of:

  o  wrappers around the kernel-level functionalities, which should be
     as kernel-version-independent as reasonable;

  o  higher-level interfaces to these functionalities.

  Maybe this goal overlaps with some part of libggi (see ``The
  future''), but I didn't investigate that for now.


  7.2.  libcfont, <lct/font.h>

  is meant to provide a high-level interface to console-font
  file-handling.  It also exports the lower-level functions used to
  construct higher-level ones.

  For now it only supports some low- to medium-level functions that ease
  writing programs, but I hope to make it a lot more than that,
  especially with the coming of the XPSF file-format (see
  doc/font-formats/xpsf.draft for details).


  As of release 1998.08.11, implementation of the higher-level interface
  has just started.


  7.3.  libctutils, <lct/utils.h> + <lct/unicode.h>

  is a collection of misc utility functions for use by the 2 other libs
  and by the tools.  I hope most of this stuff will one day make its way
  to an existing general purpose utility-library.  Any offers welcome.



  8.  The future of the console driver and of the Linux Console Tools

  The Linux Console Tools were derived from kbd.  However, it is not a
  good thing to have two distinct distributions for these tools.  I hope
  we'll manage to finally merge the two packages back, together with
  Andries Brouwer, who still maintains kbd.

  There is an ongoing project, known as GGI (for General Graphical
  Interface), which is in the process of, among other things,
  revolutionizing the way the console is handled.  Have a look at their
  WWW site <http://synergy.foo.net/~ggi> for details.

  Thus I don't think it is a good idea to hack the official console
  driver at this time.  As there are many improvements to make, they
  will probably have to be made in the GGI drivers.

  Well, as of 2.1.1xx kernels, the console driver has been thoroughly
  modified.  I hope those guys know what they're doing ;)

  As far as possible, I will try to keep the Linux Console Tools in sync
  with what is developed for the kernel, and with what gets added to new
  releases of kbd, but I have to look better at the current state of the
  GGI project before I give any more info.