
WO2018189308A1 - Character set identification - Google Patents


Info

Publication number
WO2018189308A1
Authority
WO
WIPO (PCT)
Prior art keywords
character set
character
characters
program information
sets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2018/059417
Other languages
German (de)
French (fr)
Inventor
Matthias Wefers
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hirschmann Car Communication GmbH
Original Assignee
Hirschmann Car Communication GmbH
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-04-13
Filing date
2018-04-12
Publication date
2018-10-18
Application filed by Hirschmann Car Communication GmbH
Publication of WO2018189308A1
Ceased (current legal status)


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/485 End-user interface for client configuration
    • H04N21/4856 End-user interface for client configuration for language selection, e.g. for the menu or subtitles
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04H BROADCAST COMMUNICATION
    • H04H60/00 Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
    • H04H60/68 Systems specially adapted for using specific information, e.g. geographical or meteorological information
    • H04H60/72 Systems specially adapted for using specific information, e.g. geographical or meteorological information using electronic programme guides [EPG]

Definitions

  • the invention relates to a method for transmitting electronic program information containing information about currently running and upcoming programs, wherein in addition to the electronic program information at least one predetermined character set is transmitted to adapt the program content to different languages and their characteristics, according to the features of the preamble of claim 1.
  • EPG Electronic Program Guide
  • Program information: the content of the Electronic Program Guide
  • a character set is selected from a plurality of character sets on the basis that the defined possible character sets have ranges in which no characters, or specific characters, are defined: if such undefined or specific characters occur within the program information, it is assumed that the character set in question is not to be used for decoding. This check is run with all possible character sets, producing a list of character sets that are not to be used, from which the remaining character set is inferred and used as the default character set. In other words, by detecting undefined or specific characters transmitted with the program information, the character set to be used as the default is selected from a plurality of predefined character sets.
  • a method for recognizing a character set in which the default character set ISO 6937 allows so-called diacritical marks to be encoded using a special encoding that does not occur in other character sets: before program information is translated according to the signaled character set, it is checked whether such an accent/letter combination occurs in it. If it does, it is assumed that the default character set is to be used, and the text is translated with ISO 6937 instead of the signaled character set; if no combination of an accent code (0xC1-0xCF) and a second character occurs, the signaled character set continues to be used.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The invention relates to a method for transmitting electronic program information containing information about currently running and upcoming programs, wherein, in addition to the electronic program information, at least one specifiable character set is transmitted in order to adapt the program content to various languages and their particularities, characterized in that, if no specifiable character set was transmitted or a character set deviating from the specifiable character set was transmitted, a character set is selected from a plurality of character sets as follows: the defined possible character sets have ranges in which no characters or certain characters are defined, and if such undefined or certain characters occur within the program information, it is assumed that the character set in question should not be used for decoding. This check is performed with all possible character sets, producing a list of character sets that should not be used, from which the remaining character set is inferred and used as the default character set.

Description

Character set recognition

The invention relates to a method for transmitting electronic program information containing information about currently running and upcoming programs, wherein, in addition to the electronic program information, at least one predeterminable character set is transmitted in order to adapt the program content to different languages and their particularities, according to the features of the preamble of claim 1.

Background:

In television (TV), a so-called EPG (Electronic Program Guide) is transmitted. This electronic program guide (program information) provides an overview of current and upcoming programs. To adapt the content to different languages and their particularities, the ETSI EN 300 468 standard provides for support of different character encodings. The character set used is usually transmitted together with the program information.
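To make "the character set is transmitted together with the program information" concrete, here is a minimal sketch of how a receiver might read the signaled character set from the start of an EN 300 468 text field. It assumes our reading of EN 300 468 Annex A: a first byte of 0x20 or above means "no selector, use the default table", single bytes 0x01-0x0B (0x08 reserved) and 0x15 select a table, and 0x10 0x00 0xNN selects ISO/IEC 8859 part NN. The function name `signaled_charset` and the returned codec labels are illustrative; other selectors are deliberately left unhandled.

```python
from typing import Optional, Tuple

DEFAULT_CHARSET = "iso6937"  # label only; Python's standard library has no ISO 6937 codec

# Single-byte selectors as we understand ETSI EN 300 468, Annex A (0x08 is reserved).
SELECTORS = {
    0x01: "iso8859_5",   # Latin/Cyrillic
    0x02: "iso8859_6",   # Latin/Arabic
    0x03: "iso8859_7",   # Latin/Greek
    0x04: "iso8859_8",   # Latin/Hebrew
    0x05: "iso8859_9",
    0x06: "iso8859_10",
    0x07: "iso8859_11",  # Latin/Thai
    0x09: "iso8859_13",
    0x0A: "iso8859_14",
    0x0B: "iso8859_15",
    0x15: "utf_8",
}

# ISO/IEC 8859 parts that may follow the 0x10 0x00 escape (there is no part 12).
EIGHT_BIT_PARTS = set(range(1, 12)) | {13, 14, 15}


def signaled_charset(text: bytes) -> Tuple[Optional[str], bytes]:
    """Split a text field into (signaled character set, payload).

    Returns None as the character set for selectors this sketch does not handle.
    """
    if not text or text[0] >= 0x20:
        # No selector byte present: the default character set applies.
        return DEFAULT_CHARSET, text
    first = text[0]
    if first in SELECTORS:
        return SELECTORS[first], text[1:]
    if first == 0x10 and len(text) >= 3 and text[1] == 0x00 and text[2] in EIGHT_BIT_PARTS:
        return f"iso8859_{text[2]}", text[3:]
    return None, text
```

The two error cases described next arise exactly at this step: the selector says one thing while the payload bytes were actually written in another character set.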

Problem:

If no specific character set is transmitted (signaled), a default character set (ISO 6937) is to be used according to ETSI EN 300 468. In practice, however, two errors can occur:

(1) No character set is transmitted (signaled), but a character set other than the default character set is used.

(2) The default character set is used, but a different character set is transmitted (signaled).

Solution for (1):

In general, the invention provides that, if no predeterminable character set, or a character set deviating from the predeterminable character set, has been transmitted, a character set is selected from a plurality of character sets on the basis that the defined possible character sets have ranges in which no characters, or specific characters, are defined. If such undefined or specific characters occur within the program information, it is assumed that the character set in question is not to be used for decoding. This check is run with all possible character sets, producing a list of character sets that are not to be used, from which the remaining character set is inferred and used as the default character set. In other words, by detecting undefined or specific characters transmitted with the program information, the character set to be used as the default is selected from a plurality of predefined character sets.

Within the possible character sets defined in ETSI EN 300 468, there are always ranges in which no characters are defined (e.g., for ISO 8859-8, the range 0xC0-0xDE).
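The ISO 8859-8 example can be checked directly. The sketch below uses only the range named in the text (0xC0 to 0xDE); ISO 8859-8 has a few further undefined positions that it does not list. The names `undefined_offsets` and `can_use_iso8859_8` are hypothetical.

```python
from typing import List

# Undefined range named in the text for ISO 8859-8: 0xC0 to 0xDE inclusive.
ISO8859_8_UNDEFINED = range(0xC0, 0xDE + 1)


def undefined_offsets(data: bytes, undefined: range) -> List[int]:
    """Return the positions of all bytes in `data` that fall into an undefined range."""
    return [i for i, b in enumerate(data) if b in undefined]


def can_use_iso8859_8(data: bytes) -> bool:
    # A single undefined byte is enough to rule this character set out.
    return not undefined_offsets(data, ISO8859_8_UNDEFINED)
```

For instance, "München" written in ISO 6937 is `b"M\xc8unchen"`. Its accent byte 0xC8 lies in the undefined range, so ISO 8859-8 is excluded for that text.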

If such undefined characters occur within the program information, this is a reliable indication that this character set should not be used for decoding. Running this test with all possible character sets produces a list of character sets that are not to be used. Ideally, the remaining character set can then be inferred.
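The elimination over all candidates can be sketched as follows. Instead of hand-written range tables, this version swaps in Python's strict single-byte codecs: decoding raises `UnicodeDecodeError` when a byte maps to an undefined position. Note that codecs without gaps (e.g. `iso8859_5`) can never be excluded this way. The names `eliminate_charsets` and `infer_charset` are hypothetical.

```python
from typing import Iterable, List, Optional


def eliminate_charsets(data: bytes, candidates: Iterable[str]) -> List[str]:
    """Return the candidates that decode `data` without hitting an undefined byte."""
    remaining = []
    for name in candidates:
        try:
            data.decode(name)  # strict mode: undefined bytes raise UnicodeDecodeError
        except UnicodeDecodeError:
            continue  # undefined character found: this character set is not to be used
        remaining.append(name)
    return remaining


def infer_charset(data: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Return the single remaining character set, or None if it is still ambiguous."""
    remaining = eliminate_charsets(data, candidates)
    return remaining[0] if len(remaining) == 1 else None
```

Plain ASCII text passes every candidate, so the result stays ambiguous. That is why the text says the remaining character set can be inferred only "ideally".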

Solution for (2): In a further development of the invention, diacritical marks are used as the specific characters.

The default character set ISO 6937 can encode so-called diacritical characters (e.g., Ä, Ë, Ï, Ö, Ü, ä, ë, ï, ö, ü). These characters use a special encoding that does not occur in this form in other character sets.

They always start with a code in the range 0xC1-0xCF (the accent), followed by a second character (the letter). The possible combinations are precisely specified:

[Table: specified combinations of accent code and letter in ISO 6937 (images not reproduced)]
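The "special encoding style" is that ISO 6937 sends the accent before the base letter, whereas Unicode places a combining mark after it. The sketch below decodes one such pair. Two parts are our own assumptions: the accent-to-combining-mark mapping, and the use of "has a single precomposed Unicode character" as a stand-in for the specified table of permitted combinations, which is not reproduced here. `decode_pair` is a hypothetical name.

```python
import unicodedata
from typing import Optional

# Assumed mapping of ISO 6937 non-spacing accent bytes to Unicode combining marks.
# 0xC9 and 0xCC are not treated as accents in this sketch.
ACCENTS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # diaeresis (umlaut)
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}


def decode_pair(accent: int, letter: int) -> Optional[str]:
    """Decode an ISO 6937 accent byte followed by an ASCII letter byte.

    Returns the precomposed character, or None if the pair is not accepted.
    """
    mark = ACCENTS.get(accent)
    if mark is None or not (0x41 <= letter <= 0x5A or 0x61 <= letter <= 0x7A):
        return None
    # Reorder: base letter first, then the combining mark, then compose (NFC).
    composed = unicodedata.normalize("NFC", chr(letter) + mark)
    return composed if len(composed) == 1 else None
```

For example, `decode_pair(0xC8, ord("A"))` yields "Ä". The pair 0xC8 followed by "Q" is rejected because Unicode has no precomposed Q with diaeresis.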

Before program information is translated according to the signaled character set, the invention checks whether such a combination occurs in it:

If such a combination occurs, it can be assumed almost certainly that the default character set is to be used. In this case, the text is not translated according to the signaled character set but with ISO 6937. If no combination of an accent code (0xC1-0xCF) and a second character occurs, the signaled character set continues to be used.
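The decision rule can be sketched as a small function. It assumes that "such a combination" means an accent byte in 0xC1-0xCF immediately followed by an ASCII letter. This is a simplification; the specified table restricts which letters each accent may combine with. `choose_charset` and the labels are hypothetical.

```python
DEFAULT_CHARSET = "iso6937"  # label only; not a Python codec name


def _is_ascii_letter(b: int) -> bool:
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


def choose_charset(data: bytes, signaled: str) -> str:
    """Return the character set to decode `data` with, given the signaled one."""
    if signaled == DEFAULT_CHARSET:
        return DEFAULT_CHARSET
    # Scan adjacent byte pairs for an ISO 6937 accent followed by a letter.
    for accent, letter in zip(data, data[1:]):
        if 0xC1 <= accent <= 0xCF and _is_ascii_letter(letter):
            return DEFAULT_CHARSET  # almost certainly ISO 6937 despite the signaling
    return signaled  # no accent/letter pair: keep the signaled character set
```

Under ISO 8859-5 signaling, `b"M\xc8unchen"` is switched to ISO 6937. A run of Cyrillic bytes such as `b"\xc8\xd0\xd1"` keeps the signaled set, because 0xC8 is not followed by a Latin letter there. Mixed Cyrillic/Latin text can still produce a false positive, which matches the "almost certainly" in the text.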

A method for recognizing a character set, wherein the default character set ISO 6937 makes it possible to encode so-called diacritical characters, which use a special encoding that does not occur in this form in other character sets. Before program information is translated according to the signaled character set, it is checked whether such a combination occurs in it. If such a combination occurs, it is assumed that the default character set is to be used, and the text is translated with ISO 6937 rather than the signaled character set. Otherwise, if no combination of an accent code (0xC1-0xCF) and a second character occurs, the signaled character set continues to be used.

Claims

1. A method for transmitting electronic program information containing information about currently running and upcoming programs, wherein, in addition to the electronic program information, at least one predeterminable character set is transmitted in order to adapt the program content to different languages and their particularities, characterized in that, if no predeterminable character set, or a character set deviating from the predeterminable character set, has been transmitted, a character set is selected from a plurality of character sets as follows: the defined possible character sets have ranges in which no characters or specific characters are defined; if such undefined or specific characters occur within the program information, it is assumed that the character set in question should not be used for decoding; this check is performed with all possible character sets, thereby producing a list of character sets that are not to be used, from which the remaining character set is inferred and used as a default character set.

2. The method according to claim 1, characterized in that diacritical marks are used as the specific characters.

3. The method according to claim 2, characterized in that a default character set according to ISO 6937, in which the diacritical marks are encoded, is used.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102017108042.2 2017-04-13
DE102017108042 2017-04-13

Publications (1)

Publication Number Publication Date
WO2018189308A1 (en) 2018-10-18

Family

ID=62027967

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2018/059417 Ceased WO2018189308A1 (en) 2017-04-13 2018-04-12 Character set identification

Country Status (2)

Country Link
DE (1) DE102018108693A1 (en)
WO (1) WO2018189308A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7148824B1 (en) * 2005-08-05 2006-12-12 Xerox Corporation Automatic detection of character encoding format using statistical analysis of the text strings
US7191114B1 (en) * 1999-08-27 2007-03-13 International Business Machines Corporation System and method for evaluating character sets to determine a best match encoding a message
EP2169950A2 (en) * 2008-09-30 2010-03-31 Kabushiki Kaisha Toshiba Character code conversion apparatus and character code conversion method


Also Published As

Publication number Publication date
DE102018108693A1 (en) 2018-10-18

Similar Documents

Publication Publication Date Title
DE10342594B4 (en) Method and system for collecting data from a plurality of machine readable documents
DE69423254T2 (en) Method and device for automatic speech recognition of documents
DE19547812C2 (en) Character string reader
DE10162156B4 (en) The user navigation through multimedia file content supporting system and method
DE69937176T2 (en) Segmentation method to extend the active vocabulary of speech recognizers
DE69525401T2 (en) Method and device for identifying words described in a portable electronic document
DE602005002473T2 (en) Method for recognizing semantic units in an electronic document
DE19708184A1 (en) Method for speech recognition with language model adaptation
DE102012023022A1 (en) Method for detecting traffic sign in image data used in motor vehicle, involves applying classifier to segmented image data to recognize object as a specific traffic sign, when probability value is greater than predetermined threshold
EP1918104A2 (en) Method for testing an imprint and imprint testing device
WO2018189308A1 (en) Character set identification
DE102004009617A1 (en) Method and device for coding and decoding structured documents
DE69331035T2 (en) Character recognition system
DE10339971A1 (en) Method for coding an XML-based document
DE69839144T2 (en) Image coding apparatus using image pattern coding
DE19649692C1 (en) Procedure for the verification of a sample lettering with the help of a reference lettering
EP2845145A1 (en) Apparatus and method for comparing two files containing graphics elements and text elements
DE102023203660A1 (en) Computer-implemented method and device for machine learning of facts, in particular for filling a knowledge base
DE102008014611A1 (en) Method for displaying meta-information and device
DE102024107248A1 (en) Computer-implemented method and device for the automated identification of components on a printed circuit board
DE19911535A1 (en) Language and speech recognition method dynamically matching vocabulary to contents to be recognized, such as Internet sides
DE3147225A1 (en) Method and device for reading contrasting characters
EP2466488A2 (en) Method and system of computer assisted proofreading during translation
DE10240133A1 (en) Equivalence comparison method for comparison of digital circuits during design, whereby an initial stored description is converted into at least two circuit descriptions in a second format followed by an equivalence comparison
DE102023128344A1 (en) Method for modifying a search query for a text-based image search

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18719095

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18719095

Country of ref document: EP

Kind code of ref document: A1