WO2018189308A1 - Character set identification - Google Patents
Character set identification
- Publication number
- WO2018189308A1 (application PCT/EP2018/059417)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- character set
- character
- characters
- program information
- sets
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/485—End-user interface for client configuration
- H04N21/4856—End-user interface for client configuration for language selection, e.g. for the menu or subtitles
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04H—BROADCAST COMMUNICATION
- H04H60/00—Arrangements for broadcast applications with a direct linking to broadcast information or broadcast space-time; Broadcast-related systems
- H04H60/68—Systems specially adapted for using specific information, e.g. geographical or meteorological information
- H04H60/72—Systems specially adapted for using specific information, e.g. geographical or meteorological information using electronic programme guides [EPG]
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Character Discrimination (AREA)
- Controls And Circuits For Display Device (AREA)
Description
Character set recognition
The invention relates to a method for transmitting electronic program information that describes currently running and upcoming programs, wherein at least one predeterminable character set is transmitted in addition to the electronic program information in order to adapt the program content to different languages and their particularities, according to the features of the preamble of claim 1.
Background:
In television (TV) broadcasting, a so-called EPG (Electronic Program Guide) is transmitted. This electronic program guide (program information) gives an overview of current and upcoming programs. To adapt the content to different languages and their particularities, the ETSI EN 300 468 standard provides for support of several character encodings. Usually, the character set in use is transmitted together with the program information.
Problem:
If no specific character set is transmitted (signaled), ETSI EN 300 468 requires that a default character set (ISO 6937) be used. In practice, however, two kinds of error occur:
(1) No character set is transmitted (signaled), but a character set other than the default is actually used.
(2) The default character set is actually used, but a different one is transmitted (signaled).
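The nominal behavior that both error cases deviate from can be sketched as follows. This is a minimal illustration, not the method of the application: the function names are hypothetical, and the selector-byte values are an assumed subset of the table in ETSI EN 300 468, Annex A, which should be checked against the standard before use.

```python
from typing import Optional, Tuple

# Assumed subset of the selector-byte table in ETSI EN 300 468, Annex A.
# These values are not taken from the text above; verify them against the standard.
SELECTOR_TO_CHARSET = {
    0x01: "ISO 8859-5",
    0x02: "ISO 8859-6",
    0x03: "ISO 8859-7",
    0x04: "ISO 8859-8",
    0x05: "ISO 8859-9",
}

DEFAULT_CHARSET = "ISO 6937"  # default named in the text; used here as a label only


def split_signaling(field: bytes) -> Tuple[Optional[str], bytes]:
    """Return (signaled character set or None, remaining payload bytes)."""
    if field and field[0] in SELECTOR_TO_CHARSET:
        return SELECTOR_TO_CHARSET[field[0]], field[1:]
    # No known selector byte at the start: nothing was signaled.
    return None, field


def nominal_charset(field: bytes) -> str:
    """Character set a receiver would use if the signaling were always correct."""
    signaled, _payload = split_signaling(field)
    return signaled if signaled is not None else DEFAULT_CHARSET
```

Error case (1) corresponds to `nominal_charset` returning the default although the payload was encoded differently; error case (2) corresponds to a selector byte being present although the payload is ISO 6937.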
Solution for (1):
In general, the invention provides that, if no predeterminable character set was transmitted, or a character set deviating from the predeterminable one was transmitted, one character set is selected from several character sets as follows. The defined possible character sets contain ranges in which no characters, or only specific characters, are defined. If such undefined or specific characters occur within the program information, it is assumed that the character set in question must not be used for decoding. This check is run for all possible character sets, which yields a list of character sets that are not to be used; from this list the remaining character set is inferred and used as the default character set.
In other words, by detecting undefined or specific characters transmitted with the program information, the character set to be used as the default is selected from the several given character sets.
Within the possible character sets defined in ETSI EN 300 468 there are always ranges in which no characters are defined (for example, the range 0xC0 to 0xDE in ISO 8859-8).
If such undefined characters occur within the program information, this is a reliable indication that the character set in question should not be used for decoding. If this test is run for all possible character sets, the result is a list of character sets that are not to be used. Ideally, the remaining character set can then be inferred.
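The elimination test described above can be sketched as follows. Only the ISO 8859-8 range (0xC0 to 0xDE) comes from the text; the other table entries, the candidate list, and all function names are assumptions made for illustration and would need to be filled in from the respective standards.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive byte range (low, high)

# Undefined byte ranges per candidate character set.
# ISO 8859-8 is taken from the text; the other entries are assumptions.
UNDEFINED_RANGES: Dict[str, List[Range]] = {
    "ISO 8859-5": [],
    "ISO 8859-7": [(0xAE, 0xAE), (0xD2, 0xD2), (0xFF, 0xFF)],
    "ISO 8859-8": [(0xC0, 0xDE)],
}


def uses_undefined_byte(data: bytes, ranges: List[Range]) -> bool:
    """True if any byte of data falls into one of the undefined ranges."""
    return any(lo <= b <= hi for b in data for lo, hi in ranges)


def excluded_charsets(data: bytes) -> List[str]:
    """Character sets that cannot have been used to encode data."""
    return [
        name
        for name, ranges in UNDEFINED_RANGES.items()
        if uses_undefined_byte(data, ranges)
    ]


def remaining_charsets(data: bytes) -> List[str]:
    """Candidates left after removing every excluded character set."""
    excluded = set(excluded_charsets(data))
    return [name for name in UNDEFINED_RANGES if name not in excluded]
```

In this sketch, a decision is only possible when `remaining_charsets` returns exactly one entry; with several survivors the text's "ideally" no longer holds and some other tie-break would be needed.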
Solution for (2): In a further development of the invention, diacritical characters are used as the specific characters.
The default character set ISO 6937 makes it possible to encode so-called diacritical characters (for example Ä, Ë, Ï, Ö, Ü, ä, ë, ï, ö, ü). These characters have a special kind of encoding that does not occur in this form in other character sets.
They always begin with a code in the range 0xC1 to 0xCF (the accent), followed by a second character (the letter). The possible combinations are precisely specified.
Before program information is translated according to the signaled character set, the invention checks whether such a combination occurs in it.
If such a combination occurs, it can be assumed with near certainty that the default character set is to be used. In this case, the text is translated not according to the signaled character set but with ISO 6937. If such a combination of 0xC1-0xC2 and the second character does not occur, the signaled character set continues to be used.
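The check for an accent code followed by a letter can be sketched as follows. The function names are hypothetical, and the test for the second byte is a simplification: the text states that the permitted combinations are precisely specified but does not list them, so this sketch accepts any ASCII letter after an accent code in the range 0xC1 to 0xCF.

```python
# Accent (non-spacing diacritic) code range, as stated in the description.
ACCENT_FIRST = 0xC1
ACCENT_LAST = 0xCF


def is_ascii_letter(byte: int) -> bool:
    """Simplified stand-in for the table of permitted base letters."""
    return 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A


def has_iso6937_diacritic(data: bytes) -> bool:
    """True if an accent code is directly followed by a letter."""
    for accent, letter in zip(data, data[1:]):
        if ACCENT_FIRST <= accent <= ACCENT_LAST and is_ascii_letter(letter):
            return True
    return False


def choose_charset(data: bytes, signaled: str) -> str:
    """Override the signaled character set when an ISO 6937 combination occurs."""
    return "ISO 6937" if has_iso6937_diacritic(data) else signaled
```

For example, `choose_charset(b"M\xc8adchen", "ISO 8859-9")` selects ISO 6937 because 0xC8 is followed by the letter `a`, whereas plain ASCII text keeps the signaled character set. Replacing `is_ascii_letter` with the actual table of permitted combinations would make the check stricter.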
A method for recognizing a character set, wherein the default character set ISO 6937 offers the possibility of encoding so-called diacritical characters, and these characters have a special kind of encoding that does not occur in this form in other character sets, in which method, before program information is translated according to the signaled character set, it is checked whether such a combination occurs in it, wherein, if such a combination occurs, it is assumed that the default character set is to be used and in this case translation is performed not according to the signaled character set but with ISO 6937, and wherein otherwise, if such a combination of 0xC1-0xC2 and the second character does not occur, the signaled character set continues to be used.
Claims
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| DE102017108042.2 | 2017-04-13 | | |
| DE102017108042 | 2017-04-13 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018189308A1 (en) | 2018-10-18 |
Family
ID=62027967
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/EP2018/059417 (WO2018189308A1, ceased) | 2017-04-13 | 2018-04-12 | Character set identification |
Country Status (2)
| Country | Link |
|---|---|
| DE (1) | DE102018108693A1 (en) |
| WO (1) | WO2018189308A1 (en) |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7148824B1 (en) * | 2005-08-05 | 2006-12-12 | Xerox Corporation | Automatic detection of character encoding format using statistical analysis of the text strings |
| US7191114B1 (en) * | 1999-08-27 | 2007-03-13 | International Business Machines Corporation | System and method for evaluating character sets to determine a best match encoding a message |
| EP2169950A2 (en) * | 2008-09-30 | 2010-03-31 | Kabushiki Kaisha Toshiba | Character code conversion apparatus and character code conversion method |
2018
- 2018-04-12: WO application PCT/EP2018/059417, published as WO2018189308A1 (not active, ceased)
- 2018-04-12: DE application DE102018108693.8A, published as DE102018108693A1 (active, pending)
Also Published As
| Publication number | Publication date |
|---|---|
| DE102018108693A1 (en) | 2018-10-18 |
Similar Documents
| Publication | Title |
|---|---|
| DE10342594B4 (en) | Method and system for collecting data from a plurality of machine readable documents |
| DE69423254T2 (en) | Method and device for automatic speech recognition of documents |
| DE19547812C2 (en) | Character string reader |
| DE10162156B4 (en) | The user navigation through multimedia file content supporting system and method |
| DE69937176T2 (en) | Segmentation method to extend the active vocabulary of speech recognizers |
| DE69525401T2 (en) | Method and device for identifying words described in a portable electronic document |
| DE602005002473T2 (en) | Method for recognizing semantic units in an electronic document |
| DE19708184A1 (en) | Method for speech recognition with language model adaptation |
| DE102012023022A1 (en) | Method for detecting traffic sign in image data used in motor vehicle, involves applying classifier to segmented image data to recognize object as a specific traffic sign, when probability value is greater than predetermined threshold |
| EP1918104A2 (en) | Method for testing an imprint and imprint testing device |
| WO2018189308A1 (en) | Character set identification |
| DE102004009617A1 (en) | Method and device for coding and decoding structured documents |
| DE69331035T2 (en) | Character recognition system |
| DE10339971A1 (en) | Method for coding an XML-based document |
| DE69839144T2 (en) | Image coding apparatus using image pattern coding |
| DE19649692C1 (en) | Procedure for the verification of a sample lettering with the help of a reference lettering |
| EP2845145A1 (en) | Apparatus and method for comparing two files containing graphics elements and text elements |
| DE102023203660A1 (en) | Computer-implemented method and device for machine learning of facts, in particular for filling a knowledge base |
| DE102008014611A1 (en) | Method for displaying meta-information and device |
| DE102024107248A1 (en) | Computer-implemented method and device for the automated identification of components on a printed circuit board |
| DE19911535A1 (en) | Language and speech recognition method dynamically matching vocabulary to contents to be recognized, such as Internet sides |
| DE3147225A1 (en) | Method and device for reading contrasting characters |
| EP2466488A2 (en) | Method and system of computer assisted proofreading during translation |
| DE10240133A1 (en) | Equivalence comparison method for comparison of digital circuits during design, whereby an initial stored description is converted into at least two circuit descriptions in a second format followed by an equivalence comparison |
| DE102023128344A1 (en) | Method for modifying a search query for a text-based image search |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 18719095; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 18719095; Country of ref document: EP; Kind code of ref document: A1 |