WO1999041680A3 - Word segmentation in Chinese text - Google Patents

Publication number
WO1999041680A3
WO1999041680A3 PCT/IB1999/000320 IB9900320W WO9941680A3 WO 1999041680 A3 WO1999041680 A3 WO 1999041680A3 IB 9900320 W IB9900320 W IB 9900320W WO 9941680 A3 WO9941680 A3 WO 9941680A3
Authority
WO
WIPO (PCT)
Prior art keywords
words
characters
combination
character
facility
Prior art date
Application number
PCT/IB1999/000320
Other languages
English (en)
Other versions
WO1999041680A2 (fr)
Inventor
Andi Wu
Stephen D Richardson
Zixin Jiang
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Priority to JP2000531795A (JP4573432B2)
Priority to EP99902779A (EP1055182A2)
Publication of WO1999041680A2
Publication of WO1999041680A3

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/268 Morphological analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a facility for selecting, from a sequence of natural-language characters, combinations of characters that may be words. For each of a number of characters, the facility uses indications of (a) the characters that occur in the second position of words beginning with that character and (b) the positions in which that character occurs in words. For each contiguous combination of characters in the sequence, the facility determines whether the character in the second position of the combination is indicated as occurring in words that begin with the character in the first position of the combination. If so, the facility determines whether every character of the combination is indicated as occurring in words at the position it occupies in the combination. If so, the facility determines that the combination of characters may be a word. In some embodiments, the facility compares the combination of characters to a list of valid words to determine whether the combination is a word.
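The two-stage check described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the use of 0-based positions, the `max_len` cap on combination length, and the way the indications are derived from a word list are all assumptions made for the example.

```python
from collections import defaultdict


def build_indications(lexicon):
    """Derive the two per-character indications from a list of valid words.

    Hypothetical helper: the abstract does not say how the indications are stored.
    """
    second_chars = defaultdict(set)  # first character -> characters seen in second position
    positions = defaultdict(set)     # character -> 0-based positions it occupies in words
    for word in lexicon:
        if len(word) >= 2:
            second_chars[word[0]].add(word[1])
        for index, char in enumerate(word):
            positions[char].add(index)
    return second_chars, positions


def candidate_words(text, second_chars, positions, max_len=4):
    """Return contiguous combinations (length >= 2) that pass both checks.

    max_len is an assumed cap on combination length, not taken from the source.
    """
    candidates = []
    for start in range(len(text) - 1):
        # Check (a): does the second character follow the first in any known word?
        # Every combination starting here shares these two characters.
        if text[start + 1] not in second_chars.get(text[start], ()):
            continue
        longest = min(max_len, len(text) - start)
        for length in range(2, longest + 1):
            combo = text[start:start + length]
            # Check (b): each character must be known to occur at its position.
            if all(i in positions.get(c, ()) for i, c in enumerate(combo)):
                candidates.append(combo)
    return candidates


def confirmed_words(candidates, lexicon):
    """Optional final step: keep only candidates found in the list of valid words."""
    valid = set(lexicon)
    return [c for c in candidates if c in valid]
```

With the hypothetical lexicon `["中国", "人民", "中国人"]` and the input `"中国人民"`, the sketch reports 中国, 中国人, and 人民 as candidates. The two checks are only a cheap filter: with a lexicon containing just `"abc"`, the input `"ab"` passes both checks even though it is not a word, which is why the final comparison against the list of valid words is still useful.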
PCT/IB1999/000320 1998-02-13 1999-01-13 Word segmentation in Chinese text WO1999041680A2 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2000531795A JP4573432B2 (ja) 1998-02-13 1999-01-13 Word segmentation method in kanji text
EP99902779A EP1055182A2 (fr) 1998-02-13 1999-01-13 Word segmentation in Chinese text

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US2358698A 1998-02-13 1998-02-13
US09/023,586 1998-02-13

Publications (2)

Publication Number Publication Date
WO1999041680A2 (fr) 1999-08-19
WO1999041680A3 (fr) 1999-11-25

Family

ID=21816034

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB1999/000320 WO1999041680A2 (fr) 1998-02-13 1999-01-13 Word segmentation in Chinese text

Country Status (4)

Country Link
EP (1) EP1055182A2 (fr)
JP (2) JP4573432B2 (fr)
CN (1) CN1114165C (fr)
WO (1) WO1999041680A2 (fr)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6810375B1 (en) * 2000-05-31 2004-10-26 Hapax Limited Method for segmentation of text
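CN1545665A (zh) * 2001-06-29 2004-11-10 Intel Corporation Predictive cascading algorithm for a multi-parser architecture
FR2880708A1 (fr) * 2005-01-11 2006-07-14 Vision Objects Sa Method for searching in ink by dynamic query conversion
CN100424685C (zh) * 2005-09-08 2008-10-08 Institute of Automation, Chinese Academy of Sciences Hierarchical method and device for syntactic parsing of long Chinese sentences based on punctuation processing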
US8310461B2 (en) 2010-05-13 2012-11-13 Nuance Communications Inc. Method and apparatus for on-top writing
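CN103177089A (zh) * 2013-03-08 2013-06-26 Beijing Institute of Technology Hierarchical recognition method for sentence-meaning component relations based on central chunks
CN107748744B (zh) * 2017-10-31 2021-01-26 Guangdong Xiaotiancai Technology Co., Ltd. Method and device for establishing an outline-frame knowledge base
CN110955748B (zh) * 2018-09-26 2022-10-28 ASUSTeK Computer Inc. Semantic processing method, electronic device, and non-transitory computer-readable recording medium
CN109670123B (zh) * 2018-12-28 2021-02-26 Hangzhou DPtech Technologies Co., Ltd. Data processing method and device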

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5806021A (en) * 1995-10-30 1998-09-08 International Business Machines Corporation Automatic segmentation of continuous text using statistical approaches

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2798931B2 (ja) * 1988-04-26 1998-09-17 Ken Kusui Chinese speech-sound segmentation method and speech-sound-to-kanji conversion method
US5448474A (en) * 1993-03-03 1995-09-05 International Business Machines Corporation Method for isolation of Chinese words from connected Chinese text
JPH08339383A (ja) * 1995-04-11 1996-12-24 Ricoh Co Ltd Document retrieval device and dictionary creation device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHARNG-KANG FAN ET AL.: "Automatic Word Identification in Chinese Sentences by the Relaxation Technique", COMPUTER PROCESSING OF CHINESE & ORIENTAL LANGUAGES, vol. 4, no. 1, November 1988 (1988-11-01), pages 33 - 56, XP002114839 *
XIAOHONG HUANG ET AL: "A Quick Method for Chinese Word Segmentation", IEEE CONF. ON INTELLIGENT PROCESSING SYSTEMS, 28 October 1997 (1997-10-28) - 31 October 1997 (1997-10-31), pages 1773 - 1776, XP002114838 *

Also Published As

Publication number Publication date
JP2002503849A (ja) 2002-02-05
JP4573432B2 (ja) 2010-11-04
CN1114165C (zh) 2003-07-09
JP2010157260A (ja) 2010-07-15
JP5100770B2 (ja) 2012-12-19
CN1290371A (zh) 2001-04-04
EP1055182A2 (fr) 2000-11-29
WO1999041680A2 (fr) 1999-08-19

Similar Documents

Publication Publication Date Title
AU2001290464A1 (en) Method for normalizing case
WO1999062000A3 (fr) Spelling and grammar checking system
CY2579B1 (en) Text processor
WO2006052858A8 (fr) Apparatus and method for providing a visual indication of character ambiguity during text entry
WO2006039398A3 (fr) Methods and systems for selecting a language for text segmentation
HK1046786A1 (zh) Keyboard system with automatic correction
WO1999008390A3 (fr) Method for entering Japanese text using a keyboard having only base kana characters
TW428137B (en) Sentence processing apparatus and method thereof
EP1178408A3 (fr) Segmenter for a natural language processing system
JP2002517039A5 (fr)
WO1999041680A3 (fr) Word segmentation in Chinese text
DE60045283D1 (fr)
US5619563A (en) Mnemonic number dialing plan
UA24036C2 (uk) Dictionary of an alphabetic foreign language
EP1359515A3 (fr) System and method for filtering Far East languages
EP1248183A3 (fr) Reduced keyboard disambiguating system
US20200273370A1 (en) Interlinear targum
CN1200332C (zh) A computer input method for Chinese characters
CN101957663B (zh) Wubi Chinese character input method
WO2009116835A3 (fr) Device and method for inputting Japanese characters
Tumasonis Encoding of Lithuanian accented letters
EP1113413A3 (fr) Workstation with a cache for character fonts
Cartlidge A Book of Middle English.
Siu-Pong et al. 3Cantonese Romanization
Vasilev The Numerals from 200 to 900 in Serbo-Croatian

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase

Ref document number: 99802944.0

Country of ref document: CN

AK Designated states

Kind code of ref document: A2

Designated state(s): CA CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): CA CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

NENP Non-entry into the national phase

Ref country code: KR

WWE WIPO information: entry into national phase

Ref document number: 1999902779

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 1999902779

Country of ref document: EP

WWW WIPO information: withdrawn in national office

Ref document number: 1999902779

Country of ref document: EP