
CN106484673A - A Chinese event representation method oriented to cognitive analysis - Google Patents

A Chinese event representation method oriented to cognitive analysis

Info

Publication number
CN106484673A
Authority
CN
China
Prior art keywords
event
chinese
cognitive analysis
initiator
receiver
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610810553.2A
Other languages
Chinese (zh)
Inventor
贺成龙
徐琳
葛唯益
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute
Priority to CN201610810553.2A
Publication of CN106484673A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Chinese event representation method oriented to cognitive analysis. It proposes a structured event representation model and a method for assigning key attributes, such as event type, based on the analysis and processing of Chinese text. The invention supports attribute extraction and representation for social events in fields such as politics, military affairs, and diplomacy, and makes the representation, organization, storage management, and cognitive analysis of massive event data practically feasible.

Description

A Chinese Event Representation Method Oriented to Cognitive Analysis

Technical Field

The invention relates to an event representation method, and in particular to a Chinese event representation method oriented to cognitive analysis.

Background

An event is a unit of knowledge that links elements such as participants, time, place, and action, and it reflects movement and change in the real world. The Internet and other massive information sources contain large numbers of social events of all kinds expressed as text. Events are the main subject of texts such as news reports. Building an event model with cognition as the goal, extracting event information, and expressing the main semantic content of that information clearly in structured form is the basis for organizing massive text information effectively and making it machine-understandable. Using a cognition-oriented event representation model to extract social event information in fields such as politics, military affairs, and diplomacy from massive Chinese text, building up a reserve of social event data, and performing cognitive computation on events on that basis is an important approach to social cognitive computing in the big data era.

Establishing a Chinese event representation method oriented to cognitive analysis requires building a structured representation model from the key attributes of an event, establishing a classification system for the nature of events, and giving a basic method for building a classifier under that system. Several event classification systems for English already exist internationally, such as the political event classification system CAMEO, but no standard event classification system has yet emerged for Chinese.

Summary of the Invention

Purpose of the invention: to provide a Chinese event representation method oriented to cognitive analysis that overcomes the deficiencies of the prior art.

Technical solution: the Chinese event representation method oriented to cognitive analysis according to the invention comprises the following steps:

S1: Input the Chinese text to be processed and apply two preprocessing operations, format adjustment and sentence segmentation;

S2: Parse the text syntactically with a natural language processing tool, accurately identifying the subject and object entities and the predicate verb, and identifying the time and place entities in each sentence;

S3: Extract the event information, including the time of the event, its place and the latitude/longitude coding of that place, the initiator and recipient of the event and the countries they belong to, the event action, the time the event was reported, the number of repeated reports, and the original information source;

S4: Determine the nature of the event: based on the event action extracted in step S3, with the social attributes of the initiator and recipient as auxiliary considerations, assign the event to one of 20 event categories;

S5: Based on the results of steps S3 and S4, represent and encode the event in structured form and add it to the event database.
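Step S1 above could be sketched as follows. This is a minimal illustration, not the patented implementation: the source names the two preprocessing operations but not how they are carried out, so the function names `normalize_format` and `split_sentences`, the choice of sentence-ending punctuation, and the whitespace handling are all assumptions.

```python
import re
import unicodedata

# Assumed sentence enders (full-width and ASCII), optionally followed by
# closing quotes or brackets. The source does not specify this set.
_SENTENCE_END = re.compile(r"([。！？!?]+[”’」』）)]*)")


def normalize_format(text: str) -> str:
    """Format adjustment: replace control characters and collapse whitespace."""
    # Unicode categories starting with "C" cover control/format characters,
    # including newlines and tabs; each is replaced by a space.
    cleaned = "".join(
        " " if unicodedata.category(ch)[0] == "C" else ch for ch in text
    )
    # \s also matches the full-width (ideographic) space U+3000.
    return re.sub(r"\s+", " ", cleaned).strip()


def split_sentences(text: str) -> list[str]:
    """Sentence segmentation: split after each run of sentence enders."""
    # With one capturing group, re.split returns
    # [body, ender, body, ender, ..., tail].
    parts = _SENTENCE_END.split(text)
    sentences = []
    for i in range(0, len(parts) - 1, 2):
        sentence = (parts[i] + parts[i + 1]).strip()
        if sentence:
            sentences.append(sentence)
    tail = parts[-1].strip()
    if tail:  # trailing text without a sentence ender is kept as-is
        sentences.append(tail)
    return sentences
```

Keeping the ender attached to its sentence preserves the original text for the Content field described later; a production system would more likely rely on the segmentation built into its natural language processing tool.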

Further, the event classification system targets social events in the political, military, and diplomatic fields and comprises the following 20 categories: making a public statement, appealing, expressing intent to cooperate, consulting, engaging in diplomatic cooperation, engaging in material cooperation, providing aid, yielding, investigating, demanding, disapproving, rejecting, threatening, protesting, exhibiting military posture, reducing relations, coercing, assaulting, fighting, and engaging in unconventional mass violence.
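The 20 categories above lend themselves to an enumeration. The sketch below is illustrative only: the source does not say how categories are encoded or how the classifier is built, so the numeric codes 1 to 20 (taken from the order of the list), the trigger-word table, the `actor1_is_military` flag, and the function `classify_action` are all hypothetical. A keyword lookup stands in for whatever classifier an implementation would actually train or write.

```python
from enum import IntEnum
from typing import Optional


class EventCategory(IntEnum):
    """The 20 event categories; numeric codes are an assumption."""
    PUBLIC_STATEMENT = 1
    APPEAL = 2
    EXPRESS_INTENT_TO_COOPERATE = 3
    CONSULT = 4
    DIPLOMATIC_COOPERATION = 5
    MATERIAL_COOPERATION = 6
    PROVIDE_AID = 7
    YIELD = 8
    INVESTIGATE = 9
    DEMAND = 10
    DISAPPROVE = 11
    REJECT = 12
    THREATEN = 13
    PROTEST = 14
    MILITARY_POSTURE = 15
    REDUCE_RELATIONS = 16
    COERCE = 17
    ASSAULT = 18
    FIGHT = 19
    UNCONVENTIONAL_MASS_VIOLENCE = 20


# Hypothetical trigger words for a few categories; not taken from the source.
_TRIGGERS = {
    "声明": EventCategory.PUBLIC_STATEMENT,
    "呼吁": EventCategory.APPEAL,
    "会谈": EventCategory.CONSULT,
    "援助": EventCategory.PROVIDE_AID,
    "拒绝": EventCategory.REJECT,
    "威胁": EventCategory.THREATEN,
    "抗议": EventCategory.PROTEST,
}


def classify_action(action: str,
                    actor1_is_military: bool = False) -> Optional[EventCategory]:
    """Map an extracted action string to a category, or None if unknown."""
    # Hypothetical auxiliary rule using a social attribute of the initiator:
    # "部署" (deploy) counts as military posturing only for military actors.
    if "部署" in action:
        return EventCategory.MILITARY_POSTURE if actor1_is_military else None
    for trigger, category in _TRIGGERS.items():
        if trigger in action:
            return category
    return None
```

Returning `None` for unmatched actions keeps the sketch honest about coverage; how unclassifiable events are handled is not addressed in the source.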

Beneficial effects: compared with the prior art, the invention has the following beneficial effects:

1) It establishes a structured event representation model, on the basis of which event information can be extracted and the main semantic content carried by the information can be expressed clearly in structured form;

2) It establishes a classification system for the nature of events and gives a basic method for building a classifier under that system, so that the nature of an event can be determined, providing a quantitative basis for cognitive computing.

Brief Description of the Drawings

Fig. 1 is a flow chart of the method of the invention.

Detailed Description

The invention discloses a Chinese event representation method oriented to cognitive analysis. As shown in Fig. 1, the method comprises the following steps:

S1: Input the Chinese text to be processed and apply two preprocessing operations, format adjustment and sentence segmentation;

S2: Parse the text syntactically with a natural language processing tool, accurately identifying the subject and object entities and the predicate verb, and identifying the time and place entities in each sentence;

S3: Extract the event information according to the event representation model, including the time of the event, its place and the latitude/longitude coding of that place, the initiator and recipient of the event and the countries they belong to, the event action, the time the event was reported, the number of repeated reports, and the original information source;

S4: Determine the nature of the event: based on the event action extracted in step S3, with the social attributes of the initiator and recipient as auxiliary considerations, use the classifier to assign the event to one of 20 event categories;

S5: Based on the results of steps S3 and S4, represent and encode the event in structured form and add it to the event database.
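Step S5, together with the repeated-report count extracted in S3, could be sketched with an in-memory SQLite database. The source does not name a database or say how repeated reports of the same event are recognized, so the table layout, the reduced set of columns, and the deduplication key (event date, both actor names, category) are assumptions made for illustration; `open_event_db` and `add_event` are hypothetical helper names.

```python
import sqlite3


def open_event_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (simplified) event table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               ID INTEGER PRIMARY KEY AUTOINCREMENT,
               EventTime TEXT NOT NULL,
               Actor1Name TEXT NOT NULL,
               Actor2Name TEXT NOT NULL,
               Category INTEGER NOT NULL,
               StoryNum INTEGER NOT NULL DEFAULT 1,
               UNIQUE (EventTime, Actor1Name, Actor2Name, Category)
           )"""
    )
    return conn


def add_event(conn: sqlite3.Connection, event_time: str, actor1: str,
              actor2: str, category: int) -> int:
    """Insert a new event, or bump StoryNum if the same event was seen before."""
    key = (event_time, actor1, actor2, category)
    row = conn.execute(
        "SELECT ID FROM events WHERE EventTime = ? AND Actor1Name = ? "
        "AND Actor2Name = ? AND Category = ?",
        key,
    ).fetchone()
    if row is None:
        cursor = conn.execute(
            "INSERT INTO events (EventTime, Actor1Name, Actor2Name, Category) "
            "VALUES (?, ?, ?, ?)",
            key,
        )
        event_id = cursor.lastrowid
    else:
        event_id = row[0]
        # A repeated report of a known event only increments its counter.
        conn.execute(
            "UPDATE events SET StoryNum = StoryNum + 1 WHERE ID = ?",
            (event_id,),
        )
    conn.commit()
    return event_id
```

An exact-match key is the simplest possible notion of "the same event"; real news data would likely need fuzzier matching of actor names and dates.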

The event representation model is shown in Table 1. It consists of the fields ID, PostTime, EventTime, StoryNum, Actor1Name, Actor1Country, Actor1Lat, Actor1Long, Actor2Name, Actor2Country, Actor2Lat, Actor2Long, Action, ActionCountry, ActionLat, ActionLong, Category, Content, and URL, among others, and records when, where, and which person (or organization) did what type of thing to which person (or organization).

Table 1. Example of the event representation model

where:

ID: the globally unique identifier of the event;

PostTime: the date on which the event was reported or published, in YYYYMMDD format;

EventTime: the date on which the event occurred, in YYYYMMDD format;

StoryNum: the number of times the event was extracted from different data sources, for example the number of reports of the same event by different news media; this attribute measures the event's importance in public opinion;

Actor1Name: the name of the event initiator, for example the full name of a country, province, or city, or the name of a person or organization;

Actor1Country: the country to which the event initiator belongs, as a three-character ISO 3166 country code;

Actor1Lat: the latitude of the administrative division to which the event initiator belongs;

Actor1Long: the longitude of the administrative division to which the event initiator belongs;

Actor2Name: the name of the event recipient, for example the full name of a country, province, or city, or the name of a person or organization;

Actor2Country: the country to which the event recipient belongs, as a three-character ISO 3166 country code;

Actor2Lat: the latitude of the administrative division to which the event recipient belongs;

Actor2Long: the longitude of the administrative division to which the event recipient belongs;

Action: the action of the event;

ActionCountry: the country in which the event took place, as a three-character ISO 3166 country code;

ActionLat: the latitude of the administrative division in which the event took place;

ActionLong: the longitude of the administrative division in which the event took place;

Category: the event type, determined by the classification in step S4;

Content: the original text from which the event was extracted;

URL: the URL of the web page containing the original text from which the event was extracted; this field may be left empty for non-web sources.
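The field list above maps naturally onto a record type. The following sketch uses the field names from the source, but the Python types, the optional coordinates, the integer range 1 to 20 for Category, and the validation rules are assumptions, since the source gives only the field meanings and the YYYYMMDD date format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional


def _check_date(value: str, field_name: str) -> None:
    """Require exactly eight ASCII digits forming a valid calendar date."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"{field_name} must be YYYYMMDD, got {value!r}")
    datetime.strptime(value, "%Y%m%d")  # raises ValueError for e.g. 20161301


@dataclass
class EventRecord:
    ID: str
    PostTime: str
    EventTime: str
    StoryNum: int
    Actor1Name: str
    Actor1Country: str
    Actor1Lat: Optional[float]
    Actor1Long: Optional[float]
    Actor2Name: str
    Actor2Country: str
    Actor2Lat: Optional[float]
    Actor2Long: Optional[float]
    Action: str
    ActionCountry: str
    ActionLat: Optional[float]
    ActionLong: Optional[float]
    Category: int                 # assumed encoding: 1..20
    Content: str
    URL: Optional[str] = None     # may be empty for non-web sources

    def __post_init__(self) -> None:
        _check_date(self.PostTime, "PostTime")
        _check_date(self.EventTime, "EventTime")
        for name in ("Actor1Country", "Actor2Country", "ActionCountry"):
            if len(getattr(self, name)) != 3:
                raise ValueError(f"{name} must be a three-character code")
        if not 1 <= self.Category <= 20:
            raise ValueError("Category must be between 1 and 20")

    def to_json(self) -> str:
        """Encode the record as JSON, keeping Chinese text readable."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

Validating in `__post_init__` means a malformed date or country code is rejected when the record is built rather than when it is later queried.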

The invention can be used to process texts such as Chinese news. It is applied to extract social event information in fields such as politics, military affairs, and diplomacy from massive Chinese text and to represent it in structured form, building up a reserve of social event data and supporting cognitive computation on events on that basis. It makes the representation, organization, storage management, and cognitive analysis of massive event data practically feasible, and lays a foundation for the effective organization and machine understanding of massive text information.

The invention provides a Chinese event representation method oriented to cognitive analysis. There are many ways to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that a person of ordinary skill in the art can make further improvements and refinements without departing from the principle of the invention, and these should also be regarded as falling within the scope of protection of the invention. Any component not specified in this embodiment can be implemented with existing technology.

Claims (2)

1. A Chinese event representation method oriented to cognitive analysis, characterized in that the method comprises the following steps:
S1: constructing a structured event representation model, inputting the Chinese text to be processed, and applying two preprocessing operations, format adjustment and sentence segmentation;
S2: parsing the text syntactically with a natural language processing tool, accurately identifying the subject and object entities and the predicate verb, and identifying the time and place entities in each sentence;
S3: extracting the event information, including the time of the event, its place and the latitude/longitude coding of that place, the initiator and recipient of the event and the countries they belong to, the event action, the time the event was reported, the number of repeated reports, and the original information source;
S4: determining the nature of the event by assigning it to one of 20 event categories based on the event action extracted in step S3, with the social attributes of the initiator and recipient as auxiliary considerations;
S5: based on the results of steps S3 and S4, representing and encoding the event in structured form and adding it to the event database.
2. The Chinese event representation method oriented to cognitive analysis of claim 1, characterized in that the event classification system targets social events in the political, military, and diplomatic fields and comprises 20 categories: making a public statement, appealing, expressing intent to cooperate, consulting, engaging in diplomatic cooperation, engaging in material cooperation, providing aid, yielding, investigating, demanding, disapproving, rejecting, threatening, protesting, exhibiting military posture, reducing relations, coercing, assaulting, fighting, and engaging in unconventional mass violence.
CN201610810553.2A 2016-09-09 2016-09-09 A kind of Chinese event method for expressing towards cognitive analysis Pending CN106484673A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610810553.2A CN106484673A (en) 2016-09-09 2016-09-09 A kind of Chinese event method for expressing towards cognitive analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610810553.2A CN106484673A (en) 2016-09-09 2016-09-09 A kind of Chinese event method for expressing towards cognitive analysis

Publications (1)

Publication Number Publication Date
CN106484673A true CN106484673A (en) 2017-03-08

Family

ID=58273561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610810553.2A Pending CN106484673A (en) 2016-09-09 2016-09-09 A kind of Chinese event method for expressing towards cognitive analysis

Country Status (1)

Country Link
CN (1) CN106484673A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101819573A (en) * 2009-09-15 2010-09-01 电子科技大学 Self-adaptive network public opinion identification method
US20150154263A1 (en) * 2013-12-02 2015-06-04 Qbase, LLC Event detection through text analysis using trained event template models

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
付剑锋: "面向事件的知识处理研究", 《中国博士学位论文全文数据库信息科技辑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408806A (en) * 2018-09-11 2019-03-01 中国电子科技集团公司第二十八研究所 A kind of Event Distillation method based on English grammar rule
CN109446513A (en) * 2018-09-18 2019-03-08 中国电子科技集团公司第二十八研究所 The abstracting method of event in a kind of text based on natural language understanding
CN109446513B (en) * 2018-09-18 2023-06-20 中国电子科技集团公司第二十八研究所 Extraction method of events in text based on natural language understanding
CN110222032A (en) * 2019-05-22 2019-09-10 武汉掌游科技有限公司 A kind of generalised event model based on software data analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170308
