CN103886051A - Comment analysis method based on entities and features - Google Patents

Info

Publication number
CN103886051A
Authority
CN
China
Prior art keywords
entity
comment
module
mainly used
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410093275.4A
Other languages
Chinese (zh)
Inventor
秦志光
周尔强
罗熹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-03-13
Filing date
2014-03-13
Publication date
2014-06-25
Application filed by University of Electronic Science and Technology of China
Priority to CN201410093275.4A
Publication of CN103886051A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present invention proposes a comment analysis method based on entities and features, in the field of natural language processing. Its purpose is the analysis of comment text. The comment text is processed with natural language processing techniques to obtain an entity tree for the text and the features of the related entities. These entities and features are then used to extract information from the text. The method supports other comment analysis tasks such as public opinion analysis, relation extraction, and tendency (orientation) analysis.

Description

A Comment Analysis Method Based on Entities and Features

Technical Field

The invention belongs to the field of natural language processing and, more specifically, relates to a comment analysis method based on entities and features.

Background

With the arrival of the Web 2.0 era, the volume of comment information on the Internet has grown explosively. Suppose your company releases a new product. The release brings coverage from various media outlets as well as comments on the major portal websites. Faced with these comments, you may be eager to know which aspects of the product users care about most and how they rate it. Gathering this information manually is nearly impossible, so the data must be processed by computer to obtain the desired results. The entity- and feature-based comment text analysis method of the present invention analyzes such data by constructing entities and the features of related entities, and produces the analysis results.

Summary of the Invention

The ultimate purpose of the present invention is to analyze comment text. By extracting entities and features from a large volume of comment text, the invention builds its own entity and feature analysis framework, which is then used to analyze comment text and extract information from it.

To achieve the above purpose, the entity- and feature-based comment text analysis method of the present invention consists mainly of the following modules:

- Comment data acquisition module. Collects comment data from the relevant domain, obtaining a large volume of comment text through web crawlers or other means.

- Data preprocessing module. Splits the comment text into sentences, then applies a word segmentation and part-of-speech tagging tool to each sentence.

- Entity extraction module. Extracts the entities in the comments. Entities consist mainly of nouns. The invention extracts entity nouns using word frequency combined with manual participation.

- Entity ontology tree construction module. Builds an ontology tree from the entity nouns. Words of different categories are placed on different branches of the tree, and the hierarchical relationships between words are also reflected in the tree.

- Entity feature extraction module. Extracts the features of the related entities. Entity features consist mainly of adjectives, verbs, and nouns. The invention extracts them using syntactic dependency relations and word co-occurrence.

- Comment analysis module. Uses the entities and features to analyze unprocessed comment text and obtain the information extraction results.
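
The word-frequency and co-occurrence steps named in the module list can be illustrated with a minimal sketch. It is not the patent's implementation: the function names, the `min_count` threshold, and the POS tags `"n"`, `"a"`, `"v"` are assumptions made for illustration, the input is assumed to be already segmented and tagged, and the syntactic-dependency method is not shown. In the patent, a person would still review the candidate list.

```python
from collections import Counter, defaultdict

# Hypothetical POS tags: "n" = noun, "a" = adjective, "v" = verb.
# Each sentence is a list of (word, pos) pairs produced by some external tagger.

def candidate_entities(sentences, min_count=2):
    """Return nouns whose corpus frequency reaches min_count, most frequent first."""
    counts = Counter(word for sent in sentences for word, pos in sent if pos == "n")
    return [word for word, count in counts.most_common() if count >= min_count]

def cooccurrence_features(sentences, entities, feature_tags=("a", "v", "n")):
    """Count adjectives, verbs and nouns that share a sentence with each entity."""
    entity_set = set(entities)
    features = defaultdict(Counter)
    for sent in sentences:
        present = {word for word, _ in sent if word in entity_set}
        for entity in present:
            for word, pos in sent:
                if pos in feature_tags and word != entity:
                    features[entity][word] += 1
    return features

corpus = [
    [("screen", "n"), ("is", "v"), ("bright", "a")],
    [("battery", "n"), ("drains", "v"), ("fast", "a")],
    [("screen", "n"), ("looks", "v"), ("sharp", "a")],
    [("battery", "n"), ("is", "v"), ("weak", "a")],
    [("price", "n"), ("is", "v"), ("high", "a")],
]
entities = candidate_entities(corpus, min_count=2)   # candidates for manual review
features = cooccurrence_features(corpus, entities)
```

With this toy corpus, "price" appears only once and falls below the assumed threshold, so it is not proposed as an entity candidate.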

The purpose of the invention is achieved as follows. The data acquisition module and the data preprocessing module are called to obtain preliminarily processed data. The entity extraction module, the entity ontology tree construction module, and the entity feature extraction module are then called to obtain the training results. Finally, the comment analysis module encapsulates these modules; once encapsulation is complete, any new comment text that arrives is analyzed by the comment analysis module to produce the final result.
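
One way to picture the encapsulation described above is a small wrapper that holds the training results and applies them to new text. The sketch below is illustrative only: the class name `ReviewAnalyzer`, the child-to-parent map used as the ontology tree, and the per-entity feature lexicon are assumptions, and the new comment is assumed to arrive already segmented into tokens.

```python
from dataclasses import dataclass, field

@dataclass
class ReviewAnalyzer:
    """Holds the trained artifacts: an ontology tree and a per-entity feature lexicon."""
    parent: dict = field(default_factory=dict)   # child entity -> parent entity (assumed acyclic)
    lexicon: dict = field(default_factory=dict)  # entity -> set of known feature words

    def path(self, entity):
        """Return the chain of entities from the tree root down to the given entity."""
        chain = [entity]
        while chain[-1] in self.parent:
            chain.append(self.parent[chain[-1]])
        return list(reversed(chain))

    def analyze(self, tokens):
        """Report each known entity found in the tokens, its tree path, and matched features."""
        token_set = set(tokens)
        results = []
        for entity, feats in self.lexicon.items():
            if entity in token_set:
                results.append({
                    "entity": entity,
                    "path": self.path(entity),
                    "features": sorted(feats & token_set),
                })
        return results

analyzer = ReviewAnalyzer(
    parent={"screen": "phone", "battery": "phone"},
    lexicon={"screen": {"bright", "sharp"}, "battery": {"weak", "fast"}},
)
report = analyzer.analyze(["the", "screen", "is", "bright", "but", "battery", "is", "weak"])
```

Matching features anywhere in the token list is a deliberate simplification; a closer reading of the method would restrict matches to the same short sentence or to a dependency relation with the entity.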

Brief Description of the Drawings

Fig. 1 is a block diagram showing the implementation principle of the entity- and feature-based comment analysis method of the present invention.

Detailed Description

Specific embodiments of the present invention are described below with reference to the accompanying drawing so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.

Fig. 1 is a block diagram showing the implementation principle of the entity- and feature-based comment analysis method of the present invention.

In this embodiment, as shown in Fig. 1, the entity- and feature-based comment analysis method of the present invention comprises a data acquisition module 101, a data preprocessing module 102, an entity extraction module 103, an entity ontology tree construction module 104, an entity feature extraction module 105, an entity and feature construction module 201, unprocessed comments 106, and an analysis result 107.

In this example, the data acquisition module 101 is called to obtain the relevant data, which is passed to the data preprocessing module 102. The data preprocessing module 102 splits the comments into paragraphs, long sentences, and short sentences, and performs word segmentation and part-of-speech tagging. It then passes the data to the entity extraction module 103 and the entity feature extraction module 105. The entity extraction module 103 extracts the entities and passes the data to the entity ontology tree construction module 104, while the entity feature extraction module 105 extracts the corresponding features. The entity extraction module 103, the entity ontology tree construction module 104, and the entity feature extraction module 105 all belong to the entity and feature construction module 201. Once the entity and feature construction module 201 is complete, it is used to process the unprocessed comments 106, yielding the analysis result 107.
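
The splitting steps performed by the data preprocessing module 102 might look like the following sketch. The punctuation sets, the function names, and the nesting of short sentences inside long sentences are assumptions; the patent does not specify them. Word segmentation and part-of-speech tagging are left to an external tool and are not shown.

```python
import re

def split_paragraphs(text):
    """Split on line breaks and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n+", text) if p.strip()]

def split_long_sentences(paragraph):
    """Split on sentence-final punctuation (Chinese and ASCII); punctuation is dropped."""
    return [s.strip() for s in re.split(r"[。！？!?.]+", paragraph) if s.strip()]

def split_short_sentences(sentence):
    """Split a long sentence into clauses at commas and semicolons."""
    return [c.strip() for c in re.split(r"[，、；,;]+", sentence) if c.strip()]

def preprocess(text):
    """Return nested lists: paragraphs -> long sentences -> short sentences."""
    return [
        [split_short_sentences(s) for s in split_long_sentences(p)]
        for p in split_paragraphs(text)
    ]

review = "The screen is bright, but the battery is weak. I like the price!\nShipping was slow."
nested = preprocess(review)
```

Splitting on every full stop is crude: it would also break decimals such as "4.5", so a real preprocessing step would need a more careful sentence boundary rule.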

Although illustrative embodiments of the present invention have been described above so that those skilled in the art can understand it, the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, variations that fall within the spirit and scope of the invention as defined by the appended claims are obvious, and every invention or creation that makes use of the inventive concept is protected.

Claims (1)

1. A comment analysis method based on entities and features, consisting of the following features:
- Comment data acquisition module. Collects comment data from the relevant domain, obtaining a large volume of comment text through web crawlers or other means.
- Data preprocessing module. Splits the comment text into sentences, then applies a word segmentation and part-of-speech tagging tool to each sentence.
- Entity extraction module. Extracts the entities in the comments. Entities consist mainly of nouns. The invention extracts entity nouns using word frequency combined with manual participation.
- Entity ontology tree construction module. Builds an ontology tree from the entity nouns. Words of different categories are placed on different branches of the tree, and the hierarchical relationships between words are also reflected in the tree.
- Entity feature extraction module. Extracts the features of the related entities. Entity features consist mainly of adjectives, verbs, and nouns. The invention extracts them using syntactic dependency relations and word co-occurrence.
- Comment analysis module. Uses the entities and features to analyze unprocessed comment text and obtain the information extraction results.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410093275.4A CN103886051A (en) 2014-03-13 2014-03-13 Comment analysis method based on entities and features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410093275.4A CN103886051A (en) 2014-03-13 2014-03-13 Comment analysis method based on entities and features

Publications (1)

Publication Number Publication Date
CN103886051A true CN103886051A (en) 2014-06-25

Family

ID=50954943

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410093275.4A Pending CN103886051A (en) 2014-03-13 2014-03-13 Comment analysis method based on entities and features

Country Status (1)

Country Link
CN (1) CN103886051A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090119156A1 (en) * 2007-11-02 2009-05-07 Wise Window Inc. Systems and methods of providing market analytics for a brand
CN101196904A (en) * 2007-11-09 2008-06-11 清华大学 News keyword abstraction method based on word frequency and multi-component grammar
US20090265332A1 (en) * 2008-04-18 2009-10-22 Biz360 Inc. System and Methods for Evaluating Feature Opinions for Products, Services, and Entities
WO2010042888A1 (en) * 2008-10-10 2010-04-15 The Regents Of The University Of California A computational method for comparing, classifying, indexing, and cataloging of electronically stored linear information
CN103370707A (en) * 2011-02-24 2013-10-23 瑞典爱立信有限公司 Method and server for media classification
CN102968408A (en) * 2012-11-23 2013-03-13 西安电子科技大学 Method for identifying substance features of customer reviews
CN103077164A (en) * 2012-12-27 2013-05-01 新浪网技术(中国)有限公司 Text analysis method and text analyzer
CN103544255A (en) * 2013-10-15 2014-01-29 常州大学 Text semantic relativity based network public opinion information analysis method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528625A (en) * 2020-12-11 2021-03-19 北京百度网讯科技有限公司 Event extraction method and device, computer equipment and readable storage medium
CN112528625B (en) * 2020-12-11 2024-02-23 北京百度网讯科技有限公司 Event extraction method, device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140625