CN107122494B - Topic Model Construction Method Based on Community Discovery - Google Patents
- Publication number: CN107122494B
- Application number: CN201710361414.0A
- Authority: CN (China)
- Prior art keywords
- community
- short text
- data
- topic model
- network
- Legal status: Active (Google notes that the listed status is an assumption, not a legal conclusion; it has performed no legal analysis and makes no representation as to its accuracy.)
Classifications
- All codes fall under section G (Physics), class G06 (Computing or calculating; counting)
- G06F16/35: Clustering; Classification (under G06F16/30, information retrieval of unstructured textual data, within G06F16/00 and subclass G06F, Electric digital data processing)
- G06F40/30: Semantic analysis (under G06F40/00, handling natural language data)
- G06Q50/01: Social networking (under G06Q50/00, ICT specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism; subclass G06Q, ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes)
- G06F2216/03: Data mining (under G06F2216/00, indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Primary Health Care (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Economics (AREA)
- Human Resources & Organizations (AREA)
- Marketing (AREA)
- Computational Linguistics (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical Field
The invention relates to a topic model construction method based on community discovery, and in particular to topic mining for social short-text data that contains an embedded social network.
Background
In today's network environment, the proliferation of online platforms generates large volumes of social data, and social networks have become a major source for information mining. Most of the data produced in this setting takes the form of short texts. Compared with long texts, short texts express their meaning concisely and spread information faster, which makes them a clear trend in information dissemination; short text is becoming one of the most important information carriers in society today.
Among current methods for analyzing such data, mining the semantic content of text with a topic model is very effective. Classic topic model algorithms such as PLSA and LDA analyze text semantics mainly on the basis of a two-mode structure and word co-occurrence relationships. They work well on long documents, but on short texts the shortage of word co-occurrences leaves them facing a data sparsity problem that seriously degrades model quality.
At present, academic work on topic models for short texts follows five main approaches: 1) simple concatenation, which joins short texts directly; 2) aggregating short texts into long texts with the help of an external corpus; 3) heuristic methods, which extend texts using, for example, the hashtags of tweets, the time stream of posts, or the author of the content; 4) a relaxed assumption about topics, namely that each short text contains only one topic; 5) changing the object being modeled. A representative example of the last approach is the BTM (Biterm Topic Model) proposed by Yan et al. in 2013, which models word pairs across the corpus rather than individual documents.
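To make approach 5) concrete: BTM models biterms, i.e. unordered pairs of words that co-occur in the same short text, pooled across the whole corpus. The sketch below shows only that extraction step. Treating each whole short text as the co-occurrence window and keeping every pair of word positions are simplifying assumptions here, not details taken from this patent.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text.

    The whole short text is treated as a single co-occurrence window, and each
    pair is stored in sorted order so ("b", "a") and ("a", "b") are one biterm type.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


print(extract_biterms(["apple", "phone", "launch"]))
# [('apple', 'phone'), ('apple', 'launch'), ('launch', 'phone')]
```

Pooling these pairs over all short texts gives BTM its co-occurrence statistics without relying on any single short document being long enough.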
Each of these approaches has drawbacks: some forcibly erase document boundaries, while others are disturbed by noise from external data.
Summary of the Invention
The present invention proposes a method for constructing a topic model based on community discovery (the TMCD model, Topic Model based on Community Detection). The method builds topic models for social data sets, i.e., it uses a community discovery algorithm to provide a solution for topic mining of social short-text data. Starting from the community relationships inherent in the data, the TMCD model self-expands short texts on the basis of a community discovery algorithm, thereby solving the data sparsity problem.
To solve the above problems, the technical solution of the disclosed method for constructing a topic model based on community discovery comprises the following steps:
Step 1. Extract the relational network implicit in the short text data;
Step 2. Use a community discovery algorithm to divide the relational network into multiple communities;
Step 3. Expand the short texts extracted from each community into long documents with word co-occurrence relationships, and form a long-document collection from the resulting long documents;
Step 4. Perform topic mining on the long-document collection to obtain the TMCD topic model based on community discovery.
Further, the relational network in step 1 is extracted as follows: the subjects in the short text data are taken as nodes; subjects are linked through their interactions, and these links are abstracted into edges; the resulting nodes and edges together form a relational network.
Further, the closeness of the interaction between two subjects is used as the edge weight, and the active-passive (initiator-recipient) relationship of the association gives the edge direction.
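A minimal sketch of this abstraction, assuming the raw data is available as a list of (sender, receiver) records, one per message; the record format and the function name are hypothetical. Each distinct ordered pair becomes a directed edge whose weight is the message count, which matches the example the embodiment gives for social data.

```python
from collections import Counter


def build_interaction_graph(messages):
    """Abstract (sender, receiver) records into a weighted directed graph.

    Nodes are subjects (e.g. contacts). An edge u -> v exists if u sent at
    least one message to v; its weight is the number of such messages.
    """
    edge_weights = Counter(messages)  # counts each (sender, receiver) pair
    nodes = {node for edge in edge_weights for node in edge}
    return nodes, dict(edge_weights)


messages = [("alice", "bob"), ("alice", "bob"), ("bob", "alice"), ("carol", "dave")]
nodes, edges = build_interaction_graph(messages)
print(sorted(nodes))            # ['alice', 'bob', 'carol', 'dave']
print(edges[("alice", "bob")])  # 2
```

Keeping u -> v and v -> u as separate edges preserves the active-passive direction; a later step can still merge them if the chosen community algorithm works on undirected graphs.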
Further, the community discovery algorithm in step 2 includes one or more of agglomerative, divisive, label propagation, and global exploration methods.
Further, in step 3 the short texts are expanded with a self-expansion method.
Further, the short text data is social data that contains an internal social network, and the relational network is that social network.
The disclosed topic model construction method based on community discovery provides a new solution for topic mining of social short-text data and has the following beneficial effects:
(1) The method uses the community network associations contained in the data as the basis for grouping texts and expands short texts on that basis, thereby addressing data sparsity in short-text topic mining and providing a solution for building topic models of such social short-text data sets.
(2) Through a self-expansion method based on content similarity, and without introducing external auxiliary data, the method avoids two problems of existing short-text topic modeling solutions: forcibly erasing document boundaries by simply concatenating all texts, whether content-related or not, and external noise introduced by an external auxiliary corpus. It also fundamentally avoids the impact of insufficient word co-occurrence on the topic model.
Description of Drawings
Figure 1 is a schematic diagram of the relationships among documents, topics, and words in a topic model.
Figure 2 is a schematic diagram of a social network.
Figure 3 is a flowchart of the method for constructing a topic model for a social data set in an embodiment.
Figure 4 is a flowchart of the short-text expansion part of Figure 3.
Detailed Description
To make the technical content of the present invention easier to understand, specific embodiments are described below with reference to the accompanying drawings.
Figure 1 shows the relationships among documents, topics, and words in a topic model. Once the concept of a "topic" is introduced, topics act as a bridge between documents and words: by observing the probability distribution of topics over documents and of words over topics, the topic distribution can be obtained through a suitable mathematical model. When estimating the topic-word relationship, the number of word co-occurrences affects the accuracy of the observations, which in turn affects the quality of the final topic model. Long texts provide enough word co-occurrences to support these observations, whereas short texts lack them; this is the data sparsity problem. The TMCD model construction method proposed by the present invention is designed to solve this problem.
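As general background (the standard PLSA/LDA formulation, not a formula stated in the patent), the bridge role of topics can be written as a mixture over $K$ topics, where $p(z \mid d)$ is the document-topic distribution and $p(w \mid z)$ is the topic-word distribution:

$$
p(w \mid d) = \sum_{z=1}^{K} p(w \mid z)\, p(z \mid d)
$$

Estimating $p(w \mid z)$ depends on words that co-occur within the same document $d$, which is why a corpus of very short documents yields weak estimates.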
As shown in Figure 2, the TMCD model in this embodiment targets social data. Abstracting the key subjects of the data set (the objects that generate data, generally contacts) and the associations between them (the paths along which data propagates) yields a clearly defined social network. Here, abstraction means mapping the meaningful contacts in the data set, and the relationships between them, to the nodes and edges of a social network: subjects become nodes (in social data, each contact is a node), and associations between subjects become edges, with the closeness of the association as the edge weight. For example, in social data, messages exchanged between contacts form edges, the number of messages sent is the weight, and the sender-receiver relationship gives the edge direction. An important property of the resulting social network is that it contains community structure: suitable algorithms can divide the network into several communities, and the data within the same community are similar. In the resulting partition, nodes within a community are closely connected, while connections between communities are sparse.
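The "dense inside, sparse between" property can be checked on a candidate partition with a simple diagnostic: the share of total edge weight whose two endpoints fall in the same community. This measure, the input formats, and the toy graph below are illustrative assumptions, not part of the patented method (modularity is the more common standard measure).

```python
def intra_community_weight_fraction(edges, community_of):
    """Share of total edge weight that stays inside communities.

    edges: dict mapping (u, v) -> weight
    community_of: dict mapping node -> community label
    Values close to 1 mean most interaction happens within communities.
    """
    total = sum(edges.values())
    if total == 0:
        return 0.0
    inside = sum(w for (u, v), w in edges.items()
                 if community_of[u] == community_of[v])
    return inside / total


edges = {("a", "b"): 5, ("b", "c"): 4, ("c", "d"): 1, ("d", "e"): 6}
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
print(intra_community_weight_fraction(edges, community_of))  # 0.9375
```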
Figure 3 is a flowchart of a method for constructing a topic model for a social data set in an embodiment. The method builds the model on the basis of community discovery and includes the following steps:
Step 1: Extract the embedded social network from the subjects within the social data and the propagation relationships of data between them. Social data here covers any data set that contains an internal social network, for example data sets formed by messages generated in real time by contacts in instant messaging applications such as QQ and WeChat, or data sets produced by reposts and comments on online social platforms such as Weibo and Zhihu. The extraction proceeds as follows:
1) Take the subjects in the data being abstracted (i.e., the social data) as nodes. Subjects include any objects that can serve as nodes of the constructed social network, such as people, things, or events;
2) Link subjects through their interactions and abstract these links into edges. Interactions include any relationship that forms a meaningful association between two subjects, such as the association between contacts formed by message passing in instant messaging, or associations formed by reposts, comments, and shares on online social platforms;
3) The nodes and edges obtained in the above steps together form an explicit social network.
Step 2: Use a community discovery algorithm to divide the social network into multiple communities. Any algorithm that can partition a social network into meaningful communities may be used, including but not limited to algorithms based on agglomeration, division, label propagation, and global exploration (including spectral analysis). These ideas underlie the design of most community discovery algorithms and cover nearly all algorithms that partition networks effectively.
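Label propagation is one of the algorithm families listed in step 2. The sketch below is one simple variant written under assumptions the patent does not state: edge directions are ignored (weights of u -> v and v -> u are summed), every node starts in its own community, nodes are visited in a seeded random order, a node keeps its label when that label is among the heaviest, and otherwise the smallest heaviest label wins. Nodes that appear only in self-loops are left out. For real data, a maintained library implementation (for example NetworkX's community-detection module) would be the more practical choice.

```python
import random
from collections import defaultdict


def label_propagation(edges, seed=0, max_iter=100):
    """Weighted label propagation on an undirected view of the graph.

    edges: dict mapping (u, v) -> weight; u -> v and v -> u are merged.
    Returns a dict mapping each node to a community label.
    """
    neighbors = defaultdict(lambda: defaultdict(float))
    for (u, v), w in edges.items():
        if u != v:  # self-loops carry no community information here
            neighbors[u][v] += w
            neighbors[v][u] += w

    labels = {node: node for node in neighbors}  # every node starts alone
    order = sorted(neighbors)
    rng = random.Random(seed)
    for _ in range(max_iter):
        rng.shuffle(order)
        changed = False
        for node in order:
            # Total edge weight towards each neighboring label.
            score = defaultdict(float)
            for nb, w in neighbors[node].items():
                score[labels[nb]] += w
            best = max(score.values())
            candidates = sorted(lab for lab, s in score.items() if s == best)
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:  # no node changed its label: converged
            break
    return labels


edges = {("a", "b"): 3, ("b", "c"): 3, ("a", "c"): 3,
         ("d", "e"): 3, ("e", "f"): 3, ("d", "f"): 3,
         ("c", "d"): 1}  # two dense triangles joined by one weak edge
labels = label_propagation(edges)
print(labels)  # a, b, c share one label; d, e, f share another
```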
Step 3: Expand the short texts in each community according to the community partition. The expansion consists mainly of the following sub-steps:
1) Extract the short text data corresponding to the nodes in each community;
2) Expand the short texts into long documents with a conventional self-expansion method;
3) The above steps yield several long documents (their number depends on the number of communities), each self-expanded from textually similar data in the social network and rich in word co-occurrences; the long documents obtained from all communities form a long-document collection.
It is worth noting that the expansion relies only on the data set itself and introduces no external auxiliary data. Direct concatenation serves as an example: the extracted short texts are simply joined. This expansion method by itself does not check whether the texts are similar; in this setting, all texts corresponding to the nodes in the same community are merged into one long document.
Step 4: Perform topic modeling on the long-document collection to obtain the TMCD model. A conventional topic model construction method (e.g., LDA or probabilistic latent semantic analysis, PLSA) obtains word-topic observations from the rich word co-occurrences in the documents; combined with the observed document-topic results, a suitable inference method (e.g., Gibbs sampling) completes the topic analysis and mining, producing the TMCD model for the social data set. The TMCD model directly outputs information such as the topics contained in the documents and their corresponding keywords. Compared with applying a conventional topic model directly to short texts, the TMCD model adds a community-discovery-based text expansion step that supplies enough word co-occurrences, thereby substantially improving the quality of the topic mining results.
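Step 4 leaves the choice of topic model open and names Gibbs sampling as one inference option. As one possible instantiation, assumed here rather than taken from the patent, below is a compact collapsed Gibbs sampler for LDA in pure Python; the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are illustrative assumptions. Its input would be the long documents from step 3 after tokenization.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (illustrative, not optimized).

    docs: list of token lists, e.g. the community-expanded long documents.
    Returns (theta, phi, vocab): document-topic distributions, topic-word
    distributions, and the vocabulary that indexes the columns of phi.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens of topic k in document d
    n_kw = [[0] * V for _ in range(K)]  # occurrences of word v assigned to topic k
    n_k = [0] * K                       # tokens assigned to topic k
    z = []                              # current topic of every token

    # Random initial topic assignment.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the current token out of the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


docs = [doc.split() for doc in [
    "phone battery screen phone battery charger",
    "goal match team goal striker match",
]]
theta, phi, vocab = lda_gibbs(docs, num_topics=2, iterations=100)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
    print(k, [vocab[v] for v in top])  # top keywords per topic
```

For real corpora a maintained implementation would be more practical; note that scikit-learn's `LatentDirichletAllocation`, for example, uses variational inference rather than Gibbs sampling.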
Figure 4 is a flowchart of the short-text expansion described in sub-step 2) of step 3 in the embodiment. It includes the following steps:
S3-1 (short-text extraction): according to the community partition from step 2 of Figure 3, take the nodes contained in one community that has not yet been expanded, and extract the corresponding short text data from each node's information;
S3-2 (short-text expansion): expand the short texts extracted in S3-1 by self-expansion. Direct concatenation is used here as an example: the extracted short texts are joined so that all texts in this community become one long document;
S3-3 (check): determine whether the short texts of every community have been expanded according to the community partition. If any community remains unexpanded, return to S3-1; otherwise, go to S3-4;
S3-4: return the expanded long-document collection; the community-based short-text expansion ends.
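The S3-1 to S3-4 loop can be sketched as follows, assuming the community partition from step 2 is available as a node-to-label mapping and each node's short texts as a list; both input formats and the function name are hypothetical. Expansion uses direct concatenation, the example method given in S3-2, so no similarity check and no external data are involved.

```python
def expand_communities(community_of, texts_by_node, sep=" "):
    """Community-based short-text expansion following S3-1 to S3-4.

    community_of: node -> community label (the partition from step 2)
    texts_by_node: node -> list of short texts attached to that node
    Returns the long-document collection as {community label: long document}.
    """
    members = {}
    for node, label in community_of.items():
        members.setdefault(label, []).append(node)

    pending = sorted(members)  # communities not yet expanded
    long_docs = {}
    while pending:  # S3-3: repeat while an unexpanded community remains
        label = pending.pop(0)
        # S3-1: gather the short texts of every node in this community.
        short_texts = [text for node in sorted(members[label])
                       for text in texts_by_node.get(node, [])]
        # S3-2: direct concatenation; no similarity check, no external corpus.
        long_docs[label] = sep.join(short_texts)
    return long_docs  # S3-4: the expanded long-document collection


community_of = {"a": 0, "b": 0, "d": 1, "e": 1}
texts_by_node = {"a": ["new phone launch"], "b": ["phone battery life"],
                 "d": ["match tonight"], "e": ["great goal"]}
print(expand_communities(community_of, texts_by_node))
# {0: 'new phone launch phone battery life', 1: 'match tonight great goal'}
```

Tokenizing each resulting document (for example with `str.split()`) gives the input for the topic-modeling step.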
In summary, the method of the embodiment for constructing a topic model based on community discovery offers a new approach to topic mining of social data sets. It discovers the community structure embedded in a social data set and, on that basis, self-expands short texts into a collection of long documents. This resolves the data sparsity problem faced by topic mining directly on short texts, substantially improves topic model quality, and provides a topic modeling solution for social data sets.
Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Those of ordinary skill in the art to which the invention pertains may make various changes and modifications without departing from its spirit and scope. The scope of protection of the present invention is therefore defined by the claims.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710361414.0A CN107122494B (en) | 2017-05-22 | 2017-05-22 | Topic Model Construction Method Based on Community Discovery |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN107122494A CN107122494A (en) | 2017-09-01 |
| CN107122494B true CN107122494B (en) | 2020-06-26 |
Family
ID=59727788
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201710361414.0A Active CN107122494B (en) | 2017-05-22 | 2017-05-22 | Topic Model Construction Method Based on Community Discovery |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN107122494B (en) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108681557B (en) * | 2018-04-08 | 2022-04-01 | 中国科学院信息工程研究所 | Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint |
| CN110264372B (en) * | 2019-05-16 | 2022-03-08 | 西安交通大学 | Topic community discovery method based on node representation |
| CN114913336B (en) * | 2022-05-27 | 2025-03-04 | 北京达佳互联信息技术有限公司 | Network graph feature extraction method, device, electronic device, and storage medium |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7877407B2 (en) * | 1998-10-05 | 2011-01-25 | Smith Iii Julius O | Method and apparatus for facilitating use of hypertext links on the world wide web |
| CN103778207A (en) * | 2014-01-15 | 2014-05-07 | 杭州电子科技大学 | LDA-based news comment topic digging method |
| EP2751720A1 (en) * | 2011-08-31 | 2014-07-09 | Metaswitch Networks Ltd | Processing communications data |
| CN104123336A (en) * | 2014-05-21 | 2014-10-29 | 深圳北航新兴产业技术研究院 | Deep Boltzmann machine model and short text subject classification system and method |
| CN104391942A (en) * | 2014-11-25 | 2015-03-04 | 中国科学院自动化研究所 | Short text characteristic expanding method based on semantic atlas |
| CN104850650A (en) * | 2015-05-29 | 2015-08-19 | 清华大学 | Short-text expanding method based on similar-label relation |
| CN106055604A (en) * | 2016-05-25 | 2016-10-26 | 南京大学 | Short text topic model mining method based on word network to extend characteristics |
Non-Patent Citations (1)
| Title |
|---|
| The dual-sparse topic model: mining focused topics and focused terms in short text; Lin T Tian; Proceedings of the 23rd International Conference on World Wide Web, International World Wide Web Conferences Steering Committee; 2014-12-31; pp. 539-550 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN107122494A (en) | 2017-09-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN107301170B (en) | Method and device for segmenting sentences based on artificial intelligence | |
| CN116561446B (en) | Multi-mode project recommendation method, system and device and storage medium | |
| CN111598710A (en) | Method and device for detecting social network events | |
| CN108681557B (en) | Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint | |
| CN106960030B (en) | Information pushing method and device based on artificial intelligence | |
| CN108052593A (en) | A kind of subject key words extracting method based on descriptor vector sum network structure | |
| CN106156365A (en) | A kind of generation method and device of knowledge mapping | |
| CN110134958B (en) | A Short Text Topic Mining Method Based on Semantic Word Network | |
| CN106776881A (en) | A kind of realm information commending system and method based on microblog | |
| CN104850650B (en) | Short text extending method based on category relation | |
| US20150046781A1 (en) | Browsing images via mined hyperlinked text snippets | |
| CN107798043B (en) | Text clustering method for long text auxiliary short text based on Dirichlet multinomial mixed model | |
| CN107239512B (en) | A microblog spam comment identification method combined with comment relationship network graph | |
| CN102314440B (en) | Utilize the method and system in network operation language model storehouse | |
| CN103150382A (en) | Automatic short text semantic concept expansion method and system based on open knowledge base | |
| CN102110140A (en) | Network-based method for analyzing opinion information in discrete text | |
| CN105320642A (en) | Automatic abstract generation method based on concept semantic unit | |
| CN106294314A (en) | Topic Mining Method and Device | |
| CN112989208B (en) | Information recommendation method and device, electronic equipment and storage medium | |
| CN110275962B (en) | Method and apparatus for outputting information | |
| CN103488637B (en) | A kind of method carrying out expert Finding based on dynamics community's excavation | |
| CN109325122A (en) | Vocabulary generation method, text classification method, apparatus, device and storage medium | |
| CN108491512A (en) | The method of abstracting and device of headline | |
| CN107122494B (en) | Topic Model Construction Method Based on Community Discovery | |
| CN104915443A (en) | Extraction method of Chinese Microblog evaluation object |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |