
CN107305490B - Metadata grouping method and device - Google Patents


Info

Publication number
CN107305490B
Authority
CN
China
Prior art keywords
node
nodes
relationship
metadata
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610257438.7A
Other languages
Chinese (zh)
Other versions
CN107305490A (en)
Inventor
祝希路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Group Henan Co Ltd
Original Assignee
China Mobile Group Henan Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Group Henan Co Ltd
Priority to CN201610257438.7A
Publication of CN107305490A
Application granted
Publication of CN107305490B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/35 Creation or generation of source code model driven
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a metadata grouping method and device. The method includes: acquiring source information of the metadata in each node of a database; determining the association relationships and association degrees between the nodes according to that source information; constructing a relationship network from the nodes and their association relationships, in which each edge is an association relationship between two nodes and the weight of each edge is the association degree between the associated nodes; and dividing the relationship network with a particle swarm algorithm into a plurality of sub-relationship networks, each of which is one group. Because the grouping is derived from the relationship network built from the nodes and their associations, metadata can still be grouped when descriptive information about it is scarce. This solves the prior-art problem that grouping categories cannot be completed when the processing description information of the metadata is missing.


Description

Metadata grouping method and device

Technical Field

The present invention relates to the technical fields of mobile Internet and automated information processing, and in particular to a metadata grouping method and device.

Background

With the rapid development of information technology, software is commonly designed in a model-driven way. In systems driven by data models, these models are usually abstractions of specific types and are collectively referred to as metadata. As data volumes grow explosively, managing data models becomes increasingly important: metadata describes many characteristics of data, including its content, status, and quality. How to group and manage metadata reasonably and efficiently has therefore become a prominent problem for data quality control in the big-data era.

At present, the most widely used approach to grouping metadata is hierarchical metadata management, in which ERP system software manages and publishes metadata centrally; one or more levels of control ensure that metadata is classified reasonably and consistently and that conflicts are avoided. Another common approach is control through a graphical interface: everything from the metadata input source to the logical model and the physical model is built graphically, which makes metadata grouping intuitive. A third approach clusters metadata step by step by building a classification tree and then manages the grouped results centrally.

Existing solutions lack automated classification. A graphical interface allows intuitive classification, but in a big-data integration environment the volume of metadata grows geometrically, so manual methods cannot keep up with metadata management in this setting. Hierarchical classification, although common in the prior art, reflects similarity only to a limited degree: relationships between metadata cannot be described by hierarchy alone, so their depth and breadth are not taken into account, which inevitably reduces the accuracy of subsequent classification. Managing metadata solely by its literal attributes ignores context, so metadata items with similar names may be placed in the same category even though they actually belong to different ones.

In summary, prior-art metadata grouping ignores the context of the metadata and cannot complete the grouping categories when the processing description information of the metadata is missing.

Summary of the Invention

The present invention provides a metadata grouping method and device that address two shortcomings of prior-art metadata grouping: it ignores the context of the metadata, and it cannot complete the grouping categories when the processing description information of the metadata is missing.

The present invention provides a metadata grouping method, comprising:

obtaining source information of the metadata in each node of a database;

determining the association relationships and association degrees between the nodes according to the source information of the metadata in the nodes;

constructing a relationship network from the nodes and the association relationships between them, where each edge in the relationship network is an association relationship between two nodes and the weight of each edge is the association degree between the associated nodes; and

dividing the relationship network with a particle swarm algorithm to obtain a plurality of sub-relationship networks, each of which is one group.

In this embodiment, the association relationships and association degrees between nodes, each consisting of metadata, are determined from the source information of the metadata in the database. All nodes in the database are then connected into a relationship network according to their association relationships, and the particle swarm algorithm divides this network into sub-networks, each of which is treated as one group. Because the grouping relies on the relationship network rather than on descriptive information, metadata can still be grouped when such information is scarce.

Further, determining the association relationships and association degrees between the nodes according to the source information of the metadata in the nodes includes:

if a metadata item in a first node originates from a second node, determining that one association relationship exists between the first node and the second node; and

taking the number of association relationships between the first node and the second node as the association degree between them, where the first node and the second node are any two different tables (nodes) in the database.
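A minimal Python sketch of this counting rule, assuming (hypothetically) that the source information is available as a dictionary mapping each node to {field name: node the field came from}. The function name `association_degrees` and the data layout are illustrative, not taken from the patent; the example uses the data of Table 1 (node b).

```python
from collections import Counter


def association_degrees(sources: dict[str, dict[str, str]]) -> Counter:
    """Count, for every (target, source) node pair, how many metadata fields
    of the target node originate from the source node.

    Each such field is one association relationship; the count for a pair
    is its association degree.
    """
    degrees = Counter()
    for target, fields in sources.items():
        for source in fields.values():
            if source != target:  # associations only link two different nodes
                degrees[(target, source)] += 1
    return degrees


# Node b from Table 1: Name and Class come from p, Address from f, Age from m, Sex from v.
node_b = {"b": {"Name": "p", "Class": "p", "Address": "f", "Age": "m", "Sex": "v"}}
print(association_degrees(node_b))
```

The pair (b, p) gets degree 2 because two fields of b come from p; every other pair gets degree 1.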

In this embodiment, two nodes are considered associated when metadata in one node comes from the other, that is, when one node is a metadata source for the other. Building the relationship network from these associations solves the prior-art problem of grouping only by descriptive information while ignoring associations. Using the number of associations between two nodes as their association degree captures how much metadata actually flows between them.

Further, constructing the relationship network according to the nodes and the association relationships between them includes:

if an association relationship exists between the first node and the second node, determining that an edge exists between the first node and the second node in the relationship network, where the weight of the edge is the association degree between the first node and the second node.
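The edge rule can be sketched as an undirected weighted adjacency dictionary. This is a hypothetical layout: the input is a dict of directed counts {(target, source): count}, and, as an assumption the patent does not state, counts are summed when associations run in both directions between the same pair. The example chain v1 to v4 uses hypothetical counts of 1, 2 and 1.

```python
from collections import defaultdict


def build_network(degrees: dict[tuple[str, str], int]) -> dict[str, dict[str, int]]:
    """Turn directed association counts into an undirected weighted graph.

    Returns {node: {neighbour: edge weight}}; the weight is the association
    degree. Assumption: counts in both directions of a pair are added together.
    """
    graph = defaultdict(dict)
    for (u, v), count in degrees.items():
        weight = graph[u].get(v, 0) + count
        graph[u][v] = weight
        graph[v][u] = weight
    return dict(graph)


degrees = {("v2", "v1"): 1, ("v3", "v2"): 2, ("v4", "v3"): 1}
net = build_network(degrees)
print(net["v2"])  # {'v1': 1, 'v3': 2}
```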

In this embodiment, all association relationships between two associated nodes are collapsed into a single edge whose weight is the number of those relationships, which represents the relationship between the two nodes more compactly.

Further, before the relationship network is divided with the particle swarm algorithm into sub-relationship networks, the method further includes:

if an invalid node exists in the relationship network, deleting it. A node is invalid if (a) its correlation with the other nodes in the relationship network is less than the merging threshold and it only supplies metadata to other nodes; (b) its correlation with the other nodes is less than the merging threshold and it only receives metadata from other nodes; or (c) its correlation with the other nodes is zero.
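The pruning rule can be sketched as follows. The patent does not define a node's "correlation with the other nodes", so this sketch assumes it is the largest edge correlation on any of the node's edges, with edge correlation taken as (fields passed) / (fields in the source node), the reading given for the 2/3 example in the detailed description. The input layout and the name `find_invalid_nodes` are hypothetical.

```python
def find_invalid_nodes(flows, field_counts, merge_threshold):
    """Return the set of nodes to delete before partitioning.

    flows:        {(source, target): number of fields source passes to target}
    field_counts: {node: number of metadata fields in the node}
    Assumption: a node's correlation = max edge correlation over its edges.
    """
    outgoing = {n: set() for n in field_counts}
    incoming = {n: set() for n in field_counts}
    best = {n: 0.0 for n in field_counts}
    for (source, target), passed in flows.items():
        outgoing[source].add(target)
        incoming[target].add(source)
        corr = passed / field_counts[source]
        best[source] = max(best[source], corr)
        best[target] = max(best[target], corr)

    invalid = set()
    for n in field_counts:
        only_supplies = bool(outgoing[n]) and not incoming[n]  # case (a)
        only_receives = bool(incoming[n]) and not outgoing[n]  # case (b)
        weak = best[n] < merge_threshold
        if best[n] == 0 or (weak and (only_supplies or only_receives)):  # (c) or (a)/(b)
            invalid.add(n)
    return invalid


# Hypothetical network: x -> y -> z, plus an isolated node w.
field_counts = {"x": 2, "y": 3, "z": 4, "w": 1}
flows = {("x", "y"): 1, ("y", "z"): 3}
print(sorted(find_invalid_nodes(flows, field_counts, merge_threshold=0.8)))  # ['w', 'x']
```

Node x only supplies metadata and its correlation (1/2) is below the threshold; w has no association at all. Node z only receives metadata but is strongly correlated (3/3), so it is kept.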

In this embodiment, weakly correlated nodes that only output metadata or only receive it, and nodes with no association to any other node, are removed. Grouping on the pruned relationship network improves both the efficiency and the accuracy of metadata grouping.

Further, before the relationship network is divided with the particle swarm algorithm into sub-relationship networks, the method further includes:

if a group of nodes in the relationship network satisfies the following relationship, merging the group of nodes into one node:

the group contains at least two nodes, and for every two nodes in the group that have an output/input relationship, the correlation between them is not less than the merging threshold.
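One way to find merge candidates is a union-find pass over the edges whose correlation reaches the merging threshold. This is a simplification, not the patent's stated procedure: it joins nodes through qualifying edges but does not re-check every input/output pair inside the resulting group. Correlation is again assumed to be (fields passed) / (fields in the source node); all names are hypothetical.

```python
def merge_candidates(flows, field_counts, merge_threshold):
    """Group nodes linked by edges whose correlation reaches the merging threshold.

    flows:        {(source, target): number of fields source passes to target}
    field_counts: {node: number of metadata fields in the node}
    Returns a list of node sets, each of size >= 2, to be merged into one node.
    """
    parent = {n: n for n in field_counts}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for (source, target), passed in flows.items():
        if passed / field_counts[source] >= merge_threshold:
            parent[find(source)] = find(target)

    groups = {}
    for n in field_counts:
        groups.setdefault(find(n), set()).add(n)
    return [g for g in groups.values() if len(g) >= 2]  # singletons are not merged


field_counts = {"a": 2, "b": 4, "c": 5, "d": 3}
flows = {("a", "b"): 2, ("b", "c"): 4, ("c", "d"): 1}
print(merge_candidates(flows, field_counts, merge_threshold=0.9))
```

Here a passes all of its fields to b and b all of its fields to c, so a, b and c are merged; the edge from c to d (1/5) stays below the threshold.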

In this embodiment, when associated nodes in the relationship network have a correlation not less than the merging threshold, the metadata of one node comes essentially entirely from the other, so such nodes are merged to further simplify the relationship network.

Further, after the relationship network is divided with the particle swarm algorithm into sub-relationship networks, the method further includes:

using the most frequent description information in each sub-relationship network as the identification information of that sub-relationship network.
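A short sketch of the labeling step, assuming each node carries a single description string (for example the textual description of a table) and that a partition is given as {group id: list of nodes}. Counting whole descriptions rather than keywords is an assumption; the names are hypothetical.

```python
from collections import Counter


def label_subnetworks(partition, descriptions):
    """Label each sub-network with the description that occurs most often among its nodes.

    partition:    {group_id: [node, ...]}
    descriptions: {node: description text}
    Groups whose nodes have no description get the label None.
    """
    labels = {}
    for group_id, nodes in partition.items():
        counts = Counter(descriptions[n] for n in nodes if n in descriptions)
        labels[group_id] = counts.most_common(1)[0][0] if counts else None
    return labels


partition = {0: ["a", "b", "c"], 1: ["d", "e"]}
descriptions = {"a": "score table", "b": "score table", "c": "student info",
                "d": "billing detail", "e": "billing detail"}
print(label_subnetworks(partition, descriptions))
# {0: 'score table', 1: 'billing detail'}
```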

In this embodiment, after grouping, each sub-relationship network is labeled with the description information that occurs most frequently among its nodes, which makes later operations such as retrieving or modifying a group easier.

The present invention also provides a metadata grouping device, comprising:

an acquisition unit, configured to obtain source information of the metadata in each node of a database;

a determining unit, configured to determine the association relationships and association degrees between the nodes according to the source information of the metadata in the nodes;

a relationship network construction unit, configured to construct a relationship network from the nodes and the association relationships between them, where each edge in the relationship network is an association relationship between two nodes and the weight of each edge is the association degree between the associated nodes; and

a grouping unit, configured to divide the relationship network with a particle swarm algorithm to obtain a plurality of sub-relationship networks, each of which is one group.

In this embodiment, the device determines the association relationships and association degrees between nodes from the source information of the metadata in the database, connects all nodes into a relationship network according to those relationships, and divides the network with the particle swarm algorithm into sub-networks, each treated as one group. Because the relationship network does not depend on descriptive information, metadata can still be grouped when such information is scarce.

Further, the determining unit is specifically configured to:

determine that one association relationship exists between a first node and a second node if a metadata item in the first node originates from the second node; and

take the number of association relationships between the first node and the second node as the association degree between them, where the first node and the second node are any two different nodes in the database.


Further, the relationship network construction unit is specifically configured to:

determine, if an association relationship exists between the first node and the second node, that an edge exists between them in the relationship network, where the weight of the edge is the association degree between the first node and the second node.

Further, the relationship network construction unit is also configured to:

delete any invalid node in the relationship network, where a node is invalid if (a) its correlation with the other nodes in the relationship network is less than the merging threshold and it only supplies metadata to other nodes; (b) its correlation with the other nodes is less than the merging threshold and it only receives metadata from other nodes; or (c) its correlation with the other nodes is zero.

Further, the relationship network construction unit is also configured to:

merge a group of nodes in the relationship network into one node if the group satisfies the following relationship:

the group contains at least two nodes, and for every two nodes in the group that have an output/input relationship, the correlation between them is not less than the merging threshold.

Further, the grouping unit is also configured to:

use the most frequent description information in each sub-relationship network as the identification information of that sub-relationship network.

Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. These drawings show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a metadata grouping method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of nodes and metadata in an embodiment of the present invention;

FIG. 3 is a schematic diagram of nodes and metadata in an embodiment of the present invention;

FIG. 4 is a schematic diagram of the input/output relationships of metadata between nodes in a database in an embodiment of the present invention;

FIG. 5 is a schematic diagram of the input/output relationships of metadata between nodes in a database in an embodiment of the present invention;

FIG. 6 is a schematic diagram of a relationship network before and after optimization in an embodiment of the present invention;

FIG. 7 is a schematic diagram of a relationship network after optimization with the optimization method provided by an embodiment of the present invention;

FIG. 8 is a schematic diagram of the association relationships between nodes of a relationship network before grouping in an embodiment of the present invention;

FIG. 9 is a schematic diagram of the association relationships between nodes while a relationship network is being grouped in an embodiment of the present invention;

FIG. 10 is a schematic diagram of the association relationships between nodes after a relationship network has been grouped with the particle swarm algorithm in an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a metadata grouping device according to an embodiment of the present invention.

Detailed Description of the Embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a metadata grouping method which, as shown in FIG. 1, includes the following steps.

Step 101: obtain source information of the metadata in each node of the database.

Step 102: determine the association relationships and association degrees between the nodes according to the source information of the metadata in the nodes.

Step 103: construct a relationship network from the nodes and the association relationships between them, where each edge in the relationship network is an association relationship between two nodes and the weight of each edge is the association degree between the associated nodes.

Step 104: divide the relationship network with the particle swarm algorithm to obtain a plurality of sub-relationship networks, each of which is one group.
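This section does not spell out how particles are encoded or which fitness function step 104 optimizes, so the following is only a sketch under stated assumptions. Each particle is a map from node to community label; fitness is weighted Newman modularity (an assumed choice); and the continuous PSO velocity update is replaced by a simplified discrete variant. In this variant, each node copies its label from the particle's personal best with probability c1, from the global best with probability c2, and otherwise keeps its label (inertia), followed by one neighbour-majority move for exploration. All names and parameter values are hypothetical.

```python
import random


def modularity(graph, labels):
    """Weighted Newman modularity Q of a labelling; graph is {u: {v: weight}}, undirected."""
    m = sum(sum(nbrs.values()) for nbrs in graph.values()) / 2  # total edge weight
    if m == 0:
        return 0.0
    inside, degree = {}, {}
    for u, nbrs in graph.items():
        c = labels[u]
        degree[c] = degree.get(c, 0) + sum(nbrs.values())
        for v, w in nbrs.items():
            if labels[v] == c:
                inside[c] = inside.get(c, 0) + w / 2  # each internal edge is seen twice
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def pso_partition(graph, n_particles=20, iterations=50, c1=0.3, c2=0.4, seed=0):
    """Simplified discrete PSO; returns (list of node groups, best modularity)."""
    rng = random.Random(seed)
    nodes = list(graph)

    def neighbour_move(labels):
        # Give one random node the label carrying the most edge weight among its neighbours.
        u = rng.choice(nodes)
        score = {}
        for v, w in graph[u].items():
            score[labels[v]] = score.get(labels[v], 0) + w
        if score:
            labels[u] = max(score, key=score.get)

    particles = []
    for _ in range(n_particles):
        labels = {u: u for u in nodes}  # start from singleton communities
        for _ in range(2 * len(nodes)):
            neighbour_move(labels)
        particles.append(labels)

    pbest = [dict(p) for p in particles]
    pbest_q = [modularity(graph, p) for p in particles]
    best_i = max(range(n_particles), key=lambda i: pbest_q[i])
    gbest, gbest_q = dict(pbest[best_i]), pbest_q[best_i]

    for _ in range(iterations):
        for i, labels in enumerate(particles):
            for u in nodes:  # discrete "velocity" step
                r = rng.random()
                if r < c1:
                    labels[u] = pbest[i][u]
                elif r < c1 + c2:
                    labels[u] = gbest[u]
            neighbour_move(labels)  # exploration
            q = modularity(graph, labels)
            if q > pbest_q[i]:
                pbest[i], pbest_q[i] = dict(labels), q
                if q > gbest_q:
                    gbest, gbest_q = dict(labels), q

    groups = {}
    for u, c in gbest.items():
        groups.setdefault(c, []).append(u)
    return list(groups.values()), gbest_q


# Two hypothetical triangles of strongly associated tables joined by one weak edge.
edges = [("a", "b", 3), ("b", "c", 3), ("a", "c", 3),
         ("d", "e", 3), ("e", "f", 3), ("d", "f", 3), ("c", "d", 1)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

groups, q = pso_partition(graph)
print(sorted(sorted(g) for g in groups), round(q, 3))
```

Modularity is used because it rewards heavily weighted connections inside groups relative to chance. Any other partition-quality measure could be substituted for `modularity` without changing the search loop.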

In step 101, obtaining the source information of the metadata in each node of the database reveals the input/output relationships of the metadata between nodes. In this embodiment, a node can be regarded as a table stored in the database, and metadata as the fields of that table. For example, in FIG. 2, a denotes a table in the database and "Name", "Class", and "Address" are fields of table a; that is, table a is a node and the fields "Name", "Class", and "Address" are the metadata in table a.

In this embodiment, the source information of a metadata item records which node it was output from, which layer of the database it was loaded from, or which merge statements were executed to combine several metadata items into a node, and so on. Source information can therefore be obtained in several ways, for example from the process by which the tables in the database were created. Two ways of obtaining it are described below.

Method One

In this embodiment, the source information of the metadata in a table can be obtained from the process by which the table was created. As shown in FIG. 3, table b in the database, that is, node b, consists of the metadata "Name", "Class", "Address", "Age", and "Sex", and the source of each metadata item can be read from Table 1.

| Metadata | Source information |
| --- | --- |
| Name | Output of node p |
| Class | Output of node p |
| Address | Output of node f |
| Age | Output of node m |
| Sex | Output of node v |

Table 1: Source information of the metadata in node b

Method Two

In this embodiment, the source information of the metadata in a node can also be determined from the entity-relationship diagram of the nodes. For example, as shown in FIG. 4, node c contains five metadata items: metadata 1 comes from node d, metadata 2 from node a, metadata 3 from node e, and metadata 4 and metadata 5 from node b.

These two methods are only examples of how the source information of metadata can be obtained; other methods may also be used.

Once the source information in each node has been obtained, the association relationships and association degrees between nodes follow. As shown in Table 1 and FIG. 4, node b is associated with nodes p, f, m, and v, and node c is associated with nodes a, b, d, and e.

In this embodiment, the association degree is the number of metadata items exchanged between two associated nodes. Optionally, it can be determined from the input/output relationships of metadata between the two nodes, that is, from the source information. As shown in Table 1, two metadata items flow between node b and node p, so their association degree can be set to 2; one metadata item flows between node b and each of nodes f, m, and v, so each of those association degrees can be set to 1.

Based on the source information determined by either method, the association relationships between any two nodes that exchange metadata can be obtained. For example, as shown in FIG. 5, nodes v1, v2, v3, and v4 in the database are associated: the association between v1 and v2 is defined as one edge, the two associations between v2 and v3 as two edges, and the association between v3 and v4 as one edge.

Optionally, a node in the database (that is, a table) can be defined as a quadruple consisting of the source information of its metadata, the set of its output metadata, the set of its input metadata, and a textual description of the node. For example, node a consists of four metadata items: "Name", "Score1", "Score2" and "Score3". The "Name" field comes from the output of node m and the "Score1" field from the output of node n, while "Score2" and "Score3" already existed when node a was created. Thus "Name" and "Score1" form the input set of node a, "Score2" and "Score3" form its output set, and its textual description is "score table".
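The quadruple can be modeled as a small data class. In this sketch the class name `TableNode` and the helper `from_sources` are hypothetical; following the node a example, fields taken from another node form the input set and fields that already existed in the table form the output set.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class TableNode:
    """A database table modeled as the quadruple described above."""
    sources: Dict[str, Optional[str]]  # field -> source node (None = native)
    outputs: Set[str] = field(default_factory=set)
    inputs: Set[str] = field(default_factory=set)
    description: str = ""

    @classmethod
    def from_sources(cls, sources: Dict[str, Optional[str]],
                     description: str) -> "TableNode":
        # Sourced fields are inputs; native fields are outputs (as for node a).
        inputs = {f for f, src in sources.items() if src is not None}
        outputs = set(sources) - inputs
        return cls(dict(sources), outputs, inputs, description)


node_a = TableNode.from_sources(
    {"Name": "m", "Score1": "n", "Score2": None, "Score3": None},
    "score table",
)
print(sorted(node_a.inputs))   # ['Name', 'Score1']
print(sorted(node_a.outputs))  # ['Score2', 'Score3']
```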

In this embodiment, the degree of association between two nodes can be expressed as a weight. As shown in Table 1, the degree of association between node c and node p is 2, so the weight between them can be set to 2; the degree of association between node c and each of nodes f, m and v is 1, so each of those weights can be set to 1. Likewise, in the entity-relationship diagram of the database nodes shown in FIG. 5, the weight between nodes v1 and v2 can be set to 1, the weight between nodes v2 and v3 to 2, and the weight between nodes v3 and v4 to 1.

The weight of an edge can also be determined from the correlation between its two nodes. For example, in FIG. 5, two metadata items of node v3 come from node v2; that is, two of the three metadata items of v2 are output to v3, so the correlation of this edge can be taken as 2/3.
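The 2/3 figure is the share of the source node's metadata that flows along the edge. A minimal sketch, assuming each node's metadata is known as a set of field names (the names `x`, `y`, `z` are hypothetical stand-ins for the three fields of v2):

```python
from fractions import Fraction
from typing import Iterable


def edge_correlation(source_fields: Iterable[str],
                     passed_fields: Iterable[str]) -> Fraction:
    """Share of the source node's metadata that is output to the target node."""
    src = set(source_fields)
    if not src:
        return Fraction(0)
    return Fraction(len(src & set(passed_fields)), len(src))


# v2 has three fields, two of which are output to v3.
corr = edge_correlation({"x", "y", "z"}, {"x", "y"})
print(corr)  # 2/3
```

`Fraction` keeps the value exact, which avoids rounding surprises when it is later compared against a merging threshold.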

A relationship network among the nodes of the database can then be built from the edges between nodes and their weights. As shown in FIG. 5, the database contains a relationship network in which v1 and v2 are associated by an edge of weight 1, v2 and v3 by an edge of weight 2, and v3 and v4 by an edge of weight 1.

In this embodiment, the associated nodes in the database are connected by weighted edges to form a relationship network that captures the metadata input-output relationships between nodes. This avoids the prior-art problem that metadata cannot be effectively grouped when attribute descriptions of the metadata are lacking.

Further, once the relationship network between the database nodes has been built, it needs to be optimized. This embodiment provides several methods for optimizing the relationship network.

Method one

As shown in FIG. 6, panel (a) is the relationship network before optimization, in which there are two edges between node v2 and node v3. To reduce the number of edges, a weight is added to each edge to indicate how many edges connect the two associated nodes. Since there are two edges between v2 and v3, the weight of the edge between them is 2, and the simplified network is shown in panel (b) of FIG. 6: a single edge of weight 2 remains between v2 and v3. In the simplified network, every pair of associated nodes is therefore connected by exactly one edge.
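Collapsing parallel edges into one weighted edge amounts to counting edges per unordered node pair. A minimal sketch using the FIG. 6(a) edges; representing the network as a list of node-name pairs is an assumption of this example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def collapse_parallel_edges(
    edges: Iterable[Tuple[str, str]]
) -> Dict[Tuple[str, str], int]:
    """Merge parallel edges into one weighted edge per associated node pair."""
    weights: Counter = Counter()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        weights[tuple(sorted((u, v)))] += 1
    return dict(weights)


# FIG. 6(a): two parallel edges between v2 and v3.
edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v2"), ("v3", "v4")]
print(collapse_parallel_edges(edges))
# {('v1', 'v2'): 1, ('v2', 'v3'): 2, ('v3', 'v4'): 1}
```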

Method two

Optimizing the relationship network also requires deleting invalid nodes. In this embodiment, an invalid node is: a node whose correlation with the other nodes of the relationship network is less than the merging threshold and that serves only as a source of input metadata for other nodes; or a node whose correlation with the other nodes is less than the merging threshold and that serves only as a receiver of metadata output by other nodes; or a node whose correlation with the other nodes of the relationship network is zero.

The merging threshold expresses whether two associated nodes can be merged into one node: if the correlation between the two nodes is not less than the merging threshold, they can be merged into one node; if it is less than the merging threshold, they cannot.

The correlation between two nodes is a value derived from the metadata input-output relationship between them. For example, as shown in FIG. 7, the database contains nodes vv2 and vv3 connected by one edge of weight 3, meaning that all metadata in node vv3 comes from node vv2, or all metadata in node vv2 comes from node vv3; the correlation of the edge between vv2 and vv3 is then 2/3.

Optionally, because the correlation between two nodes is computed mathematically, its value is a positive number not greater than 1. The merging threshold can be set to 0.8: if the modulus (absolute value) of the computed correlation between two nodes is greater than or equal to 0.8, the two nodes are considered mergeable into one node.

In this embodiment there are three kinds of invalid nodes. They degrade the result when the relationship network is partitioned, so they need to be deleted. The first kind is a node whose degree of association with every other node in the database is less than the merging threshold (so it cannot be merged with any other node) and all of whose metadata is output to other nodes.

The second kind is a node whose degree of association with every other node in the database is less than the merging threshold (so it cannot be merged with any other node) and all of whose metadata comes from the output of other nodes.

The third kind is a node that, after the relationship network has been built, still has no association with any other node: there is no metadata input-output relationship between it and other nodes, so its degree of association with the rest of the network is zero. Such a node cannot be grouped according to metadata input-output relationships, so nodes of this type are deleted.
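The three deletion rules can be expressed as a filter over a directed edge list. This sketch assumes each edge is given as (source node, target node, correlation) with correlations already computed; the function name, node names and correlation values in the usage example are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def find_invalid_nodes(
    nodes: Iterable[str],
    edges: Iterable[Tuple[str, str, float]],
    threshold: float,
) -> Set[str]:
    """Return nodes matching one of the three invalid-node cases.

    Each edge is (src, dst, correlation): src outputs metadata to dst.
    """
    nodes = list(nodes)
    has_in: Set[str] = set()
    has_out: Set[str] = set()
    best: Dict[str, float] = {n: 0.0 for n in nodes}
    for src, dst, corr in edges:
        has_out.add(src)
        has_in.add(dst)
        best[src] = max(best[src], corr)
        best[dst] = max(best[dst], corr)

    invalid: Set[str] = set()
    for n in nodes:
        if best[n] == 0:
            invalid.add(n)  # third kind: zero correlation with every node
        elif best[n] < threshold and (n not in has_in or n not in has_out):
            invalid.add(n)  # first kind (pure source) or second kind (pure sink)
    return invalid


nodes = ["v1", "v2", "v3", "v4", "v5"]
edges = [("v1", "v2", 0.3), ("v2", "v3", 0.9), ("v3", "v4", 0.2)]
print(sorted(find_invalid_nodes(nodes, edges, threshold=0.8)))  # ['v1', 'v4', 'v5']
```

A node that both receives and sends metadata is kept even when its correlations are weak, since the text only deletes pure sources, pure sinks and isolated nodes.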

Method three

In this embodiment, if the relationship network contains a group of no fewer than n nodes joined by no fewer than n-1 edges, and the correlation between the two nodes of each edge is not less than the merging threshold, all nodes in the group are merged into one node.

For example, as shown in FIG. 7, with a merging threshold of 0.5, the correlation of the edge between nodes vv2 and vv3 is 2/3, and 2/3 ≥ 0.5, so nodes vv2 and vv3 can be merged into one node.
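Merging can be sketched with a union-find structure: joining every edge whose correlation reaches the threshold collapses each connected group of n nodes (held together by at least n-1 such edges) into one merged node. This is one possible implementation rather than the patent's own; node vv1 and its 0.2 correlation are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def merge_groups(
    nodes: Iterable[str],
    edges: Iterable[Tuple[str, str, float]],
    threshold: float,
) -> List[Set[str]]:
    """Group nodes connected by edges whose correlation >= threshold."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v, corr in edges:
        if corr >= threshold:
            parent[find(u)] = find(v)

    groups: Dict[str, Set[str]] = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


groups = merge_groups(
    ["vv1", "vv2", "vv3"],
    [("vv1", "vv2", 0.2), ("vv2", "vv3", 2 / 3)],
    threshold=0.5,
)
print(sorted(sorted(g) for g in groups))  # [['vv1'], ['vv2', 'vv3']]
```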

Optionally, the correlation between two nodes can also be calculated with the following formula:

(Formula image BDA0000972412130000121 not reproduced in this text: correlation between the vectors $m_{out}$ and $m_{in}$.)

Here, $m_{out}$ is the metadata that a node outputs to another node, and $m_{in}$ is the metadata that the other node inputs to this node. The metadata input-output relationship of a node can be represented as a vector. For example, for the two nodes shown in FIG. 7, node vv2 outputs three metadata items to node vv3 and node vv3 outputs three metadata items to node vv2. Each metadata item can be mapped into a hash table and the hash value reduced modulo a fixed size, which yields a vector for the metadata. For example, the field name of a metadata item is passed through a hash function and the result is taken modulo 1000, so the vector has 1000 cells; if the result falls into cell 900, that cell is set to 1 and the other cells to 0. Doing this for all the metadata items forms a vector such as (1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, …).
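The hashing step can be sketched as follows, assuming CRC-32 as the hash function (the text does not name one) and 1000 cells as in the example. The function name `hashed_vector` is hypothetical.

```python
import zlib
from typing import Iterable, List


def hashed_vector(field_names: Iterable[str], size: int = 1000) -> List[int]:
    """Binary vector in which cell (hash(name) mod size) is 1 for every field."""
    vec = [0] * size
    for name in field_names:
        # crc32 is deterministic, unlike Python's per-process randomized hash() of str.
        idx = zlib.crc32(name.encode("utf-8")) % size
        vec[idx] = 1
    return vec


v = hashed_vector(["Name", "Score1", "Score2"])
print(len(v), sum(v))  # 1000 cells; at most three are set to 1
```

Distinct field names can collide in the same cell, so the number of ones may be smaller than the number of fields; a larger modulus lowers that risk.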

Alternatively, a vector can be set up to represent the input or output of the metadata in a node: 0 indicates that the corresponding metadata item neither comes from another node nor is output to another node, and 1 indicates that it is an input from, or an output to, another node.

In this embodiment, the vector of the input metadata of node vv3 is (1, 0, 1, 1) and the vector of the metadata output from node vv3 to node vv2 is (1, 0, 1, 1); by the formula above, the correlation between vv3 and vv2 is 1, so nodes vv2 and vv3 can be merged.
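The correlation formula itself survives only as an image, but the text says the result is at most 1, involves taking a modulus (norm), and gives 1 for two identical vectors. Those properties are consistent with cosine similarity, so this sketch assumes that formula; treat it as a plausible reading, not the patent's confirmed definition.

```python
import math
from typing import Sequence


def correlation(m_out: Sequence[float], m_in: Sequence[float]) -> float:
    """Cosine similarity between an output vector and an input vector (assumed formula)."""
    if len(m_out) != len(m_in):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(m_out, m_in))
    norm_product = math.sqrt(sum(a * a for a in m_out) * sum(b * b for b in m_in))
    return dot / norm_product if norm_product else 0.0


print(correlation((1, 0, 1, 1), (1, 0, 1, 1)))  # 1.0
print(correlation((1, 0, 0, 0), (0, 1, 0, 0)))  # 0.0
```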

Before the node correlations are calculated, it is also necessary to determine whether any node needs to be split. For example, node x1 outputs the four metadata items "a", "b", "c" and "d" to node x3, node x2 outputs the two metadata items "e" and "f" to node x3, and node x3 contains the six items "a", "b", "c", "d", "e" and "f". Node x3 can then be split into nodes x31 and x32. After the split, every field of x31 comes from the output of x1, so x1 and x31 are merged; every field of x32 comes from the output of x2, so x2 and x32 are merged.
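Splitting groups a node's fields by the node they came from, so that each part has a single source and becomes a merge candidate with it. A minimal sketch; the function name and the naming scheme for the parts (appending 1, 2, … ordered by source name) are assumptions chosen to reproduce x31 and x32.

```python
from typing import Dict, Optional, Set, Tuple


def split_by_source(
    node: str, field_sources: Dict[str, Optional[str]]
) -> Dict[str, Tuple[Optional[str], Set[str]]]:
    """Split a node into parts whose fields all share one source node."""
    parts: Dict[Optional[str], Set[str]] = {}
    for fld, src in field_sources.items():
        parts.setdefault(src, set()).add(fld)
    # Deterministic naming ordered by source; native fields (None) go last.
    ordered = sorted(parts, key=lambda s: (s is None, s or ""))
    return {f"{node}{i}": (src, parts[src]) for i, src in enumerate(ordered, start=1)}


x3 = {"a": "x1", "b": "x1", "c": "x1", "d": "x1", "e": "x2", "f": "x2"}
for name, (src, fields) in split_by_source("x3", x3).items():
    # Every field of this part comes from `src`, so the part can merge with it.
    print(name, src, sorted(fields))
# x31 x1 ['a', 'b', 'c', 'd']
# x32 x2 ['e', 'f']
```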

After the relationship network has been optimized, it is grouped. In this embodiment, the network is grouped using a particle swarm optimization algorithm combined with the classical BGLL community detection algorithm (the Louvain method), which avoids the high time cost of the repeated node-to-node and group-to-group fusion in conventional agglomerative grouping.

In this embodiment, the relationship network is grouped as follows:

As shown in FIG. 8, the database contains a relationship network consisting of nodes A, B, C and D. From the edges between the nodes in FIG. 8, the initial positions of the particles in the particle swarm algorithm can be determined as X1 = (B, A, B, C), X2 = (B, C, B, C), X3 = (B, A, D, C) and X4 = (B, C, D, C). Because this example network has only four nodes, the initial particle positions encode the connections among these four nodes; for a network with more nodes, the number of initial positions increases accordingly.
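The four positions are consistent with a locus-based encoding in which the i-th entry names one neighbour of the i-th node (A, B, C, D), assuming FIG. 8 is the path A-B-C-D. Under that assumption, which is our reading rather than something the text states, the initial positions are all combinations of neighbour choices:

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple


def initial_positions(
    order: Sequence[str], adjacency: Dict[str, Set[str]]
) -> List[Tuple[str, ...]]:
    """Every way of letting each node point at one of its neighbours."""
    choices = [sorted(adjacency[n]) for n in order]
    return [tuple(p) for p in product(*choices)]


# Edges A-B, B-C, C-D, inferred from X1 to X4 (FIG. 8 itself is not reproduced).
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
for pos in initial_positions("ABCD", adjacency):
    print(pos)
```

For larger networks the number of combinations grows as the product of the node degrees, so in practice a swarm would sample a subset of these positions rather than enumerate them all.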

After the initial positions are determined, the initial velocities of the particles must also be determined. The initial velocity depends on the weight of each edge; based on the different weight values in FIG. 8, the initial velocities are v1 = (B, C, B, D), v2 = (B, A, B, D), v3 = (B, C, D, D) and v4 = (B, A, D, D).

Once the initial velocities are determined, suppose that for particle X1, node A initially selects B, B selects A, C selects D and D selects C. Particle X1 then divides the graph into two groups, as shown in FIG. 9, and the edge weights remain unchanged.
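Turning a particle's choices into groups means taking the connected components of the links "node to its chosen neighbour". A minimal sketch, assuming the locus-based reading of positions; with A→B, B→A, C→D and D→C it yields the two groups of FIG. 9.

```python
from typing import Dict, List, Sequence


def decode_groups(order: Sequence[str], position: Sequence[str]) -> List[List[str]]:
    """Groups = connected components of the links node -> chosen neighbour."""
    parent: Dict[str, str] = {n: n for n in order}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for node, neighbour in zip(order, position):
        parent[find(node)] = find(neighbour)

    groups: Dict[str, List[str]] = {}
    for n in order:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


print(decode_groups("ABCD", ("B", "A", "D", "C")))  # [['A', 'B'], ['C', 'D']]
print(decode_groups("ABCD", ("B", "A", "B", "C")))  # [['A', 'B', 'C', 'D']]
```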

Then, following the classical BGLL algorithm, the individual fitness of each particle and the global fitness of the swarm are calculated. In this embodiment, the individual fitness is calculated as:

(Formula image BDA0000972412130000141 not reproduced in this text.)

where $\Delta Q_v$ is calculated with the classical BGLL algorithm, i.e. as the modularity gain:

$$\Delta Q_v = \left[\frac{\Sigma_{in} + W_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + W_i}{2m}\right)^2\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 - \left(\frac{W_i}{2m}\right)^2\right]$$

Here, $c$ is the acceleration constant and $v$ is the population size. $\Sigma_{in}$ is the weight of the internal edges of a group in the relationship network; for example, as shown in FIG. 9, the internal edge weight of group 1 is 1 and that of group 2 is 2. $\Sigma_{tot}$ is the weight of all groups in the relationship network, i.e. the sum of the weights of group 1 and group 2, which is 3. $W_i$ is the sum of the weights of the edges incident to node $i$; for example, the weight of the edges incident to node A is 1. $W_{i,in}$ is the sum of the weights of all edges connecting node $i$ to the group, and $m$ is the sum of the weights of all edges in the relationship network. In FIG. 9, since there are only two nodes, the total weight of the edges connecting node A to the group is also 1.

After $\Delta Q_v$ has been calculated, the individual fitness values are computed, and the global fitness is $fit_g = \max(p_{best})$.

After each iteration, the velocities of the particles in the swarm are updated as follows:

(Formula image BDA0000972412130000146 not reproduced in this text.)

where the quantity given by formula image BDA0000972412130000145 (not reproduced) is the maximum individual fitness of the particle in its new group divided by the sum of the individual fitness values of all particles in the population. In other words, for a particle at position m, the best fitness values of all possible choices are normalized, and the velocity at position m is then selected at random according to these values. After the particle velocities have been updated, the particle positions are adjusted.
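"Normalize, then choose at random according to these values" reads like fitness-proportional (roulette-wheel) selection. Since the velocity formula itself is not reproduced, this sketch assumes that reading; the candidate nodes and fitness values in the usage line are hypothetical.

```python
import random
from typing import Sequence, TypeVar

T = TypeVar("T")


def roulette_choice(candidates: Sequence[T], fitness: Sequence[float],
                    rng: random.Random) -> T:
    """Pick a candidate with probability proportional to its non-negative fitness."""
    total = sum(fitness)
    if total <= 0:
        return rng.choice(candidates)  # no information: choose uniformly
    r = rng.uniform(0, total)
    acc = 0.0
    chosen = candidates[-1]
    for cand, f in zip(candidates, fitness):
        if f <= 0:
            continue  # zero-fitness candidates are never selected
        acc += f
        chosen = cand
        if r <= acc:
            return cand
    return chosen  # last positive-fitness candidate (floating-point guard)


rng = random.Random(42)
# Node B may point at A or C; the normalized fitness favours C.
print(roulette_choice(["A", "C"], [0.2, 0.8], rng))
```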

The above steps are repeated until $fit_n - fit_{n-1} < \varepsilon$, where $\varepsilon$ is the convergence threshold. Because $\varepsilon$ is very small, this condition indicates that the particle positions in the population no longer change significantly, and the grouping is considered complete.
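The outer loop and its stopping rule can be sketched as below. `step` is a hypothetical callback standing in for one iteration of velocity and position updates; it returns the personal-best fitness of every particle, from which the global fitness is taken as the maximum. The `max_iter` cap is an added safeguard not mentioned in the text.

```python
from typing import Callable, Optional, Sequence, Tuple


def iterate_until_converged(
    step: Callable[[], Sequence[float]], eps: float, max_iter: int = 1000
) -> Tuple[Optional[float], int]:
    """Run swarm iterations until fit_n - fit_{n-1} < eps; return (fitness, iterations)."""
    prev: Optional[float] = None
    for n in range(1, max_iter + 1):
        fit = max(step())  # global fitness fit_g = max(p_best)
        if prev is not None and fit - prev < eps:
            return fit, n
        prev = fit
    return prev, max_iter


history = iter([[0.1, 0.2], [0.4, 0.3], [0.45, 0.41], [0.452, 0.44]])
print(iterate_until_converged(lambda: next(history), eps=0.01))  # (0.452, 4)
```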

Further, to make the grouped metadata easier to manage, each group produced by the grouping is labeled; the group name of each labeled group serves as the label of its metadata.

In this embodiment, the description information that occurs most frequently in each sub-relationship network obtained by grouping is used as the identification information of that sub-network. For example, as shown in FIG. 10, the seven nodes of the relationship network are divided into two sub-networks. In the first sub-network, the description of node A is "score" (成绩), the description of node B is "score", that of node E is "math score" (数学成绩), and that of node F is "average score" (平均成绩), so "score" can be used as the identification information of the first sub-network.
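Choosing the most frequent description is a plain frequency count. A minimal sketch using the descriptions of nodes A, B, E and F from the first sub-network of FIG. 10; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def most_frequent_label(descriptions: Iterable[str]) -> str:
    """Return the description that occurs most often (first seen wins ties)."""
    label, _count = Counter(descriptions).most_common(1)[0]
    return label


# Nodes A, B, E and F of the first sub-network in FIG. 10.
print(most_frequent_label(["成绩", "成绩", "数学成绩", "平均成绩"]))  # 成绩
```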

Optionally, the sub-relationship networks can be labeled with the TF-IDF method. TF-IDF is suitable because its main idea is that a word or phrase that has a high term frequency in one document and rarely appears in other documents discriminates well between categories and is therefore useful for classification. TF stands for term frequency, the frequency with which a given word appears in a document. IDF stands for inverse document frequency, a measure of the general importance of a word: the IDF of a word is obtained by dividing the total number of documents by the number of documents that contain the word and taking the logarithm of the quotient. In this embodiment, TF can be expressed as:

$$\mathrm{tf}_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$

Here, the numerator is the number of occurrences of the word in the document, and the denominator is the sum of the occurrences of all words in the document.

In this embodiment, IDF can be expressed as:

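$$\mathrm{idf}_i = \log\frac{|D|}{|\{j : t_i \in d_j\}|}$$

where $|D|$ is the total number of documents and the denominator is the number of documents $d_j$ that contain the term $t_i$.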

Here, TF is the frequency with which a term appears within a given sub-relationship network. For example, as shown in FIG. 10, the TF of "score" (the description of node A) in the first sub-network is 2/4 = 0.5. IDF is based on the fraction of all sub-relationship networks in which the term appears, taken as a logarithm. In this example, grouping the nodes of the database yields 5000 sub-relationship networks and the description "score" appears in 50 of them, so the IDF of "score" is log(5000/50) = log(100) = 2 (base-10 logarithm), and its TF-IDF value is 0.5 × 2 = 1.

The node descriptions are then sorted by their TF-IDF values in descending order, and the top-ranked description is used as the identification information of the sub-relationship network.
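The worked example can be reproduced with a small TF-IDF ranking, treating each sub-network as a "document" whose terms are its node descriptions and using a base-10 logarithm to match log(100) = 2. The 49 other sub-networks containing "成绩" and the 4950 filler sub-networks labeled "其他" are hypothetical, built only to reach the 5000/50 counts of the example.

```python
import math
from typing import Sequence


def tf(term: str, subnet: Sequence[str]) -> float:
    """Occurrences of the term divided by all terms in the sub-network."""
    return subnet.count(term) / len(subnet)


def idf(term: str, subnets: Sequence[Sequence[str]]) -> float:
    """log10(number of sub-networks / number of sub-networks containing the term)."""
    containing = sum(1 for s in subnets if term in s)
    return math.log10(len(subnets) / containing) if containing else 0.0


def label_subnet(subnet: Sequence[str], subnets: Sequence[Sequence[str]]) -> str:
    scores = {t: tf(t, subnet) * idf(t, subnets) for t in set(subnet)}
    # Highest TF-IDF first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[0]


first = ["成绩", "成绩", "数学成绩", "平均成绩"]
subnets = [first] + [["成绩"]] * 49 + [["其他"]] * 4950
print(tf("成绩", first) * idf("成绩", subnets))  # 1.0
print(label_subnet(first, subnets))  # 成绩
```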

Optionally, the descriptions of the nodes in a sub-relationship network can first be segmented into bigrams and trigrams; the TF-IDF value of each bigram and trigram is then calculated, the segments are sorted by TF-IDF value in descending order, and the top-ranked bigram or trigram is used as the identification information of the sub-relationship network.

For example, as shown in FIG. 10, the description of node F is "平均成绩" (average score). Bigram segmentation yields "平均" (average) and "成绩" (score), and trigram segmentation yields "平均成" and "绩". The TF-IDF values of "平均", "成绩", "平均成" and "绩" are calculated and sorted in descending order, and whichever of them ranks first is used as the identification information of the sub-relationship network.
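The example splits the description into consecutive, non-overlapping chunks of two or three characters (a sliding window would instead yield "平均", "均成", "成绩"). A sketch of that chunking; the function name is hypothetical.

```python
from typing import List


def segment(text: str, n: int) -> List[str]:
    """Split text into consecutive, non-overlapping chunks of n characters."""
    return [text[i:i + n] for i in range(0, len(text), n)]


print(segment("平均成绩", 2))  # ['平均', '成绩']
print(segment("平均成绩", 3))  # ['平均成', '绩']
```

The resulting chunks can then be scored with the same TF-IDF ranking as whole descriptions.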

The present invention further provides a metadata grouping apparatus which, as shown in FIG. 11, comprises:

an acquisition unit S1001, configured to acquire the source information of the metadata in each node of a database;

a determination unit S1002, configured to determine the association relationships and degrees of association between the nodes according to the source information of the metadata in the nodes;

a relationship network construction unit S1003, configured to construct a relationship network according to the nodes and the association relationships between them, wherein the edges of the relationship network are the association relationships between the nodes and the weight of each edge is the degree of association between the associated nodes; and

a grouping unit S1004, configured to partition the relationship network using a particle swarm algorithm to obtain a plurality of sub-relationship networks, each sub-relationship network being one group.

Further, the determination unit S1002 is specifically configured to:

if one metadata item in a first node originates from a second node, determine that one association relationship exists between the first node and the second node; and

determine the number of association relationships existing between the first node and the second node as the degree of association between them, wherein the first node and the second node are any two different nodes in the database.

Further, the relationship network construction unit S1003 is specifically configured to:

if an association relationship exists between the first node and the second node, determine that an edge exists between them in the relationship network, the weight of the edge being the degree of association between the first node and the second node.

Further, the relationship network construction unit S1003 is further configured to:

if an invalid node exists in the relationship network, delete the invalid node, wherein the invalid node is a node whose correlation with the other nodes of the relationship network is less than the merging threshold and that serves only as a source of input metadata for other nodes; or a node whose correlation with the other nodes of the relationship network is less than the merging threshold and that serves only as a receiver of metadata output by other nodes; or a node whose correlation with the other nodes of the relationship network is zero.

Further, the relationship network construction unit S1003 is further configured to:

if a group of nodes in the relationship network satisfies the following relationship, merge the group of nodes into one node:

the group contains no fewer than two nodes, and the correlation between every two nodes having an output-input relationship is not less than the merging threshold.

Further, the grouping unit S1004 is further configured to:

use the most frequently occurring description information in each sub-relationship network as the identification information of that sub-relationship network.

The present invention is described with reference to flowcharts and/or block diagrams of methods, apparatuses (systems) and computer program products according to embodiments of the invention. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present invention have been described, those skilled in the art may make further changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the present invention.

Obviously, those skilled in the art can make various changes and variations to the present invention without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is also intended to include them.

Claims (10)

1. A metadata grouping method, wherein the method comprises:
obtaining source information of the metadata in each node of a database;
determining the association relationships and association degrees between the nodes according to the source information of the metadata in the nodes;
constructing a relationship network from the nodes and the association relationships between them, wherein the edges of the relationship network are the association relationships between the nodes, and the weight of each edge is the association degree between the associated nodes;
partitioning the relationship network using a particle swarm algorithm to obtain a plurality of sub-relationship networks, wherein each sub-relationship network is one group;
wherein determining the association relationships and association degrees between the nodes according to the source information of the metadata in the nodes comprises:
if a piece of metadata in a first node originates from a second node, determining that one association relationship exists between the first node and the second node;
determining the number of association relationships existing between the first node and the second node as the association degree between the first node and the second node, wherein the first node and the second node are any two different nodes in the database.

2. The method according to claim 1, wherein constructing the relationship network from the nodes and the association relationships between them comprises:
if an association relationship exists between the first node and the second node, determining that an edge exists between the first node and the second node in the relationship network, wherein the weight of the edge is the association degree between the first node and the second node.

3. The method according to claim 1, wherein before partitioning the relationship network using the particle swarm algorithm to obtain the plurality of sub-relationship networks, the method further comprises:
if an invalid node exists in the relationship network, deleting the invalid node, wherein the invalid node is a node whose correlation with the other nodes in the relationship network is less than a merging threshold and which serves only as an input source of metadata for the other nodes in the relationship network; or a node whose correlation with the other nodes in the relationship network is less than the merging threshold and which serves only as an output source of the metadata of the other nodes in the relationship network; or a node whose correlation with the other nodes in the relationship network is zero.

4. The method according to claim 1, wherein before partitioning the relationship network using the particle swarm algorithm to obtain the plurality of sub-relationship networks, the method further comprises:
if a group of nodes in the relationship network satisfies the following relationship, merging the group of nodes into one node;
the relationship being that the group contains at least two nodes, and the correlation between every two nodes having an output-input relationship is not less than the merging threshold.

5. The method according to claim 1, wherein after partitioning the relationship network using the particle swarm algorithm to obtain the plurality of sub-relationship networks, the method further comprises:
using the description information that occurs most frequently in a sub-relationship network as the identification information of that sub-relationship network.

6. A metadata grouping apparatus, comprising:
an acquisition unit, configured to acquire source information of the metadata in each node of a database;
a determining unit, configured to determine the association relationships and association degrees between the nodes according to the source information of the metadata in the nodes;
a relationship network construction unit, configured to construct a relationship network from the nodes and the association relationships between them, wherein the edges of the relationship network are the association relationships between the nodes, and the weight of each edge is the association degree between the associated nodes;
a grouping unit, configured to partition the relationship network using a particle swarm algorithm to obtain a plurality of sub-relationship networks, wherein each sub-relationship network is one group;
wherein the determining unit is specifically configured to:
if a piece of metadata in a first node originates from a second node, determine that one association relationship exists between the first node and the second node;
determine the number of association relationships existing between the first node and the second node as the association degree between the first node and the second node, wherein the first node and the second node are any two different nodes in the database.

7. The apparatus according to claim 6, wherein the relationship network construction unit is specifically configured to:
if an association relationship exists between the first node and the second node, determine that an edge exists between the first node and the second node in the relationship network, wherein the weight of the edge is the association degree between the first node and the second node.

8. The apparatus according to claim 6, wherein the relationship network construction unit is further configured to:
if an invalid node exists in the relationship network, delete the invalid node, wherein the invalid node is a node whose correlation with the other nodes in the relationship network is less than a merging threshold and which serves only as an input source of metadata for the other nodes in the relationship network; or a node whose correlation with the other nodes in the relationship network is less than the merging threshold and which serves only as an output source of the metadata of the other nodes in the relationship network; or a node whose correlation with the other nodes in the relationship network is zero.

9. The apparatus according to claim 6, wherein the relationship network construction unit is further configured to:
if a group of nodes in the relationship network satisfies the following relationship, merge the group of nodes into one node;
the relationship being that the group contains at least two nodes, and the correlation between every two nodes having an output-input relationship is not less than the merging threshold.

10. The apparatus according to claim 6, wherein the grouping unit is further configured to:
use the description information that occurs most frequently in a sub-relationship network as the identification information of that sub-relationship network.
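The network-construction and pre-processing steps of claims 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation: the input format (a mapping from each node, e.g. a table, to the list of nodes its metadata items were derived from), all function and table names, and the reading of "correlation" as the summed association degree of a node's edges are hypothetical. The claims do not say how the merging threshold is chosen, and the merge step of claim 4 is simplified here to contracting each connected set of nodes linked by edges whose association degree reaches the threshold.

```python
from collections import defaultdict


def build_flow(sources):
    """Directed counts: flow[(src, dst)] = number of metadata items in dst
    whose source is src. Each such item is one association relationship."""
    flow = defaultdict(int)
    for node, origins in sources.items():
        for src in origins:
            if src != node:  # the claims relate two *different* nodes
                flow[(src, node)] += 1
    return dict(flow)


def edge_weights(flow):
    """Undirected edge weight = association degree, i.e. the total number of
    relationships between the two nodes in either direction (assumption)."""
    weights = defaultdict(int)
    for (src, dst), count in flow.items():
        weights[tuple(sorted((src, dst)))] += count
    return dict(weights)


def remove_invalid(nodes, flow, threshold):
    """Drop isolated nodes, and weakly connected nodes that are only a source
    or only a sink. 'Correlation' is read as summed edge weight (assumption)."""
    strength = defaultdict(int)
    has_in, has_out = set(), set()
    for (src, dst), count in flow.items():
        strength[src] += count
        strength[dst] += count
        has_out.add(src)
        has_in.add(dst)
    keep = set()
    for n in nodes:
        one_way = (n in has_out) != (n in has_in)  # exactly one direction
        if strength[n] == 0 or (strength[n] < threshold and one_way):
            continue
        keep.add(n)
    kept_flow = {e: c for e, c in flow.items() if e[0] in keep and e[1] in keep}
    return keep, kept_flow


def merge_strong(nodes, flow, threshold):
    """Contract each connected set of nodes joined by edges of weight >=
    threshold into one node (a simplified reading of the merge step)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), w in edge_weights(flow).items():
        if w >= threshold:
            parent[find(u)] = find(v)
    members = defaultdict(list)
    for n in nodes:
        members[find(n)].append(n)
    name = {n: "+".join(sorted(g)) for g in members.values() for n in g}
    merged = defaultdict(int)
    for (src, dst), count in flow.items():
        if name[src] != name[dst]:  # edges inside a merged node disappear
            merged[(name[src], name[dst])] += count
    return set(name.values()), dict(merged)


# Usage with hypothetical table names: each list holds the source of one metadata item.
sources = {
    "ods_user": [], "ods_order": [], "tmp_scratch": [],
    "dw_user": ["ods_user", "ods_user", "ods_user"],
    "dw_order": ["ods_order", "ods_order", "dw_user"],
    "rpt_daily": ["dw_order", "dw_user"],
}
flow = build_flow(sources)
keep, kept_flow = remove_invalid(set(sources), flow, threshold=3)
merged_nodes, merged_flow = merge_strong(keep, kept_flow, threshold=3)
```

In this toy input, ods_order and rpt_daily fall below the threshold and touch the rest of the network in one direction only, tmp_scratch is isolated, and the heavy ods_user to dw_user link collapses into a single node. The claims do not fix the order of the two pre-processing steps; running deletion before merging is a choice of this sketch.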
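The particle swarm partitioning of claim 1 and the group labelling of claim 5 could look roughly like the sketch below. The claims name only "a particle swarm algorithm", so the encoding, update rule, and fitness function here are assumptions: each particle is a vector of community labels, the discrete velocity and the move to the dominant neighbouring label follow the general style of label-based discrete PSO for community detection, and fitness is weighted Newman modularity. The function names, parameter values (w, c1, c2, swarm size, iterations), and the modelling of "description information" as a list of terms per node are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def modularity(labels, adj, two_m):
    """Weighted Newman modularity of a labelling (node index -> community)."""
    internal = defaultdict(float)  # sum of A_ij over ordered pairs inside a community
    degree = defaultdict(float)    # total edge weight incident to a community
    for i, nbrs in enumerate(adj):
        for j, w in nbrs.items():
            degree[labels[i]] += w
            if labels[i] == labels[j]:
                internal[labels[i]] += w
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def dominant_label(i, labels, adj, rng):
    """Label carrying the largest total edge weight among node i's neighbours."""
    if not adj[i]:
        return labels[i]
    score = defaultdict(float)
    for j, w in adj[i].items():
        score[labels[j]] += w
    best = max(score.values())
    return rng.choice(sorted(c for c, s in score.items() if s == best))


def pso_partition(nodes, weights, particles=20, iterations=50,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Discrete PSO: each particle is a label vector, fitness is modularity."""
    rng = random.Random(seed)
    index = {name: k for k, name in enumerate(nodes)}
    adj = [defaultdict(float) for _ in nodes]
    for (a, b), wt in weights.items():
        if a != b:
            adj[index[a]][index[b]] += wt
            adj[index[b]][index[a]] += wt
    n, two_m = len(nodes), sum(sum(nb.values()) for nb in adj)
    if two_m == 0:
        return [[name] for name in nodes], 0.0

    def random_particle():
        labels = list(range(n))
        for i in range(n):
            if adj[i]:  # start each node in a random neighbour's community
                labels[i] = labels[rng.choice(sorted(adj[i]))]
        return labels

    xs = [random_particle() for _ in range(particles)]
    vs = [[0] * n for _ in range(particles)]
    pbest, pfit = [x[:] for x in xs], [modularity(x, adj, two_m) for x in xs]
    k_best = max(range(particles), key=lambda k: pfit[k])
    gbest, gfit = pbest[k_best][:], pfit[k_best]
    for _ in range(iterations):
        for k in range(particles):
            x, v = xs[k], vs[k]
            for i in range(n):
                # Velocity grows where x disagrees with pbest/gbest; sigmoid -> 0/1.
                z = (w * v[i] + c1 * rng.random() * (x[i] != pbest[k][i])
                     + c2 * rng.random() * (x[i] != gbest[i]))
                v[i] = 1 if rng.random() < 1 / (1 + math.exp(-z)) else 0
            # Nodes with velocity 1 move to their dominant neighbouring label.
            x = [dominant_label(i, x, adj, rng) if v[i] else x[i] for i in range(n)]
            xs[k], fit = x, modularity(x, adj, two_m)
            if fit > pfit[k]:
                pbest[k], pfit[k] = x[:], fit
                if fit > gfit:
                    gbest, gfit = x[:], fit
    groups = defaultdict(list)
    for i, label in enumerate(gbest):
        groups[label].append(nodes[i])
    return sorted(groups.values()), gfit


def group_name(group, descriptions):
    """Most frequent description term in a group, used as its identifier."""
    counts = Counter(t for node in group for t in descriptions.get(node, []))
    return counts.most_common(1)[0][0] if counts else None


# Usage with hypothetical nodes, edge weights, and description terms.
nodes = ["ods_user", "dw_user", "ods_order", "dw_order"]
groups, q = pso_partition(nodes, {("ods_user", "dw_user"): 3.0,
                                  ("ods_order", "dw_order"): 2.0})
descriptions = {"ods_user": ["user", "id"], "dw_user": ["user"],
                "ods_order": ["order"], "dw_order": ["order", "amount"]}
names = [group_name(g, descriptions) for g in groups]
```

In this toy input the two pairs are disconnected, so every particle starts at the two-pair split and the result is stable. On real relationship networks the outcome depends on the seed and parameters, and comparing raw label values between particles ignores that one community may carry different labels in different particles, a known simplification of this style of encoding.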

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610257438.7A CN107305490B (en) 2016-04-22 2016-04-22 Metadata grouping method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610257438.7A CN107305490B (en) 2016-04-22 2016-04-22 Metadata grouping method and device

Publications (2)

Publication Number Publication Date
CN107305490A (en) 2017-10-31
CN107305490B (en) 2020-09-11

Family ID=60150684

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610257438.7A Active CN107305490B (en) 2016-04-22 2016-04-22 Metadata grouping method and device

Country Status (1)

Country Link
CN (1) CN107305490B (en)

Also Published As

Publication number Publication date
CN107305490A (en) 2017-10-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant