CN110543586B - Multi-user identity fusion method, device, equipment and storage medium - Google Patents
- Publication number
- CN110543586B, CN201910831646.7A
- Authority
- CN
- China
- Prior art keywords
- identity
- nodes
- user
- connection
- connection relationship
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present application discloses a method, apparatus, device and storage medium for fusing multiple user identities, and relates to the technical field of big data. The specific implementation is as follows: user identity data having at least two identity features is acquired; a graph network is constructed according to the at least two identity features of the user identity data, the graph network including nodes that represent identity features and connecting edges that represent association relationships between identity features; and an identity group of a same user, which includes a plurality of identity features, is determined according to the connection relationships between nodes in the graph network and the connection relationships between nodes and connecting edges. By associating the identity features of the user identity data in the form of a graph network, the technical solution can accurately determine the identity group corresponding to the multiple identity features of the same user and can be applied to any scenario, which avoids the problem of a limited scope of use.
Description
Technical Field
The present application relates to the field of computer technology, and in particular to a method, apparatus, device and storage medium for fusing multiple user identities in big data technology.
Background
With the widespread adoption of the Internet, associating virtual user identities (for example, device IDs and network IDs) with real user identities (for example, identity information such as ID card numbers and mobile phone numbers, and user asset information such as vehicles and real estate) makes it possible to reconstruct a person's complete behavior from different carriers, thereby creating great commercial value for products.
In the prior art, multiple-identity fusion schemes are mainly based on preset rules: multiple different user identities that satisfy the same rule are determined to belong to the same user, and the multiple user identities of that user are fused so that they are associated with one another.
However, although this fusion method attributes identities with high accuracy, the rules it uses are set manually, so it cannot be applied to complex scenarios and its scope of use is limited.
Summary of the Invention
Embodiments of the present application provide a method, apparatus, device and storage medium for fusing multiple user identities, which are used to solve the problem that existing fusion methods cannot be applied to complex scenarios and have a limited scope of use.
In a first aspect, the present application provides a method for fusing multiple user identities, including:
acquiring user identity data, the user identity data having at least two identity features;
constructing a graph network according to the at least two identity features of the user identity data, the graph network including nodes that represent identity features and connecting edges that represent association relationships between identity features; and
determining an identity group of a same user according to the connection relationships between nodes in the graph network and the connection relationships between nodes and connecting edges, the identity group including a plurality of identity features.
In this embodiment, the identity features of the user identity data are associated in the form of a graph network, so that the identity group corresponding to the multiple identity features of the same user can be determined accurately, and the method can be applied to any scenario, which avoids the problem of a limited scope of use.
In a possible design of the first aspect, acquiring the user identity data includes:
acquiring preset configuration information, the configuration information including a data source type, a data source path, an extraction method and an extraction cycle; and
extracting the user identity data from the data source corresponding to the data source type according to the data source path, the extraction method and the extraction cycle.
In this embodiment, user data extraction is carried out according to the dependencies among the items of the preset configuration information, which ensures that the data extraction task is executed in a stable and orderly manner.
Optionally, the configuration information further includes a field mapping relationship;
and the method further includes:
parsing the acquired user identity data in sequence according to the field mapping relationship, and extracting the at least two identity features of the user identity data.
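By way of example only, the configuration-driven extraction and field mapping described above might be sketched as follows. The configuration keys, the CSV layout, the file name and the helper `parse_records` are all hypothetical names chosen for illustration; the source names only the kinds of items the configuration contains, not a concrete format.

```python
import csv
import io

# Hypothetical configuration. The keys mirror the items named in the text:
# data source type, data source path, extraction method, extraction cycle,
# and field mapping relationship. None of the values come from the source.
CONFIG = {
    "source_type": "csv",
    "source_path": "login_log.csv",
    "extract_method": "incremental",
    "extract_cycle_hours": 24,
    "field_mapping": {"imei": "IMEI", "phone": "PHONE", "ip": "IP"},
}


def parse_records(text, field_mapping):
    """Map each raw row to a list of (feature_type, value) identity features."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        features = [
            (feature_type, row[column])
            for column, feature_type in field_mapping.items()
            if row.get(column)  # skip missing or empty fields
        ]
        if len(features) >= 2:  # a record needs at least two identity features
            records.append(features)
    return records


# Inline sample data stands in for the file at CONFIG["source_path"].
SAMPLE = "imei,phone,ip\n8601,13800000000,10.0.0.1\n8602,,\n"
parsed = parse_records(SAMPLE, CONFIG["field_mapping"])
print(parsed)
```

The second sample row carries only one identity feature, so this sketch drops it; whether such rows should be discarded or retained is a design choice the source does not address.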
In another possible design of the first aspect, constructing the graph network according to the at least two identity features of the user identity data includes:
constructing the graph network by taking each identity feature in the user identity data as a node of the graph network and taking the association relationship between every two identity features in the user identity data as a connecting edge of the graph network, each node and each connecting edge in the graph network having its own attribute information.
As can be seen from the above solution, to address the problem that different data may come from different systems, the preset configuration information is used to extract the user identity data and to identify and extract the identity features in the user identity data, and the graph network is then constructed from the extracted identity features, with a high degree of automation and low cost.
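A minimal sketch of this construction step is given below, assuming that each record has already been parsed into a list of (feature type, value) pairs. The function `build_graph` and the use of a co-occurrence count as the edge attribute are illustrative assumptions, not details confirmed by the source.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(records):
    """Return (nodes, edges).

    nodes is the set of identity features; edges maps an unordered pair of
    features to the number of records in which both features appear.
    """
    nodes = set()
    edges = defaultdict(int)
    for features in records:
        nodes.update(features)
        # Every two features in the same record are joined by one edge.
        for a, b in combinations(sorted(set(features)), 2):
            edges[(a, b)] += 1
    return nodes, dict(edges)


records = [
    [("IMEI", "8601"), ("PHONE", "138")],
    [("IMEI", "8601"), ("PHONE", "138"), ("IP", "10.0.0.1")],
]
nodes, edges = build_graph(records)
print(len(nodes), edges)
```

Sorting each pair before using it as a key keeps the edge undirected, so (A, B) and (B, A) accumulate into the same count.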
In yet another possible design of the first aspect, determining the identity group of the same user according to the connection relationships between nodes in the graph network and the connection relationships between nodes and connecting edges includes:
determining the number of connections between adjacent nodes in the graph network according to the connection relationships between nodes in the graph network and the connection relationships between nodes and connecting edges;
determining a first connection relationship and a second connection relationship based on the number of connections between adjacent nodes in the graph network and a preset count threshold, the first connection relationship being a connection relationship in which the number of connections between adjacent nodes is greater than the count threshold, and the second connection relationship being a connection relationship in which the number of connections between adjacent nodes is less than or equal to the count threshold; and
traversing the nodes of the graph network outward in sequence, starting from a target node, according to the first connection relationship and the second connection relationship, to determine the identity group of the user corresponding to the target node.
In this embodiment, the identity group of the user corresponding to the target node is determined from the determined connection relationships, so the result is highly accurate.
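The threshold-based traversal can be illustrated with the following sketch. It assumes that only edges of the first connection relationship (count greater than the threshold) are followed and that edges of the second connection relationship are simply not traversed; the passage above does not spell out how the second relationship is used, so this is an assumption. The function name `identity_group` and the string node labels are hypothetical.

```python
from collections import deque


def identity_group(edge_counts, target, threshold):
    """Breadth-first traversal from target over edges whose count > threshold."""
    strong = {}
    for (a, b), count in edge_counts.items():
        if count > threshold:  # first connection relationship
            strong.setdefault(a, set()).add(b)
            strong.setdefault(b, set()).add(a)
        # count <= threshold: second connection relationship, not traversed here

    group, queue = {target}, deque([target])
    while queue:
        node = queue.popleft()
        for neighbor in strong.get(node, ()):
            if neighbor not in group:
                group.add(neighbor)
                queue.append(neighbor)
    return group


edge_counts = {
    ("imei:1", "phone:1"): 5,
    ("phone:1", "ip:1"): 1,
    ("imei:1", "idfa:1"): 3,
}
group = identity_group(edge_counts, "imei:1", threshold=2)
print(sorted(group))
```

In this example the shared IP address co-occurs with the phone number only once, so it stays outside the group, which matches the intuition that weakly connected features (for example, a public network address) should not merge users.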
In still another possible design of the first aspect, determining the identity group of the same user according to the connection relationships between nodes in the graph network and the connection relationships between nodes and connecting edges includes:
determining the association relationships between nodes according to the connection relationships between nodes in the graph network, the connection relationships between nodes and connecting edges, and the attribute information of each node; and
aggregating the nodes in the graph network based on the association relationships between the nodes to determine the identity group of the same user.
In this embodiment, the identity group of the same user is obtained by aggregation based on the association relationships between nodes, which ensures a high fusion rate.
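One common way to aggregate nodes once the associated pairs are known is a union-find (disjoint-set) structure. The source does not name a specific aggregation algorithm, so the sketch below is an assumed stand-in; the function `aggregate` and the single-letter node labels are hypothetical.

```python
def aggregate(nodes, associated_pairs):
    """Group nodes into identity groups using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in associated_pairs:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    # Sort for a deterministic result order.
    return sorted(groups.values(), key=lambda g: sorted(g))


result_groups = aggregate(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
print(result_groups)
```

Each resulting set corresponds to one candidate identity group; nodes with no association, such as "d" here, remain in groups of their own.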
In still another possible design of the first aspect, the method further includes:
determining, according to the identity group of the same user, a plurality of user identity features in the identity group that are associated with a target identity feature, the target identity feature being any one of the user identity features included in the identity group; and
pushing a message to at least one of the plurality of user identity features.
In this embodiment, the plurality of user identity features in the identity group that are associated with the target identity feature are determined, and messages are then pushed to the user in a targeted manner, which increases the commercial value of the product.
Optionally, determining, according to the identity group of the same user, the plurality of user identity features in the identity group that are associated with the target identity feature includes:
retrieving, traversing and screening the nodes in the identity group of the same user to determine the plurality of user identity features in the identity group that are associated with the target identity feature.
In this embodiment, vertices that are connected to the vertex corresponding to the target identity feature can be derived quickly through the above processes of node identity retrieval, breadth-first traversal of node identities, and screening of node identity features.
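The three stages named above (retrieval, breadth-first traversal, screening) might be sketched as follows. The adjacency-dictionary representation, the function `related_features` and the idea of screening by feature type are assumptions made for illustration.

```python
from collections import deque


def related_features(adjacency, target, wanted_types):
    """Return features connected to target whose type is in wanted_types."""
    # Retrieval: look up the target vertex.
    if target not in adjacency:
        return []

    # Breadth-first traversal over the identity group.
    seen, queue = {target}, deque([target])
    while queue:
        for nxt in adjacency.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(target)

    # Screening: keep only the requested feature types.
    return sorted(f for f in seen if f[0] in wanted_types)


adjacency = {
    ("IMEI", "8601"): [("PHONE", "138"), ("IP", "10.0.0.1")],
    ("PHONE", "138"): [("IMEI", "8601"), ("IDFA", "A1")],
    ("IP", "10.0.0.1"): [("IMEI", "8601")],
    ("IDFA", "A1"): [("PHONE", "138")],
}
targets = related_features(adjacency, ("IMEI", "8601"), {"PHONE", "IDFA"})
print(targets)
```

A message-push step could then iterate over the returned features and pick at least one reachable channel, for example a phone number.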
In still another possible design of the first aspect, the method further includes:
storing the correspondence between nodes and connecting edges in the graph network in the form of a graph database;
the graph database includes vertex storage, connecting-edge storage and attribute storage;
the vertex storage includes a node primary key, the attribute information owned by the node, and the connecting edges connected to the node;
the connecting-edge storage includes a connecting-edge primary key, the start point and end point connected by the connecting edge, and the attribute information carried by the connecting edge;
the attribute storage includes an attribute primary key, the meaning represented by the attribute, and the specific content represented by the attribute.
In this embodiment, the association relationships between the identity features of the user identity data and the records in which identity features co-occur are modeled as vertices and connecting edges of a graph database, so that the data fusion problem for user identity data can be solved with graph-theory algorithms.
Optionally, the node primary key, the connecting-edge primary key and the attribute primary key are all stored as indexes.
In this embodiment, building indexes on the node primary key, the connecting-edge primary key and the attribute primary key makes retrieval and data management more convenient.
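The three stores can be pictured as key-value maps keyed by primary key. In the sketch below, plain Python dictionaries stand in for the indexed stores; the key names ("v1", "e1", "p1"), the field names and the helper `neighbors` are hypothetical and are not taken from the source.

```python
# Attribute store: attribute primary key -> (meaning, content).
attribute_store = {
    "p1": ("type", "IMEI"),
    "p2": ("type", "PHONE"),
    "p3": ("co_occurrences", 2),
}

# Vertex store: node primary key -> owned attributes and connected edges.
vertex_store = {
    "v1": {"attributes": ["p1"], "edges": ["e1"]},
    "v2": {"attributes": ["p2"], "edges": ["e1"]},
}

# Edge store: edge primary key -> start point, end point, carried attributes.
edge_store = {
    "e1": {"start": "v1", "end": "v2", "attributes": ["p3"]},
}


def neighbors(vertex_key):
    """Return the vertices reachable from vertex_key over one edge."""
    result = []
    for edge_key in vertex_store[vertex_key]["edges"]:
        edge = edge_store[edge_key]
        other = edge["end"] if edge["start"] == vertex_key else edge["start"]
        result.append(other)
    return result


print(neighbors("v1"))
```

Because every lookup goes through a primary key, each hop of a traversal costs a constant number of key lookups in this model, which is the convenience that indexing the primary keys is meant to provide.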
In a second aspect, the present application provides an apparatus for fusing multiple user identities, including:
an acquisition module, configured to acquire user identity data, the user identity data having at least two identity features;
a processing module, configured to construct a graph network according to the at least two identity features of the user identity data, the graph network including nodes that represent identity features and connecting edges that represent association relationships between identity features; and
a determination module, configured to determine an identity group of a same user according to the connection relationships between nodes in the graph network and the connection relationships between nodes and connecting edges, the identity group including a plurality of identity features.
In a possible design of the second aspect, the acquisition module is specifically configured to: acquire preset configuration information, the configuration information including a data source type, a data source path, an extraction method and an extraction cycle; and extract the user identity data from the data source corresponding to the data source type according to the data source path, the extraction method and the extraction cycle.
Optionally, the configuration information further includes a field mapping relationship;
correspondingly, the processing module is further configured to parse the acquired user identity data in sequence according to the field mapping relationship and extract the at least two identity features of the user identity data.
In another possible design of the second aspect, the processing module is specifically configured to construct the graph network by taking each identity feature in the user identity data as a node of the graph network and taking the association relationship between every two identity features in the user identity data as a connecting edge of the graph network, each node and each connecting edge in the graph network having its own attribute information.
In yet another possible design of the second aspect, the determination module is specifically configured to: determine the number of connections between adjacent nodes in the graph network according to the connection relationships between nodes in the graph network and the connection relationships between nodes and connecting edges; determine a first connection relationship and a second connection relationship based on the number of connections between adjacent nodes in the graph network and a preset count threshold, the first connection relationship being a connection relationship in which the number of connections between adjacent nodes is greater than the count threshold, and the second connection relationship being a connection relationship in which the number of connections between adjacent nodes is less than or equal to the count threshold; and traverse the nodes of the graph network outward in sequence, starting from a target node, according to the first connection relationship and the second connection relationship, to determine the identity group of the user corresponding to the target node.
In still another possible design of the second aspect, the determination module is specifically configured to: determine the association relationships between nodes according to the connection relationships between nodes in the graph network, the connection relationships between nodes and connecting edges, and the attribute information of each node; and aggregate the nodes in the graph network based on the association relationships between the nodes to determine the identity group of the same user.
In still another possible design of the second aspect, the determination module is further configured to determine, according to the identity group of the same user, a plurality of user identity features in the identity group that are associated with a target identity feature, the target identity feature being any one of the user identity features included in the identity group;
the apparatus further includes a push module;
the push module is configured to push a message to at least one of the plurality of user identity features.
Optionally, the determination module is further specifically configured to retrieve, traverse and screen the nodes in the identity group of the same user to determine the plurality of user identity features in the identity group that are associated with the target identity feature.
In still another possible design of the second aspect, the processing module is further configured to store the correspondence between nodes and connecting edges in the graph network in the form of a graph database;
the graph database includes vertex storage, connecting-edge storage and attribute storage;
the vertex storage includes a node primary key, the attribute information owned by the node, and the connecting edges connected to the node;
the connecting-edge storage includes a connecting-edge primary key, the start point and end point connected by the connecting edge, and the attribute information carried by the connecting edge;
the attribute storage includes an attribute primary key, the meaning represented by the attribute, and the specific content represented by the attribute.
Optionally, the node primary key, the connecting-edge primary key and the attribute primary key are all stored as indexes.
The apparatus provided in the second aspect of the present application can be used to perform the method provided in the first aspect; its implementation principle and technical effects are similar and are not repeated here.
In a third aspect, the present application provides an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method of the first aspect and of each possible design of the first aspect.
In a fourth aspect, the present application provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method of the first aspect and of each possible design of the first aspect.
In a fifth aspect, the present application provides a method for fusing multiple user identities, including:
determining, according to at least two identity features of user identity data, an association relationship between the at least two identity features; and
determining an identity group of a same user according to the association relationship between the at least two identity features.
An embodiment of the above application has the following advantages or beneficial effects: the identity group corresponding to the multiple identity features of the same user can be determined accurately, and the method can be applied to any scenario and has a wide scope of use. Because the technical means of associating the identity features of the user identity data in the form of a graph network is adopted, the technical problem that existing fusion methods cannot be applied to complex scenarios and have a limited scope of use is overcome, thereby achieving the technical effect of being applicable to any scenario with a wide scope of use.
Other effects of the above optional implementations are described below in conjunction with specific embodiments.
Brief Description of the Drawings
The accompanying drawings are provided for a better understanding of the solution and do not limit the present application. In the drawings:
FIG. 1 is a schematic flowchart of a method for fusing multiple user identities according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of the K-V form corresponding to the vertex storage in an embodiment of the present application;
FIG. 3 is a schematic diagram of the K-V form corresponding to the connecting-edge storage in an embodiment of the present application;
FIG. 4 is a schematic diagram of the K-V form corresponding to the attribute storage in an embodiment of the present application;
FIG. 5 is a schematic diagram of the associations within the identity group of a same person;
FIG. 6 is a schematic flowchart of a method for fusing multiple user identities according to a second embodiment of the present application;
FIG. 7 is a schematic flowchart of a method for fusing multiple user identities according to a third embodiment of the present application;
FIG. 8 is a schematic flowchart of a method for fusing multiple user identities according to a fourth embodiment of the present application;
FIG. 9 is a schematic flowchart of a method for fusing multiple user identities according to a fifth embodiment of the present application;
FIG. 10 is a schematic structural diagram of an apparatus for fusing multiple identities according to an embodiment of the present application;
FIG. 11 is a block diagram of an electronic device used to implement the method for fusing multiple user identities according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
With the widespread adoption of the Internet, a large amount of user online information is generated every day. Such information usually contains user identity information, and a user identity may be expressed either as a virtual user identity or as a real identity. For example, virtual user identities include the various identification numbers (identity document, ID) used on the Internet, including device IDs such as the international mobile equipment identity (IMEI) and the identifier for advertising (IDFA), and network IDs such as the Internet protocol (IP) address, the access point (AP) and the service set identifier (SSID). Real user identities include identity information (such as ID card numbers and mobile phone numbers) and various kinds of user asset information (such as vehicles and real estate).
In practical applications, associating a user's virtual identities with the user's real identities makes it possible to reconstruct a person's complete behavior from different carriers (vehicles, real estate, mobile device codes, and so on), thereby creating great commercial value for products. For example, a complete user profile can be built from the linked IDs; users can be given accurate recommendations across different devices; users can be reached in context according to their different store-visit behaviors and shown targeted advertisements. Enterprises can also complement one another's strengths: for instance, by linking identities between a search website and an e-commerce website, the e-commerce website can recommend products that the user has searched for on the search website, so that products likely to interest the user are recommended in a targeted manner.
However, as user online information keeps accumulating, the volume of data to be linked grows rapidly, which raises the following challenges:
In terms of computing and storage: data processing at the PB level must be handled, where PB stands for petabyte, a large unit of storage, and 1 PB = 1024 TB.
In terms of user scale: for user IDs numbering in the tens of billions, whether each pair of IDs belongs to the same person must be determined, which amounts to roughly 10^20 similarity computations. This is beyond the computing resources of today's large Internet companies, so optimization is required.
In terms of market scenarios: more and more enterprises wish to fuse the IDs of Internet devices with their own data to strengthen the vitality of their products. It is therefore necessary to consider how, with limited enterprise resources, the fusion of virtual and real data can be deployed privately and migrated into the enterprise.
In terms of technical outlook: with the development of artificial intelligence (AI), new types of user IDs such as face IDs, voiceprint IDs and fingerprint IDs are bound to emerge, so it is necessary to consider how existing methods for fusing virtual and real data can adapt to new types of user IDs.
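By way of illustration, the order of magnitude quoted above for the user-scale challenge can be checked with a short calculation. The value n = 10^10 is an assumed round figure for "tens of billions" of IDs. The number of unordered pairs is n(n-1)/2, about 5 x 10^19, which is on the order of 10^20.

```python
# Assumed round figure for "tens of billions" of user IDs.
n = 10**10

# Number of unordered ID pairs a naive all-pairs comparison must examine.
pairs = n * (n - 1) // 2

print(pairs)
```

This quadratic growth is why an all-pairs comparison is impractical at this scale and why restricting comparisons to features that actually co-occur, as a graph representation does, reduces the work.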
由背景技术的介绍可知,现有技术的多重身份融合方案是基于预设的规则实现的,无法应用于复杂的场景,使用范围受限,针对该问题,本申请实施例提供了一种多重用户身份融合方法,主要是利用图论技术进行用户虚实身份的融合,将用户的每种身份特征看成一个大规模图谱中的节点,通过图论算法,判定两两节点之间是否存在一条强相关的连接边,若存在则视为归属于同一用户,以达到身份融合的目的。该技术方案涉及到数据提取、数据存储、数据融合计算、关联检索等方面,支持私有化部署,方便小企业进行数据融合。It can be seen from the introduction of the background technology that the multiple identity fusion scheme in the prior art is realized based on preset rules, which cannot be applied to complex scenarios, and the scope of use is limited. To solve this problem, the embodiment of the present application provides a multi-user The identity fusion method mainly uses graph theory technology to fuse the user's virtual and real identities, regards each identity feature of the user as a node in a large-scale graph, and uses graph theory algorithms to determine whether there is a strong correlation between two nodes The connection edge of , if it exists, is regarded as belonging to the same user, so as to achieve the purpose of identity fusion. The technical solution involves data extraction, data storage, data fusion computing, and associated retrieval. It supports privatization deployment and facilitates data fusion for small businesses.
It can be understood that the execution subject of the embodiments of the present application may be an electronic device, for example a terminal device such as a computer or tablet computer, or a server such as a back-end processing platform. In this embodiment, terminal devices and servers are therefore collectively referred to as electronic devices; whether the electronic device is a terminal device or a server can be determined according to the actual situation.
The technical solution of the present application is described in detail below through specific embodiments. It should be noted that the following specific embodiments may be combined with each other, and identical or similar concepts or processes may not be repeated in some embodiments.
FIG. 1 is a schematic flowchart of a multiple user identity fusion method according to a first embodiment of the present application. As shown in FIG. 1, the method may include the following steps:
S101. Acquire user identity data, where the user identity data has at least two identity features.
In practical applications, the data used for fusion falls into two main categories: Internet virtual data (virtual network data, VND) and real data (RD). Internet virtual data covers users' web browsing data, search data, map positioning data, consumption data (for example, online shopping data), checkpoint camera data, and so on. Real data covers social data, government affairs data, enterprise data, consumption data (for example, offline shopping data), enterprise internal data (for example, bank data), and so on.
The identity features of user identity data may include virtual identity features and real identity features. Virtual identity (VI) features refer specifically to the identity features of a user's activities on the Internet, including the user's device information, account information, and so on. In terms of data type, real identity (RI) features include address information registered with the public security authorities, native place, residential address, household registration, and so on, and may also include information and data held by government or functional departments, such as real estate, vehicles, debts, and bank accounts. Depending on the enterprise usage scenario, a real identity feature may also be defined as an identity ID internal to the enterprise.
It can be understood that the embodiments of the present application do not limit the identity features of user identity data, nor the specific forms of virtual and real identity features; these can be determined according to the actual situation and are not described further here.
In the embodiments of the present application, the purpose of this step is to acquire the user identity data on which multiple user identity fusion is performed. The user identity data must have at least two identity features; in practical applications, it is mainly extracted from various kinds of virtual data and real data.
S102. Construct a graph network according to the at least two identity features of the user identity data.
The graph network includes nodes representing identity features and connecting edges representing associations between identity features.
In practical applications, data is usually relational tabular data, and the graph structure is described with adjacency lists, so a vertex table and an edge table generally need to be created. However, because the graph structure is complex, especially once the number of nodes exceeds 100 million, distributed storage is needed to keep the system scalable. With an ordinary relational database, communication between data becomes more complicated once the data is stored in a distributed manner, which increases design cost. The embodiments of the present application therefore use a graph database to store the nodes representing identity features and the connecting edges representing their associations. A graph database is a non-relational database that applies graph theory to store relationship information between entities, which reduces the cost of organizing vertices and edges.
In this embodiment, the associations between the identity features of the user identity data, together with the records in which identity features co-occur, are represented as vertices and connecting edges of the graph database, so that the data fusion problem for user identity data can be solved with graph algorithms.
For example, in the embodiments of the present application, the method may further include the following step:
Store the correspondence between nodes and connecting edges in the graph network in the form of a graph database.
The graph database includes vertex storage, connecting-edge storage, and attribute storage.
The vertex storage includes the node primary key, the attribute information owned by the node, and the connecting edges attached to the node.
The connecting-edge storage includes the connecting-edge primary key, the start point and end point joined by the connecting edge, and the attribute information carried by the connecting edge.
The attribute storage includes the attribute primary key, the meaning represented by the attribute, and the specific content of the attribute.
Further, in the embodiments of the present application, the node primary key, the connecting-edge primary key, and the attribute primary key are all stored with indexes.
Specifically, in the embodiments of the present application, the graph database mainly involves the storage of vertices, edges, and attributes. Optionally, vertices, edges, and attributes are organized, indexed, and stored inside the storage medium as key-value (K-V) pairs.
For example, FIG. 2 is a schematic diagram of the K-V form of the vertex storage in an embodiment of the present application, FIG. 3 is a schematic diagram of the K-V form of the connecting-edge storage, and FIG. 4 is a schematic diagram of the K-V form of the attribute storage. As shown in FIG. 2, when the vertex storage is expressed in K-V form, that is, as key-value pairs, a vertex record consists mainly of the node primary key (vertex_id), the attribute information owned by the node, and the edges connected to the node. Both the attribute information and the connecting edges are stored as their corresponding primary keys, namely the edge primary key (edge_id) shown in FIG. 3 and the attribute primary key (property_id) shown in FIG. 4.
It is worth noting that, in practical applications, the node primary key is the identifier of a vertex, the attribute primary key is the identifier of an attribute, and the edge primary key is the identifier of a connecting edge.
Referring to FIG. 2, consider two nodes containing identity feature information whose node primary keys are 12345 and 67894, one of which represents an ID card and the other a mobile phone number. Vertex 12345 has two attributes, P-123 and P-124, and two edges, E-12323 and E-86743; vertex 67894 has two attributes, P-376 and P-377, and two edges, E-12323 and E-86743.
Referring to FIG. 3, the connecting-edge storage consists of the edge primary key (edge_id), the start point and end point joined by the connecting edge, and the attributes carried on the connecting edge. The start and end points are the primary keys (vertex_id) of the corresponding vertices; the attributes are handled as in the vertex storage, with values given by attribute primary keys (property_id).
For example, the edge with primary key E-12323 starts at 12345 and ends at 67894 and has two edge attributes, P-73625 and P-5325. The edge with primary key E-86743 starts at 12345 and ends at 87251 and has two edge attributes, P-56342 and P-672.
Referring to FIG. 4, the attribute storage consists of the attribute primary key (property_id), the meaning represented by the attribute (property_key), and the specific content of the attribute (property_value).
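For example, vertex 12345 has two attributes, P-123 and P-124: P-123 indicates that the feature type is a mobile phone, that is, ID-TYPE=phone, and P-124 gives the feature information, a specific mobile phone number, that is, ID-INFO=138****1232. Vertex 67894 has two attributes, P-376 and P-377: P-376 indicates that the feature type is an ID card, that is, ID-TYPE=idcard, and P-377 gives the feature information, a specific ID card number, that is, ID-INFO=3401**********1273.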
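The three stores described above can be sketched as plain Python dictionaries. This is only an illustration under stated assumptions, not the storage engine of the embodiment: the names `vertex_store`, `edge_store`, `property_store`, and `describe_vertex` are hypothetical, and only the entries spelled out in the text are included.

```python
# Hypothetical in-memory stand-ins for the three K-V stores.
# Keys are primary keys; values reference other stores by primary key.
vertex_store = {
    "12345": {"properties": ["P-123", "P-124"], "edges": ["E-12323", "E-86743"]},
    "67894": {"properties": ["P-376", "P-377"], "edges": ["E-12323", "E-86743"]},
}
edge_store = {
    "E-12323": {"start": "12345", "end": "67894", "properties": ["P-73625", "P-5325"]},
    "E-86743": {"start": "12345", "end": "87251", "properties": ["P-56342", "P-672"]},
}
property_store = {
    "P-123": {"property_key": "ID-TYPE", "property_value": "phone"},
    "P-124": {"property_key": "ID-INFO", "property_value": "138****1232"},
    "P-376": {"property_key": "ID-TYPE", "property_value": "idcard"},
    "P-377": {"property_key": "ID-INFO", "property_value": "3401**********1273"},
}


def describe_vertex(vertex_id):
    """Resolve a vertex's property ids into a {property_key: property_value} dict."""
    record = vertex_store[vertex_id]
    return {
        property_store[pid]["property_key"]: property_store[pid]["property_value"]
        for pid in record["properties"]
    }
```

Because every reference is a primary key, resolving a vertex takes one lookup per attribute, which is the access pattern a K-V layout favors.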
It is worth noting that, in the embodiments of the present application, to facilitate retrieval and data management, an index is built on the primary key of every node, connecting edge, and attribute, and indexes can also be built on a specific class of attributes, which makes it convenient to retrieve nodes and/or connecting edges by attribute.
For example, in this embodiment, based on the above analysis of the data structures in the graph database, the various identity IDs can be organized into a graph. The attributes on a connecting edge hold the specific connection information between the identity features represented by the two connected nodes, such as connection time and connection frequency, which can serve as features for subsequent computation.
In this embodiment, user identity data is stored in a K-V-based graph database. Specifically, distributed K-V storage can be used for data at the trillion scale, and in-memory or single-machine K-V storage can be used for data at the scale of tens of millions up to a hundred million.
S103. Determine the identity group of the same user according to the connection relationships between nodes and the connection relationships between nodes and connecting edges in the graph network, where the identity group includes multiple identity features.
In the embodiments of the present application, once the graph network has been constructed from the at least two identity features of the user identity data, the connection relationships between nodes and the connection relationships between nodes and connecting edges in the graph network can be determined, and the identity group of the same user can then be determined from these relationships.
Specifically, in the embodiments of the present application, determining the identity group of the same user from the constructed graph network can be understood as a data fusion process; that is, a fusion computation is performed for each identity feature in the graph network. Based on the topological properties of the identity feature and the node attributes attached to it, other identity features that have an implicit connection with it are mined, finally forming an identity group (ID-group) at the level of a natural person.
In practical applications, identity fusion computation can be divided into a rule mode and a model mode. For the specific implementation principles of the rule mode and the model mode, see the embodiments shown in FIG. 7 and FIG. 8 below; they are not described further here.
It is worth noting that, in the embodiments of the present application, both the rule mode and the model mode involve a large amount of computation from repeatedly traversing the graph network. For scenarios with a small data scale, the data can be pruned by narrowing the target population (for example, pruning by city or by feature type) to obtain a smaller data set, after which single-machine computation determines the identity group of the same user. For scenarios with a large data scale, distributed graph computing frameworks such as GraphX, GraphLab, and Giraph can be used to determine the identity group of the same user.
For example, in the embodiments of the present application, the identity group (ID-group) of the same user determined above can be expressed as the set of identity identifiers of one natural person. FIG. 5 is a schematic diagram of the associations within the identity group of one person. Referring to FIG. 5, the identity identifiers of a natural person are expanded (connection relationships are built pairwise), and each identity feature pair is loaded into the graph network created above, so that the relationships between the identity features are linked and a strongly connected graph of identity features is obtained.
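The pairwise expansion described above can be sketched with the standard library. The function name `expand_identity_group` and the identifier strings are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def expand_identity_group(identifiers):
    """Return every unordered pair of identifiers as an (a, b) edge tuple."""
    # Sorting first makes the output order deterministic.
    return list(combinations(sorted(identifiers), 2))


# Placeholder identifiers for one natural person.
pairs = expand_identity_group({"imei:A", "phone:B", "idcard:C"})
```

A group of n identifiers yields n(n-1)/2 pairs, so each pair can be loaded into the graph as one connecting edge.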
For example, in the schematic diagram shown in FIG. 5, the identity features may take forms including IMEI, AP, IP address, IDFA, mobile phone number, ID card number, house, vehicle, and others.
In the multiple user identity fusion method provided by the embodiments of the present application, user identity data is acquired; a graph network is constructed according to the at least two identity features of the user identity data, the graph network including nodes representing identity features and connecting edges representing associations between identity features; and the identity group of the same user, which includes multiple identity features, is determined according to the connection relationships between nodes and the connection relationships between nodes and connecting edges in the graph network. By associating the identity features of user identity data in the form of a graph network, this technical solution can accurately determine the identity group corresponding to multiple identity features of the same user and can be applied to any scenario, which avoids the problem of limited scope of use.
For example, on the basis of the above embodiment, FIG. 6 is a schematic flowchart of a multiple user identity fusion method according to a second embodiment of the present application. As shown in FIG. 6, in this embodiment, S101 above can be implemented through the following steps:
S601. Acquire preset configuration information, where the configuration information includes a data source type, a data source path, an extraction method, and an extraction period.
In the embodiments of the present application, the user identity data to be fused may be Internet virtual data or real data, and different user identity data may, once generated, be stored in different types of storage systems, for example HDFS, HIVE, MYSQL, or NoSQL. Therefore, to acquire user identity data from different storage systems, preset configuration information must first be acquired so that the user identity data can be extracted from each storage system accordingly.
Accordingly, in this embodiment, the preset configuration information may include the data source type (HDFS, HIVE, MYSQL, NoSQL…), the data source path (host:port, hdfspath…), the extraction method, and the extraction period. The data source type indicates the type of system that stores the user identity data; the data source path indicates the route followed when extracting the user identity data; the extraction method indicates how the data is extracted; and the extraction period indicates how often data extraction runs automatically. The extraction period can also be regarded as the scheduling frequency (execution period), which indicates whether the user data extraction task runs daily, hourly, or only once.
In this embodiment, a piece of user identity data must contain at least two identity features; each identity feature is represented by a vertex, and a weak connecting edge exists between the two vertices. For example, if the identity features can be a device code, mobile phone number, ID number, account, and so on, then a piece of user identity data includes at least two of the device code, mobile phone number, ID number, and account.
S602. Extract user identity data from the data source corresponding to the data source type according to the data source path, the extraction method, and the extraction period.
In this embodiment, the data source type, data source path, extraction method, and extraction period are first determined from the preset configuration information. Then hadoop, spark, or single-machine execution is selected according to the extraction method, and user identity data is extracted from the data source corresponding to the data source type, via the data source path, once per extraction period.
It can be understood that user data extraction is driven by the dependencies among the items in the preset configuration information, which ensures that data extraction tasks run stably and in order.
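The configuration information used in S601 and S602 could be modeled as a small validated record. The following is a sketch under stated assumptions: the class name `ExtractionConfig`, its field names, and the string values "daily", "hourly", and "once" are hypothetical labels chosen to mirror the four items and the scheduling levels named in the text.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractionConfig:
    source_type: str        # e.g. "HDFS", "HIVE", "MYSQL", "NoSQL"
    source_path: str        # e.g. "host:port" or an HDFS path
    extraction_method: str  # e.g. "hadoop", "spark", "standalone"
    extraction_period: str  # "daily", "hourly", or "once" (assumed labels)
    field_mapping: dict = field(default_factory=dict)  # optional item

    def validate(self):
        """Reject an unknown period before any extraction task is scheduled."""
        allowed = {"daily", "hourly", "once"}
        if self.extraction_period not in allowed:
            raise ValueError(f"unknown extraction period: {self.extraction_period}")
        return self


cfg = ExtractionConfig("MYSQL", "host:port", "standalone", "daily").validate()
```

Validating the record up front is one way to keep scheduled extraction tasks from failing midway on a malformed configuration.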
Optionally, in the embodiments of the present application, if the configuration information further includes a field mapping relationship, the method may further include the following step:
S603. Parse the acquired user identity data in turn according to the field mapping relationship, and extract the at least two identity features of the user identity data.
The field mapping relationship indicates which fields (in practice, the identity features) are extracted from the acquired user identity data as nodes and connecting edges of the graph network, which attributes are marked as vertex attributes (for example, mobile phone number or device identifier), and which attributes are marked as edge attributes (for example, information such as login time).
Therefore, in this embodiment, each row of the acquired user identity data can be parsed according to the field mapping relationship to determine the at least two identity features of that row, which makes the subsequent construction of the graph network possible.
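Parsing one row with a field mapping can be sketched as follows. The mapping layout (`node_fields`, `edge_fields`), the column names, and the sample values are all hypothetical placeholders; the embodiment does not prescribe a concrete format.

```python
def parse_record(record, field_mapping):
    """Split one raw row into identity nodes and edge attributes."""
    # Each mapped column with a non-empty value becomes an (id_type, value) node.
    nodes = [
        (id_type, record[column])
        for column, id_type in field_mapping["node_fields"].items()
        if record.get(column)
    ]
    if len(nodes) < 2:
        raise ValueError("a record needs at least two identity features")
    # Remaining mapped columns are carried as attributes of the connecting edge.
    edge_attrs = {
        name: record[column]
        for column, name in field_mapping["edge_fields"].items()
        if column in record
    }
    return nodes, edge_attrs


mapping = {
    "node_fields": {"mobile": "phone", "device": "imei"},
    "edge_fields": {"ts": "login_time"},
}
nodes, edge_attrs = parse_record(
    {"mobile": "138****1232", "device": "86**********", "ts": "2019-09-01T08:00:00"},
    mapping,
)
```

The check for at least two nodes reflects the requirement that every piece of user identity data carry at least two identity features.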
Accordingly, on the basis of the above embodiment, and referring for example to FIG. 6, step S102 above can be implemented through the following step:
S604. Construct the graph network by taking each identity feature in the user identity data as a node of the graph network and the association between every two identity features in the user identity data as a connecting edge of the graph network.
Each node and each connecting edge in the graph network has its own attribute information.
In this embodiment, when the at least two identity features of the user identity data are determined according to the field mapping relationship, the association between every two identity features and the attribute information of each node and each connecting edge are determined as well. The graph network can therefore be constructed by taking each identity feature of the user identity data as a vertex and the association between every two identity features as a connecting edge.
From the above analysis, user identity data can be organized into structured data comprising a start point, start point attributes, an end point, end point attributes, a connecting edge, and connecting edge attributes. This structured data is in effect a directed graph in which the relationships between nodes are configurable, and it can be output as json, proto, or csv. The embodiments of the present application do not limit the specific output format, which can be selected according to the actual situation.
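As one possible illustration of the json output form, the structured record could be serialized with the standard `json` module. The helper `to_edge_record` and its key names are hypothetical; the vertex ids and attribute values reuse the earlier example.

```python
import json


def to_edge_record(start, start_attrs, end, end_attrs, edge_attrs):
    """Bundle one directed connection into a plain dict."""
    return {
        "start": start,
        "start_attrs": start_attrs,
        "end": end,
        "end_attrs": end_attrs,
        "edge_attrs": edge_attrs,
    }


record = to_edge_record(
    "12345", {"ID-TYPE": "phone"},
    "67894", {"ID-TYPE": "idcard"},
    {"count": 1},
)
# One JSON object per line is a common interchange layout for edge lists.
line = json.dumps(record, sort_keys=True)
```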
Further, operations such as node deduplication and attribute merging are performed on the structured data formed from the user identity data, so that nodes corresponding to identity features of newly added data can be inserted into the graph network and nodes corresponding to identity features of existing data can be updated accordingly.
Similarly, connecting-edge deduplication and attribute merging can be performed on the structured data formed from the user identity data; that is, edges with the same connection relationship across multiple records are merged and duplicate connecting edges and attributes are removed, which reconstructs the information in the user identity data through merging. Finally, based on the reconstructed structured data, the extracted vertices, connecting edges, and attribute information can be updated into the graph network.
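Edge deduplication with attribute merging might look like the sketch below. The merge policy (summing a `count` and concatenating a `times` list) is an assumption made for illustration; the embodiment does not specify how attributes are combined.

```python
def merge_edges(records):
    """Merge records that share the same (start, end) pair.

    Each record is a (start, end, attrs) tuple. Under the assumed policy,
    'count' values are summed and 'times' lists are concatenated.
    """
    merged = {}
    for start, end, attrs in records:
        slot = merged.setdefault((start, end), {"count": 0, "times": []})
        slot["count"] += attrs.get("count", 1)
        slot["times"].extend(attrs.get("times", []))
    return merged


merged = merge_edges([
    ("a", "b", {"times": ["t1"]}),
    ("a", "b", {"times": ["t2"]}),
    ("a", "c", {}),
])
```

Keeping a merged count on each edge also provides the connection count that later steps compare against a threshold.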
In the multiple user identity fusion method provided by this embodiment of the present application, preset configuration information including a data source type, a data source path, an extraction method, and an extraction period is acquired, and user identity data is extracted from the data source corresponding to the data source type according to the data source path, the extraction method, and the extraction period. When the configuration information further includes a field mapping relationship, the acquired user identity data can be parsed in turn according to the field mapping relationship to extract its at least two identity features, and the graph network is constructed by taking each identity feature in the user identity data as a node and the association between every two identity features as a connecting edge. Given that different data may come from different systems, this technical solution uses preset configuration information to extract user identity data and to identify and extract the identity features in it, and constructs the graph network from the extracted identity features, with a high degree of automation and low cost.
For example, on the basis of the above embodiments, FIG. 7 is a schematic flowchart of a multiple user identity fusion method according to a third embodiment of the present application. As shown in FIG. 7, in this embodiment, S103 above can be implemented through the following steps:
S701. Determine the number of connections between adjacent nodes in the graph network according to the connection relationships between nodes and the connection relationships between nodes and connecting edges in the graph network.
In the graph network, the connection relationships between nodes determine which nodes are connected to each other, and the connection relationships between nodes and connecting edges determine the basic information about each connection, such as connection time and connection count. The number of connections between adjacent nodes in the graph network can therefore be determined from which nodes are connected and from this basic connection information.
In this embodiment, because the connection relationships between nodes and between nodes and connecting edges constitute the node connection information, the number of connections between adjacent nodes in the graph network can also be understood as being determined from the node connection information within multiple preset time periods.
S702. Determine a first connection relationship and a second connection relationship based on the number of connections between adjacent nodes in the graph network and a preset count threshold.
The first connection relationship is a connection relationship in which the number of connections between adjacent nodes is greater than the count threshold, and the second connection relationship is one in which the number of connections between adjacent nodes is less than or equal to the count threshold.
This embodiment performs the identity fusion computation mainly in rule mode. Specifically, in this implementation, the number of connections between adjacent nodes in the graph network and the node connection information in different time periods are first determined, and a preset count threshold is defined. Then, based on the node connection information in different time periods and the number of connections between adjacent nodes, the number of connections between adjacent nodes is compared with the count threshold to determine which connection relationship each pair of adjacent nodes has.
In practical applications, the first connection relationship may also be called a strong connection relationship, and the second connection relationship a weak connection relationship.
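Steps S701 and S702 can be sketched as counting co-occurrence events per node pair and splitting the pairs at the threshold. The function name `classify_connections`, the input format (a list of node pairs, one per observed connection), and the sample node names are assumptions made for illustration.

```python
from collections import Counter


def classify_connections(observations, threshold):
    """Split node pairs into strong (count > threshold) and weak (count <= threshold).

    observations: iterable of (node_a, node_b) connection events between
    two distinct nodes; direction is ignored in this sketch.
    """
    counts = Counter(frozenset(pair) for pair in observations)
    strong = {pair for pair, n in counts.items() if n > threshold}
    weak = {pair for pair, n in counts.items() if n <= threshold}
    return strong, weak


events = [("phone", "device")] * 3 + [("phone", "wifi")]
strong, weak = classify_connections(events, threshold=2)
```

With a threshold of 2, the pair seen three times falls into the first (strong) relationship and the pair seen once into the second (weak) relationship.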
S703. According to the first connection relationship and the second connection relationship, traverse the nodes of the graph network outward in turn, starting from a target node, and determine the identity group of the user corresponding to the target node.
In this embodiment, after the connection relationships between adjacent nodes have been determined, each node in the graph network can be taken as a starting point and its surroundings traversed; every node passed is checked against the first and second connection relationships, and the identity features corresponding to the set of all nodes that satisfy the first connection relationship are taken as one identity group. Similarly, for the target node, the identity group of the corresponding user can be determined by traversing all identity groups and applying some strategy (for example, finding the minimum connected subgraph).
It can be understood that, in the embodiments of the present application, the depth of node traversal (referred to here as the out-degree) can be determined according to the actual situation and is not limited in this embodiment. Typically, the traversal depth is 3 degrees.
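A depth-limited traversal of this kind might look like the sketch below. It assumes an adjacency map whose entries are already labeled "strong" or "weak"; the names `identity_group` and `adjacency`, and the sample data, are hypothetical. Only strong edges are followed, and the default depth of 3 reflects the typical value mentioned above.

```python
from collections import deque


def identity_group(adjacency, target, max_depth=3):
    """Collect nodes reachable from target via strong edges within max_depth hops."""
    group = {target}
    queue = deque([(target, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the configured depth
        for neighbor, relation in adjacency.get(node, []):
            if relation == "strong" and neighbor not in group:
                group.add(neighbor)
                queue.append((neighbor, depth + 1))
    return group


# Hypothetical labeled adjacency: node -> [(neighbor, relation), ...]
adjacency = {
    "phone:139": [("imei:A1", "strong"), ("ip:10.0.0.1", "weak")],
    "imei:A1": [("phone:139", "strong"), ("idfa:F9", "strong")],
    "idfa:F9": [("imei:A1", "strong")],
    "ip:10.0.0.1": [("phone:139", "weak")],
}
group = identity_group(adjacency, "phone:139")
```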
As can be seen from the above analysis, in this embodiment the first connection relationship and the second connection relationship are determined based on the number of connections between adjacent nodes in the graph network and the preset count threshold. The first connection relationship is one in which the number of connections between adjacent nodes is greater than the count threshold, and the second connection relationship is one in which that number is less than or equal to the count threshold. According to the first and second connection relationships, the nodes of the graph network are traversed outward in sequence, starting from the target node, to determine the identity group of the user corresponding to the target node. That is, this solution uses rule-based fusion, and its results have high accuracy.
Exemplarily, on the basis of the above embodiments, FIG. 8 is a schematic flowchart of the multi-user identity fusion method according to the fourth embodiment of the present application. This embodiment differs from the embodiment shown in FIG. 7 in how the identity group of the same user is determined. Specifically, as shown in FIG. 8, in this embodiment S103 may be implemented through the following steps:
S801. Determine the association relationships between nodes according to the connection relationships between nodes in the graph network, the connection relationships between nodes and connecting edges, and the attribute information of each node.
In this embodiment, the identity fusion computation is performed mainly in a model-based mode. Specifically, nodes are featurized according to the connection relationships between nodes, the connection relationships between nodes and connecting edges, and the attribute information of each node, for example through identifier embedding (ID-Embedding). The purpose of ID-Embedding is to extract a set of features (a set of vectors) for each node in the graph network so that mathematical operations can be performed between nodes.
It can be understood that there are many node featurization methods, for example node2vec and graph-embedding. The method actually adopted can be determined according to actual needs and is not described further here.
In the embodiments of the present application, to cover more scenarios, a featurization scheme based on attribute data types can be adopted, which is a general-purpose feature representation method. Specifically, encoding can be applied according to the type of the attribute feature: string, int, double, or enum. For example, a string can be encoded with one-hot character encoding, and an int with numerical normalization.
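As a rough illustration of type-based encoding, the sketch below one-hot encodes the characters of a string and min-max normalizes a number. The character set, the fixed length `max_len`, the value range, and the enum encoder are all assumptions made for the example; the text does not specify them.

```python
import string

# Assumed character set for one-hot character encoding.
ALPHABET = string.ascii_lowercase + string.digits


def encode_string(value, max_len=4):
    """One-hot encode each of the first max_len characters over ALPHABET."""
    vector = []
    # Pad short strings with spaces, which encode as all-zero blocks.
    for ch in value.lower()[:max_len].ljust(max_len):
        one_hot = [0.0] * len(ALPHABET)
        idx = ALPHABET.find(ch)
        if idx >= 0:
            one_hot[idx] = 1.0
        vector.extend(one_hot)
    return vector


def encode_number(value, lo, hi):
    """Min-max normalize an int or double attribute into [0, 1]."""
    if hi == lo:
        return 0.0
    return (value - lo) / (hi - lo)


def encode_enum(value, choices):
    """One-hot encode an enum attribute over its known choices."""
    return [1.0 if value == choice else 0.0 for choice in choices]
```

Concatenating the per-attribute encodings would give one feature vector per node, which is the input that the similarity measures in S802 operate on.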
S802. Based on the association relationships between nodes, aggregate the nodes in the graph network to determine the identity group of the same user.
In this embodiment, nodes are clustered based on the association relationships between them; that is, a similarity algorithm is used to aggregate nodes with high similarity (degree of association) into strongly related groups. The similarity algorithm can be chosen according to the kind of features extracted in S801. For example, Jaccard similarity can be used for discrete features, cosine similarity for continuous feature vectors, and Euclidean similarity for numerical features.
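The three similarity measures named above can be written in a few lines. This is a sketch using only the standard library; mapping Euclidean distance to a similarity via 1 / (1 + d) is one common convention and is an assumption here, since the text does not say how the distance is converted.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two collections of discrete features."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity of two continuous feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def euclidean_similarity(u, v):
    """Map Euclidean distance d to a similarity in (0, 1] as 1 / (1 + d)."""
    return 1.0 / (1.0 + math.dist(u, v))
```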
As can be seen from the above analysis, once the association relationships between nodes are determined, the degree of association between nodes is obtained accordingly, so different algorithms can be chosen to cluster the nodes, for example the classic Louvain community detection algorithm or the traditional k-means algorithm. The final output is a high-confidence identity group for the same user.
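To make the aggregation step concrete without depending on a graph library, the sketch below groups nodes whose pairwise similarity reaches a threshold, using union-find. This is a deliberately simpler stand-in (single-linkage threshold grouping), not Louvain or k-means; the function name, the threshold, and the sample features are assumptions.

```python
def cluster_by_similarity(nodes, similarity, threshold):
    """Group nodes connected by pairwise similarity >= threshold (union-find)."""
    parent = {node: node for node in nodes}

    def find(node):
        # Walk to the root, compressing the path along the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical discrete features per identity node.
features = {
    "phone:139": {"dev1", "wifi1"},
    "imei:A1": {"dev1", "wifi1", "ip1"},
    "idfa:Z": {"dev9"},
}


def jaccard_of(a, b):
    return len(features[a] & features[b]) / len(features[a] | features[b])


groups = cluster_by_similarity(list(features), jaccard_of, threshold=0.5)
```

The pairwise loop is quadratic in the number of nodes, so a production system would more plausibly use a community detection or clustering algorithm of the kind the text names.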
The multi-user identity fusion method provided by this embodiment determines the association relationships between nodes according to the connection relationships between nodes in the graph network, the connection relationships between nodes and connecting edges, and the attribute information of each node, and then aggregates the nodes in the graph network based on those association relationships to determine the identity group of the same user. In this technical solution, model-based fusion can ensure a high fusion rate.
It is worth noting that, in practical applications, the rule-based and model-based fusion modes can be used together; that is, the two complement each other to form a complete fusion system. In the embodiments of the present application, distributed or non-distributed computation can be chosen according to the data scale, which makes independent deployment convenient from an engineering point of view.
Exemplarily, on the basis of any of the above embodiments, FIG. 9 is a schematic flowchart of the multi-user identity fusion method according to the fifth embodiment of the present application. Referring to FIG. 9, the method may further include the following steps:
S901. According to the identity group of the same user, determine multiple user identity features in the identity group that have an association relationship with a target identity feature, where the target identity feature is any one of the user identity features included in the identity group.
In this embodiment, determining the multiple user identity features associated with the target identity feature from the identity group is in effect an associative retrieval process. Specifically, associative retrieval is mainly a graph traversal search over the strongly connected graph formed by S103 in the embodiment shown in FIG. 1 and the association relationships shown in FIG. 5.
Specifically, given the source node type, the identity feature of the source node, the target node type, and so on, the identifiers of the connected nodes can be output together with corresponding scores, where a score represents the degree of association between a connected node and the source and target nodes. Because the connectivity between nodes already forms a connected graph, as shown in FIG. 5, the identities linked to the target identity feature can be derived intuitively from it, yielding all user identity features that have an association relationship with the target identity feature.
Exemplarily, in this embodiment, this step can be implemented as follows:
Retrieve, traverse, and filter the nodes in the identity group of the same user to determine the multiple user identity features in the identity group that have an association relationship with the target identity feature.
In the embodiments of the present application, the process of deriving identity feature connectivity can be divided into node identity retrieval, breadth traversal of node identities, and filtering of node identity features.
Specifically, node identity retrieval: as described for the embodiment shown in FIG. 1, point storage mainly comprises the node primary key (KEY), the attribute information owned by the node, and the connecting edges attached to the node. Building an index on the attribute primary key (for example, on ID-TYPE and ID-INFO) improves retrieval performance. For example, to retrieve the node whose mobile phone number is 139xxxxxxxx, the retrieval is a query for data with ID-TYPE=phone and ID-INFO=139xxxxxxxx.
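The indexed lookup can be pictured with an in-memory stand-in. The class `NodeStore` and its methods are hypothetical, and a dictionary keyed by (ID-TYPE, ID-INFO) plays the role of the index; a real deployment would use whatever index the underlying store provides.

```python
class NodeStore:
    """Toy point storage with an index on (ID-TYPE, ID-INFO)."""

    def __init__(self):
        self._nodes = {}  # node primary key -> node record
        self._index = {}  # (id_type, id_info) -> node primary key

    def add(self, key, id_type, id_info, edges=()):
        self._nodes[key] = {
            "id_type": id_type,
            "id_info": id_info,
            "edges": list(edges),
        }
        self._index[(id_type, id_info)] = key

    def find(self, id_type, id_info):
        """Return the node record for the given identity, or None if absent."""
        key = self._index.get((id_type, id_info))
        return None if key is None else self._nodes[key]


store = NodeStore()
store.add("n1", "phone", "139xxxxxxxx", edges=["e1"])
node = store.find("phone", "139xxxxxxxx")
```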
Breadth traversal of node identities: node traversal based on breadth-first search (BFS) over the graph. Starting from the target node corresponding to the target identity feature and moving outward, the nodes connected to it at layer 1, layer 2, ..., layer n (n being configurable by a parameter) are traversed in turn.
Exemplarily, in the schematic diagram shown in FIG. 5, for the mobile phone number, the first layer yields the IMEI, IDFA, AP, IP address, and so on, and the second layer yields the car, the house, and so on. As the point storage shown in FIG. 2 indicates, BFS amounts to traversing the connecting edges stored in each node's value, so a one-degree traversal of a node requires only a single K-V lookup, which greatly improves traversal efficiency.
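The layer-by-layer traversal can be sketched over a key-value store in which each node's value already carries its connected neighbors. The dictionary `kv_store` is a toy stand-in loosely modeled on the FIG. 5 example, and the function name and data are assumptions.

```python
def bfs_layers(kv_store, start, max_layers):
    """Return the nodes reached from start, grouped by layer (1, 2, ...)."""
    visited = {start}
    frontier = [start]
    layers = []
    for _ in range(max_layers):
        next_frontier = []
        for key in frontier:
            # A single key-value lookup yields every edge of this node.
            for neighbor in kv_store[key]["edges"]:
                if neighbor not in visited:
                    visited.add(neighbor)
                    next_frontier.append(neighbor)
        if not next_frontier:
            break
        layers.append(next_frontier)
        frontier = next_frontier
    return layers


# Hypothetical store: node key -> value holding its connected neighbors.
kv_store = {
    "phone": {"edges": ["imei", "idfa", "ip"]},
    "imei": {"edges": ["phone", "car"]},
    "idfa": {"edges": ["phone"]},
    "ip": {"edges": ["phone", "house"]},
    "car": {"edges": ["imei"]},
    "house": {"edges": ["ip"]},
}
layers = bfs_layers(kv_store, "phone", max_layers=2)
```

Because the edges are stored with the node, expanding one node costs one lookup, which is the efficiency point the text makes.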
Filtering of node identity features: traversing the nodes in the graph network yields all identities connected to the node corresponding to the target identity feature. Because the attribute primary key of a node is indexed, the retrieval results can be filtered by that index. For example, to obtain the idfa connected to a mobile phone number, a single round of traversal yields the result, and nodes whose ID-TYPE is not idfa are filtered out.
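The filtering step reduces to keeping the traversal results of one ID-TYPE. A minimal sketch, with a hypothetical record layout and sample data:

```python
def filter_by_type(nodes, id_type):
    """Keep only traversal results whose ID-TYPE matches the requested type."""
    return [node for node in nodes if node["id_type"] == id_type]


# Hypothetical nodes reached by one round of traversal from a phone number.
reached = [
    {"id_type": "imei", "id_info": "A1"},
    {"id_type": "idfa", "id_info": "F9"},
    {"id_type": "ip", "id_info": "10.0.0.1"},
]
idfas = filter_by_type(reached, "idfa")
```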
Thus, through the above processes of node identity retrieval, breadth traversal of node identities, and filtering of node identity features, the vertices connected to the vertex corresponding to the target identity feature can be derived quickly.
In this embodiment, online identity feature derivation based on graph traversal makes use of the storage and indexing scheme of the graph network. Compared with existing schemes that store the mapping between identity features in forward mapping tables, the average response performance is significantly improved; in particular, when the connected path length (degree) exceeds 2 steps, average response performance improves by a factor of 10.
S902. Push a message to at least one identity feature among the multiple user identity features.
In this embodiment, for the identity group of the same user, once the multiple user identity features in the identity group that have an association relationship with the target identity feature are determined, a message can be pushed to at least one of them.
For example, once identities on a search website and an e-commerce website have been linked, after a user searches for a product on the search website, that product can be recommended to the user upon logging in to the e-commerce website, so that products the user may be interested in are recommended in a targeted manner.
As can be seen from the above analysis, in this embodiment, multiple user identity features in the identity group that have an association relationship with the target identity feature are determined according to the identity group of the same user, the target identity feature being any one of the user identity features included in the identity group, and a message is pushed to at least one of those identity features. This technical solution achieves targeted message pushing to users and increases the commercial value of the product.
In summary, the technical implementation of the embodiments of the present application treats each identity feature in the user identity data as a node in the graph network and, through graph-theoretic algorithms, strengthens the connections between a number of weakly related identity feature pairs to form strongly related pairs, thereby achieving identity fusion. It covers a complete set of solutions for data extraction, data storage, fusion computation, and associative retrieval, and supports private deployment, which makes data fusion convenient for small enterprises.
The above describes the specific implementation of the multi-identity fusion method of the embodiments of the present application. The following are apparatus embodiments of the present application, which can be used to carry out the method embodiments. For details not disclosed in the apparatus embodiments, refer to the method embodiments of the present application.
FIG. 10 is a schematic structural diagram of the multi-identity fusion apparatus provided by an embodiment of the present application. The apparatus can be integrated in, or implemented by, an electronic device, which may be a terminal device or a server. As shown in FIG. 10, in this embodiment, the multi-identity fusion apparatus 100 may include:
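an acquisition module 1001, configured to acquire user identity data, the user identity data having at least two identity features;
a processing module 1002, configured to construct a graph network according to the at least two identity features of the user identity data, the graph network including nodes representing identity features and connecting edges representing association relationships between identity features;
a determination module 1003, configured to determine the identity group of the same user according to the connection relationships between nodes in the graph network and the connection relationships between nodes and connecting edges, the identity group including multiple identity features.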
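In one possible design of the embodiments of the present application, the acquisition module 1001 is specifically configured to acquire preset configuration information, the configuration information including a data source type, a data source path, an extraction method, and an extraction period, and to extract the user identity data from the data source corresponding to the data source type according to the data source path, the extraction method, and the extraction period.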
Exemplarily, the configuration information further includes a field mapping relationship;
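Correspondingly, the processing module 1002 is further configured to parse the acquired user identity data in turn according to the field mapping relationship and extract the at least two identity features of the user identity data.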
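Parsing by field mapping can be illustrated as below. The mapping format (source field name to identity type), the function name, and the sample record are assumptions; the text only states that a field mapping relationship is part of the configuration.

```python
def extract_features(record, field_mapping):
    """Extract (id_type, id_info) identity features from one raw record."""
    features = []
    for source_field, id_type in field_mapping.items():
        value = record.get(source_field)
        if value not in (None, ""):  # skip missing or empty fields
            features.append((id_type, str(value)))
    return features


# Hypothetical mapping from source column names to identity types.
field_mapping = {"mobile": "phone", "device_imei": "imei", "ad_id": "idfa"}
record = {"mobile": "139xxxxxxxx", "device_imei": "A1", "ad_id": ""}
features = extract_features(record, field_mapping)
```

A record is useful for fusion only if it yields at least two identity features, as the method requires.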
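In another possible design of the present application, the processing module 1002 is specifically configured to construct the graph network by taking each identity feature in the user identity data as a node of the graph network and the association relationship between every two identity features in the user identity data as a connecting edge of the graph network, where each node and each connecting edge in the graph network has its own attribute information.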
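Graph construction of this kind can be sketched as follows: every identity feature becomes a node, and every pair of features observed in the same record becomes an edge. Storing a connection count as the edge attribute is an assumption made so that the example ties in with the count threshold used in rule-based fusion; the function name and data are hypothetical.

```python
from itertools import combinations


def build_graph(records):
    """Build nodes and undirected edges from lists of (id_type, id_info) features."""
    nodes = {}
    edges = {}
    for features in records:
        for feature in features:
            nodes.setdefault(
                feature, {"id_type": feature[0], "id_info": feature[1]}
            )
        # Sort so that each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(features)), 2):
            edge = edges.setdefault((a, b), {"count": 0})
            edge["count"] += 1
    return nodes, edges


records = [
    [("phone", "139"), ("imei", "A1")],
    [("imei", "A1"), ("phone", "139"), ("ip", "10.0.0.1")],
]
nodes, edges = build_graph(records)
```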
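In yet another possible design of the embodiments of the present application, the determination module 1003 is specifically configured to: determine the number of connections between adjacent nodes in the graph network according to the connection relationships between nodes in the graph network and the connection relationships between nodes and connecting edges; determine a first connection relationship and a second connection relationship based on the number of connections between adjacent nodes in the graph network and a preset count threshold, the first connection relationship being one in which the number of connections between adjacent nodes is greater than the count threshold and the second connection relationship being one in which that number is less than or equal to the count threshold; and, according to the first connection relationship and the second connection relationship, traverse the nodes of the graph network outward in sequence, starting from the target node, to determine the identity group of the user corresponding to the target node.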
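In still another possible design of the embodiments of the present application, the determination module 1003 is specifically configured to determine the association relationships between nodes according to the connection relationships between nodes in the graph network, the connection relationships between nodes and connecting edges, and the attribute information of each node, and to aggregate the nodes in the graph network based on the association relationships between nodes to determine the identity group of the same user.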
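In any of the above possible designs of the embodiments of the present application, the determination module 1003 is further configured to determine, according to the identity group of the same user, multiple user identity features in the identity group that have an association relationship with a target identity feature, the target identity feature being any one of the user identity features included in the identity group;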
Correspondingly, the apparatus further includes a push module;
The push module is configured to push a message to at least one identity feature among the multiple user identity features.
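In the embodiments of the present application, the determination module 1003 is further specifically configured to retrieve, traverse, and filter the nodes in the identity group of the same user to determine the multiple user identity features in the identity group that have an association relationship with the target identity feature.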
In any of the above possible designs of this embodiment, the processing module 1002 is further configured to store the correspondence between nodes and connecting edges in the graph network in the form of a graph database;
The graph database includes point storage, connecting edge storage, and attribute storage;
The point storage includes the node primary key, the attribute information owned by the node, and the connecting edges attached to the node;
The connecting edge storage includes the connecting edge primary key, the start point and end point connected by the connecting edge, and the attribute information carried by the connecting edge;
The attribute storage includes the attribute primary key, the meaning represented by the attribute, and the specific content represented by the attribute.
Optionally, the node primary key, the connecting edge primary key, and the attribute primary key are all stored in indexed form.
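The three stores can be pictured as three keyed maps, as in the sketch below. The key formats, field names, and helper functions are hypothetical; plain dictionaries stand in for the indexed primary keys.

```python
# Point storage: node primary key -> owned attribute keys and connected edge keys.
point_store = {
    "node:phone": {"attrs": ["attr:1"], "edges": ["edge:1"]},
    "node:imei": {"attrs": ["attr:2"], "edges": ["edge:1"]},
}
# Connecting edge storage: edge primary key -> start point, end point, attributes.
edge_store = {
    "edge:1": {"start": "node:phone", "end": "node:imei", "attrs": ["attr:3"]},
}
# Attribute storage: attribute primary key -> meaning and specific content.
attr_store = {
    "attr:1": {"meaning": "ID-TYPE", "content": "phone"},
    "attr:2": {"meaning": "ID-TYPE", "content": "imei"},
    "attr:3": {"meaning": "connection count", "content": 3},
}


def neighbors(node_key):
    """Resolve a node's neighbors through the edges stored in its value."""
    result = []
    for edge_key in point_store[node_key]["edges"]:
        edge = edge_store[edge_key]
        other = edge["end"] if edge["start"] == node_key else edge["start"]
        result.append(other)
    return result


def attributes(attr_keys):
    """Resolve attribute keys into a meaning -> content mapping."""
    return {attr_store[k]["meaning"]: attr_store[k]["content"] for k in attr_keys}
```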
The apparatus provided by the embodiments of the present application can be used to carry out the methods of the embodiments shown in FIG. 1 to FIG. 9. Its implementation principles and technical effects are similar and are not repeated here.
It should be noted that the division of the above apparatus into modules is only a division of logical functions. In an actual implementation the modules may be fully or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented as software invoked by a processing element, or all as hardware, or some as software invoked by a processing element and others as hardware. For example, the determination module may be a separately provided processing element, or may be integrated into a chip of the apparatus; it may also be stored in the memory of the apparatus in the form of program code, with a processing element of the apparatus invoking it and performing the functions of the determination module. The other modules are implemented similarly. In addition, all or some of these modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. During implementation, each step of the above method, or each of the above modules, may be carried out by integrated logic circuits of hardware in a processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more application specific integrated circuits (ASIC), one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA). As another example, when one of the above modules is implemented as program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can invoke program code. As yet another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
Further, according to the embodiments of the present application, the present application also provides an electronic device and a readable storage medium.
FIG. 11 is a block diagram of an electronic device for implementing the multi-user identity fusion method of the embodiments of the present application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile apparatus, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing apparatus. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present application described and/or claimed herein.
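As shown in FIG. 11, the electronic device includes one or more processors 1101, a memory 1102, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor can process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other implementations, multiple processors and/or multiple buses may be used together with multiple memories if required. Likewise, multiple electronic devices may be connected, with each device providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). FIG. 11 takes one processor 1101 as an example.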
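The memory 1102 is the non-transitory computer-readable storage medium provided by the present application. The memory stores instructions executable by at least one processor, so that the at least one processor executes the multi-user identity fusion method provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions that cause a computer to execute the multi-user identity fusion method provided by the present application.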
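As a non-transitory computer-readable storage medium, the memory 1102 can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the multi-user identity fusion method in the embodiments of the present application (for example, the acquisition module 1001, the processing module 1002, and the determination module 1003 shown in FIG. 10). By running the non-transitory software programs, instructions, and modules stored in the memory 1102, the processor 1101 executes the various functional applications and data processing of the server, that is, implements the multi-user identity fusion method of the above method embodiments.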
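The memory 1102 may include a program storage area and a data storage area, where the program storage area can store an operating system and an application required by at least one function, and the data storage area can store data created through use of the electronic device for multi-user identity fusion. In addition, the memory 1102 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 1102 optionally includes memories located remotely from the processor 1101, and these remote memories may be connected to the electronic device for multi-user identity fusion through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.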
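The electronic device for multi-user identity fusion may further include an input apparatus 1103 and an output apparatus 1104. The processor 1101, the memory 1102, the input apparatus 1103, and the output apparatus 1104 may be connected by a bus or in other ways; FIG. 11 takes connection by a bus as an example.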
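The input apparatus 1103 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for multi-user identity fusion; examples of input apparatus include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, and joystick. The output apparatus 1104 may include a display device, an auxiliary lighting apparatus (for example, an LED), a haptic feedback apparatus (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.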
Various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuit systems, application specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be special-purpose or general-purpose, and can receive data and instructions from a storage system, at least one input apparatus, and at least one output apparatus, and transmit data and instructions to the storage system, the at least one input apparatus, and the at least one output apparatus.
These computer programs (also called programs, software, software applications, or code) include machine instructions for a programmable processor and can be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus (for example, magnetic disks, optical disks, memories, and programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer that has a display apparatus for displaying information to the user (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) and a keyboard and pointing apparatus (for example, a mouse or trackball) through which the user can provide input to the computer. Other kinds of apparatus can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, speech, or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes a back-end component (for example, a data server), or a middleware component (for example, an application server), or a front-end component (for example, a user computer with a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system can include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other.
The embodiments of the present application also provide a multi-user identity fusion method, including:
determining, according to at least two identity features of the user identity data, the association relationship between the at least two identity features;
determining the identity group of the same user according to the association relationship between the at least two identity features.
For the specific implementation principle of this embodiment, reference may be made to the description of the embodiments shown in FIG. 1 to FIG. 9 above, which is not repeated here.
According to the technical solution of the embodiments of the present application, user identity data is acquired, and a graph network is constructed according to at least two identity features of the user identity data. The graph network includes nodes representing identity features and connecting edges representing relationships between identity features. An identity group of a same user is determined according to the association relationships between nodes in the graph network and the association relationships between nodes and connecting edges, and the identity group includes a plurality of identity features. In this technical solution, the identity features of the user identity data are associated with one another in the form of a graph network, so that the identity group corresponding to the multiple identity features of a same user can be determined accurately; moreover, the solution can be applied to any scenario, which avoids the problem of a limited scope of use.
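The graph-based grouping summarized above can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: it assumes each record is a mapping from feature type to value, draws an edge between any two features that appear in the same record, and treats every connected component of the resulting graph as one identity group. The actual rules based on node-to-node and node-to-edge relationships are those of the embodiments in FIG. 1 to FIG. 9 and are not reproduced here. The names `build_graph` and `identity_groups` are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(records):
    """Build an adjacency map: node = (feature_type, value),
    edge = the two features co-occur in at least one record (assumption)."""
    graph = defaultdict(set)
    for record in records:
        nodes = [(kind, value) for kind, value in record.items() if value]
        for node in nodes:
            graph[node]  # register the node even if it ends up isolated
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def identity_groups(graph):
    """Return the connected components of the graph via breadth-first search.

    Treating one connected component as one user's identity group is a
    simplifying assumption made for this sketch.
    """
    seen = set()
    groups = []
    for start in list(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        groups.append(group)
    return groups


# Hypothetical identity records from different data sources.
records = [
    {"phone": "p1", "email": "e1"},
    {"email": "e1", "device_id": "d1"},
    {"phone": "p2", "device_id": "d2"},
]
groups = identity_groups(build_graph(records))
```

In this example the shared email `e1` links the first two records, so their three features fall into one group, while the third record forms a separate group. A plain connected-components rule merges aggressively (one shared feature is enough), which is why a real system would likely add constraints on edges before grouping.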
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order; no limitation is imposed herein as long as the result desired by the technical solution disclosed in the present application can be achieved.
The specific embodiments described above do not limit the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the protection scope of the present application.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910831646.7A CN110543586B (en) | 2019-09-04 | 2019-09-04 | Multi-user identity fusion method, device, equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910831646.7A CN110543586B (en) | 2019-09-04 | 2019-09-04 | Multi-user identity fusion method, device, equipment and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN110543586A (en) | 2019-12-06 |
| CN110543586B (en) | 2022-11-15 |
Family
ID=68712484
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910831646.7A Active CN110543586B (en) | 2019-09-04 | 2019-09-04 | Multi-user identity fusion method, device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN110543586B (en) |
Families Citing this family (17)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN111143627B (en) * | 2019-12-27 | 2023-08-15 | 北京百度网讯科技有限公司 | User identity data determination method, device, equipment and medium |
| CN111259090B (en) * | 2020-02-03 | 2023-10-24 | 北京百度网讯科技有限公司 | Graph generation method and device of relational data, electronic equipment and storage medium |
| SG10202001002UA (en) * | 2020-02-04 | 2020-07-29 | Alipay Labs Singapore Pte Ltd | Methods and systems for identity authentication |
| CN113283921A (en) * | 2020-02-19 | 2021-08-20 | 华为技术有限公司 | Business data processing method and device and cloud server |
| CN111459999B (en) * | 2020-03-27 | 2023-08-18 | 北京百度网讯科技有限公司 | Identity information processing method, device, electronic equipment and storage medium |
| CN111506737B (en) * | 2020-04-08 | 2023-12-19 | 北京百度网讯科技有限公司 | Graph data processing method, retrieval method, device and electronic equipment |
| CN113556368A (en) * | 2020-04-23 | 2021-10-26 | 北京达佳互联信息技术有限公司 | User identification method, device, server and storage medium |
| CN111752943B (en) * | 2020-05-19 | 2024-11-05 | 北京网思科平科技有限公司 | A graph relationship path positioning method and system |
| CN111640477A (en) * | 2020-05-29 | 2020-09-08 | 京东方科技集团股份有限公司 | Identity information unifying method and device and electronic equipment |
| CN112115367B (en) * | 2020-09-28 | 2024-04-02 | 北京百度网讯科技有限公司 | Information recommendation method, device, equipment and medium based on fusion relation network |
| CN112115381B (en) * | 2020-09-28 | 2024-08-02 | 北京百度网讯科技有限公司 | Method, device, electronic device and medium for constructing fusion relationship network |
| CN112883170B (en) * | 2021-01-20 | 2023-08-18 | 中国人民大学 | User feedback guided self-adaptive dialogue recommendation method and system |
| CN113672653B (en) * | 2021-08-09 | 2024-10-29 | 杭州蚂蚁酷爱科技有限公司 | Method and device for identifying private data in database |
| CN114064705B (en) * | 2021-10-19 | 2025-01-07 | 广州数说故事信息科技有限公司 | User information fusion method, terminal, storage medium and system under multi-layer association |
| CN114547279B (en) * | 2022-02-21 | 2023-04-28 | 电子科技大学 | A Judicial Recommendation Method Based on Hybrid Filtering |
| CN114676288B (en) * | 2022-03-17 | 2024-06-28 | 北京悠易网际科技发展有限公司 | ID pull-through method and device |
| CN117349358B (en) * | 2023-12-04 | 2024-02-20 | 中国电子投资控股有限公司 | Data matching and merging method and system based on distributed graph processing framework |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105099729A (en) * | 2014-04-22 | 2015-11-25 | 阿里巴巴集团控股有限公司 | User ID (Identification) recognition method and device |
| CN107682344A (en) * | 2017-10-18 | 2018-02-09 | 南京邮数通信息科技有限公司 | ID graph construction method based on DPI data Internet identifiers |
| CN108664480A (en) * | 2017-03-27 | 2018-10-16 | 北京国双科技有限公司 | Multi-data-source user information integration method and device |
| CN109347787A (en) * | 2018-08-15 | 2019-02-15 | 阿里巴巴集团控股有限公司 | Identity information recognition method and device |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9264329B2 (en) * | 2010-03-05 | 2016-02-16 | Evan V Chrapko | Calculating trust scores based on social graph statistics |
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN105099729A (en) * | 2014-04-22 | 2015-11-25 | 阿里巴巴集团控股有限公司 | User ID (Identification) recognition method and device |
| CN108664480A (en) * | 2017-03-27 | 2018-10-16 | 北京国双科技有限公司 | Multi-data-source user information integration method and device |
| CN107682344A (en) * | 2017-10-18 | 2018-02-09 | 南京邮数通信息科技有限公司 | ID graph construction method based on DPI data Internet identifiers |
| CN109347787A (en) * | 2018-08-15 | 2019-02-15 | 阿里巴巴集团控股有限公司 | Identity information recognition method and device |
Also Published As
| Publication number | Publication date |
|---|---|
| CN110543586A (en) | 2019-12-06 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN110543586B (en) | Multi-user identity fusion method, device, equipment and storage medium | |
| CN111782965B (en) | Intent recommendation method, device, equipment and storage medium | |
| US11281793B2 (en) | User permission data query method and apparatus, electronic device and medium | |
| CN107545046B (en) | Method and device for fusion of multi-source heterogeneous data | |
| Wang et al. | Efficiently estimating motif statistics of large networks | |
| US20230127055A1 (en) | Global column indexing in a graph database | |
| CN111324643A (en) | Knowledge graph generation method, relation mining method, device, equipment and medium | |
| CN104077723B (en) | A kind of social networks commending system and method | |
| KR20200099602A (en) | Layered graph data structure | |
| CN111046237A (en) | User behavior data processing method and device, electronic equipment and readable medium | |
| US11636111B1 (en) | Extraction of relationship graphs from relational databases | |
| CN112528067A (en) | Graph database storage method, graph database reading method, graph database storage device, graph database reading device and graph database reading equipment | |
| CN107391506A (en) | Method and apparatus for inquiring about data | |
| CN112860811A (en) | Method and device for determining data blood relationship, electronic equipment and storage medium | |
| CN111241225B (en) | Method, device, equipment and storage medium for judging change of resident area | |
| Elagib et al. | Big data analysis solutions using MapReduce framework | |
| WO2021027331A1 (en) | Graph data-based full relationship calculation method and apparatus, device, and storage medium | |
| CN114902246A (en) | A system for fast interactive exploration of big data | |
| CN114398520A (en) | Data retrieval method, system, device, electronic device and storage medium | |
| CN116756330A (en) | Knowledge graph construction method and device, electronic equipment and storage medium | |
| CN103036726A (en) | Method and device for network user management | |
| CN105653523A (en) | Energy consumption supervise network of things basis platform system building method | |
| Li et al. | Matching large scale ontologies based on filter and verification | |
| CN113987345A (en) | Data processing method, apparatus, device, storage medium and program product | |
| CN107291875A (en) | A kind of metadata organization management method and system based on metadata graph |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |