[go: up one dir, main page]

CN106447503A - Calculation method for searching public contact and path thereof in large-scale network graph based on Map Reduce - Google Patents

Calculation method for searching public contact and path thereof in large-scale network graph based on Map Reduce Download PDF

Info

Publication number
CN106447503A
CN106447503A CN201610777592.7A CN201610777592A CN106447503A CN 106447503 A CN106447503 A CN 106447503A CN 201610777592 A CN201610777592 A CN 201610777592A CN 106447503 A CN106447503 A CN 106447503A
Authority
CN
China
Prior art keywords
node
contact person
path
public
target group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610777592.7A
Other languages
Chinese (zh)
Inventor
杨国辉
刘晓慧
王树辰
宣明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Chile Mdt Infotech Ltd
Original Assignee
Guangzhou Chile Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Chile Mdt Infotech Ltd filed Critical Guangzhou Chile Mdt Infotech Ltd
Priority to CN201610777592.7A priority Critical patent/CN106447503A/en
Publication of CN106447503A publication Critical patent/CN106447503A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/221Column-oriented storage; Management thereof
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03Data mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention relates to the social network field, concretely to a calculation method for searching a public contact and a path thereof in a large-scale network graph based on Map Reduce. Through the Map Reduce operation mechanism and the combination of a Mapper and a Reducer, the method comprises: establishing a distribution calculation frame for searching all the public contacts and the shortest contact paths thereof of a target group in a plurality of mapper amounts, calculating the shortest path between the target group and the public contacts, and recording and storing the shortest path into the database. The calculation method for searching the public contact and the path thereof in the large-scale network graph based on the Map Reduce can successfully combine a Map model and a Reduce model to realize the calculation and recording of the shortest path between any node of the target group and the public contact; and moreover, the shortest path optimization calculation method with high efficiency is provided to push a new algorithm progress thinking for further development of the social network.

Description

A kind of catenet in figure based on Map Reduce finds public contact person and its road The computational methods in footpath
Technical field
The present invention relates to field of social network is and in particular to a kind of catenet in figure based on Map Reduce finds public affairs The computational methods in contact person and its path altogether.
Background technology
Map Reduce is a kind of distributed high-volume data Computational frame, and it is very easy to programming personnel will not In the case of distributed parallel programming, the program of oneself is operated in distributed system.The data of Map Reduce is based on Key-Value form is processed, and the data reading a line a line in file line by line in the Mapper stage does logical process, The Reducer stage does reduction according to the key value that Mapper exports to multirow data, it is advantageous that set multiple stage common machines Operational capability completes something parallel.How to be based on the so simple two-stage model of Map and Reduce, find in network Multilayer relation connects, and this is a very significant problem.
Inventor finds, does not also have the R. concomitans by Map and Reduce model at present, realizes in catenet in figure Find the effective calculation method of public contact person.Therefore, the present invention provides a kind of catenet in figure based on Map Reduce Find the computational methods in public contact person and its path, realize finding institute in jumping figure K for the target group in catenet in figure There are public contact person and its shortest internuncial pathway, to solve the deficiency of prior art presence.
Content of the invention
The purpose of the present invention is that:For the problems referred to above that presently, there are, provide a kind of big based on Map Reduce Find the computational methods in public contact person and its path in type network, realize finding target group in jump in catenet in figure All public contact person in number K and its shortest internuncial pathway, to solve the deficiency of prior art presence.
To achieve these goals, the technical solution used in the present invention is:
A kind of catenet in figure based on Map Reduce finds the computational methods in public contact person and its path, described Computational methods pass through Map Reduce operating mechanism, in conjunction with principle of skipping a grade (Mapper) and combination principle (Reducer), set up All public contact person in some jumping figures for the target group and its shortest contact road is found in catenet figure or social networks The distributed computing framework in footpath, calculates the shortest path between described public contact person for the described target group, records and store In database.
Described computational methods include step is calculated as below:
(1) original state:The newly-built one any name text with " .txt " as suffix, its content comprises three line numbers Be target group's node colony according to, the wherein first column data, after each column data be beeline between target group's node, Direct correlation between described target group is recorded in described text;
(2) first jump Mapper:Obtain the direct contact person of each target group's node from HBase, and calculate these institutes State direct contact person the distance between to described target group's node;If certain node is to meet one of public contact person that K jumps, Then the direct contact of this node is to meet the public contact person that K+1 jumps per capita;
(3) first jump Reducer:Using " combination principle ", if having multiple different in derivation to certain node Distance record, it should be merged, takes minimum of a value;
Specifically rule is:Comparison between non-negative distance, takes the minimum path of distance;If the minimum of a value of non-negative distance has Multiple, then calculate the different paths preserving in all minimum ranges during path;Apart from unknown and between known comparison, take Apart from known path;If distance is all -1, then path is also -1;
(4) second jump Map Reduce:Using the distance matrix file of first time Map Reduce output as second Map The input of Reduce, then repeats described Mapper (principle of skipping a grade) and described Reduce (combination principle) process, obtains double bounce Interior public contact person and its internuncial pathway;
(5) K jump Map Reduce:Using the described distance matrix file of the K-1 time Map Reduce output as kth The input of Map Reduce, then repeats Mapper (principle of skipping a grade) and Reducer (combination principle) process, obtains in K jump Public contact person and its internuncial pathway;
(6) result output:According to " termination principle ", after action of repeatedly skipping a grade, finally can become destination node group The node of body public contact person in K jumps, necessarily meets a condition, that is, the distance arriving destination node colony all within 0-K, The only distance no more than node of K meets condition, and specific rules are:If finding the public contact person in K jumps, through K Map After Reduce calculates, the node meeting condition in distance matrix is Ni (D1, D2, D3 ..., Dm), wherein 0<Dj(0<j<=m) <=K;All public contact persons meeting in the as K jump of this regular node, in distance vector, the path of each element is public affairs Contact person is to the shortest internuncial pathway of destination node altogether.
The concrete function that the present invention realizes is as follows:
1) catenet figure can reach more than hundred million grades of nodes, and the external side of per node on average can be in hundred ranks More than;
2) if target group can be arbitrarily designated dry contact, rather than simply look for 2 points of A, B;
3) search out all public contact person in jumping figure K for the target group, rather than whether simply detect in K jumps Up to;
4) for any one public contact person, target group's arbitrary node can be recorded between this public contact person Shortest path.
It is assumed that ID uses the MD5 value of 32 bytes to represent, then the direct correlation relation of each user at least needs 100* 32B about 32KB memory space, 100,000,000 users then at least need 100,000,000 * 3.2KB about 320GB memory space.If considering, backup etc. consumes, Total memory space is in T rank.
HBase columnar database is taken, with the storage of Key-Value form, rowkey is ID in this programme, Qualifier and value is the corresponding direct contacts list of this user.HBase management T rank data has no pressure, with Rowkey extraction data is very quick, simultaneously can be with MapReduce distributed computing framework seamless combination.
Preferably, if described target group can be doing of being arbitrarily designated;For any one of public contact person, Described computational methods can calculate the arbitrary node of the described target group shortest path between described public contact person.This Bright computational methods are applied to finds all public contact person in jumping figure K for the target group and its shortest in catenet in figure Internuncial pathway.
Preferably, described catenet figure can reach more than hundred million grades of nodes, and the external side of per node on average Can be beneficial to, more than hundred ranks, the social demand meeting in catenet data platform.
Preferably, can be multi-thread false relation between the node of described target group, be beneficial to satisfaction wrong in relation In the case of comprehensive complexity, K jumps the interior quick realization finding public contact person and the shortest internuncial pathway.
Preferably, the direct correlation between described target group's node, is stored in literary composition in the form of distance matrix In part, and the input calculating as follow-up Map Reduce.
Preferably, described database total storage capacity more than T rank or T rank, preferably to meet a large number of users number According to effective storage.
Preferably, described database adopts HBase columnar database, with the storage of Key-Value form.
Preferably, described Map Reduce operating mechanism is a kind of distributed batch data Computational frame, for extensive The programming model of the concurrent operation of data set.
Preferably, in the artificial network chart of described public contact any two node N1 and N2 within K jumps all up to section Point.
Due to employing technique scheme, the invention has the beneficial effects as follows:
First, successfully by Map model together with Reduce models coupling it is achieved that target group's arbitrary node to should Shortest path between public contact person calculates and records;Secondly it is proposed that a kind of higher shortest path first of efficiency calculates Method;Again, the computational methods of the present invention be social networks be pushed further into and develop provide a kind of new algorithm propulsion Thinking.
Brief description
Fig. 1 is the three public contact person's schematic diagrames (1) of jump interior between 2 points of the present invention;
Fig. 2 is the three public contact person's schematic diagrames (2) of jump interior between 2 points of the present invention;
Fig. 3 is a public contact person's schematic diagram of jump interior between 3 points of the present invention;
Fig. 4 is the public contact person's schematic diagram of double bounce interior between 3 points of the present invention;
Fig. 5 jumps the interior time comparison diagram calculating public contact person for the present invention three;
Fig. 6 jumps the interior time comparison diagram calculating public contact person and path for the present invention three.
In figure:Node in A, B, C, D, E, F, G- relational network.
Specific embodiment
Purpose, technical scheme and advantage for making the embodiment of the present invention are clearer, below in conjunction with the embodiment of the present invention In accompanying drawing, the technical scheme in the embodiment of the present invention is clearly and completely described it is clear that described embodiment is The a part of embodiment of the present invention, rather than whole embodiments, based on the embodiment in the present invention, those of ordinary skill in the art The every other embodiment being obtained under the premise of not making creative work, broadly falls into the scope of protection of the invention.
Embodiment 1, as shown in Figure 1:
It is assumed that there being seven nodes of A, B, C, D, E, F, G in certain relational network, between wherein AC, between CD, between DE, EF it Between, related contact person each other between FG, between GB.For A, C, D, E respectively a jump of A, double bounce, three jump contact persons (in figure is marked with the number, similarly hereinafter), in the same manner for B, G, F, E respectively a jump of B, double bounce, three jump contact persons.
Now with A, B two node as research object (in figure green mark, similarly hereinafter), find public within three jumps between A, B Contact person, then only E node meets condition (the blue mark of in figure, similarly hereinafter), because the distance between A, B to E are all in three jumps Interior (containing three jumps), other put equal condition is not satisfied (such as B is four jumps apart from D).And the shortest internuncial pathway between A, B to E It is respectively A->C->D->E and B->G->F->E.
Embodiment 2, as shown in Figure 2:
If removing now G node, then finding the public contact person within three jumps between A, B, now meeting the public affairs of condition Contact person has D, E node altogether.A to D is double bounce, and shortest path is A->C->D, B to D are three jumps, and shortest path is B->F->E-> D;In the same manner, A to E is three jumps, and shortest path is A->C->D->E, B to E are double bounce, and shortest path is B->F->E.
Embodiment 3, as shown in figs. 34:
Goals research point is 3 points of A, B, C, is no longer simple single-line link between points, but multi-thread staggered, than The institute having included this micronet in figure as contact person in double bounce for the F node a little, this also exactly real social networks close One distinct epitome of system represents.As " Six Degrees " theory thinks, " you are spaced and any one stranger between Not over six, that is to say, that at most passing through six people, you just can recognize any one stranger to people ".
In this example, in a jump between A, B, C, public contact is artificially empty (as shown in the left diagram), and in double bounce, public contact person has 2 points of E, F (as shown at right).Although the beeline of A, B to D is within double bounce, the beeline of C to D is three jumps, institute Condition cannot be met with D node.Here it is worth noting that the shortest internuncial pathway in B to F double bounce has two, it is B- respectively> D->F and B->A->F.
Embodiment 4, as shown in figs. 34:
The storage scheme of the present invention is:It is assumed that ID uses the MD5 value of 32 bytes to represent, then the direct pass of each user Connection relation at least needs 100*32B about 32KB memory space, and 100,000,000 users then at least need 100,000,000 * 3.2KB about 320GB storages empty Between.If considering, backup etc. consumes, and total memory space is in T rank.
HBase columnar database is taken, with the storage of Key-Value form, rowkey is ID in this programme, Qualifier and value is the corresponding direct contacts list of this user.HBase management T rank data has no pressure, with Rowkey extraction data is very quick, simultaneously can be with Map Reduce distributed computing framework seamless combination.May in HBase There are following a few row data:
Table 1 HBase stores direct contact person's example
Embodiment 5, as shown in figs. 34:
The calculating process of the present invention is:
1. original state
A newly-built empty text, name does not limit, and with " .txt " suffix, content is as shown in the table:
Table 2 original state
A 0 1 -1
B 1 0 -1
C -1 -1 0
Comprise three row data, first row is destination node colony, and such as we will find node A, B, C tri- in this scenario Public contact person in person's double bounce, after three leus time be beeline to tri- nodes of node A, B, C, if destination node Colony is 5, then corresponding here is 5 row.Separator between every data line can freely define.Such as, the first row A01-1 The beeline of expression node A to node A, B, C is 0,1, -1 successively.Wherein:
1) 0 represents the beeline between node oneself and oneself, such as A and A, B and B, C and C;
2) 1 represents direct correlation, such as A and B, B and A between node;
3) -1 represent between node distance unknown in other words temporarily never up between such as A and C.
Original state record is direct correlation between destination node colony, is stored in literary composition in the form of distance matrix Among part, as the input of follow-up MapReduce calculating.For 0,1, -1 such one group of data, subsequently it is collectively referred to as certain section To the distance vector of destination node colony, the referred to as distance vector of certain node, this node of each element representation arrives point The distance of destination node, the such as distance vector of node A are (0,1, -1).
If shortest path will be calculated simultaneously, can record path after element in initial text.As Shown in following table:
Table 3 original state (comprising path)
A 0 1(A->B) -1
B 1(B->) 0 -1
C -1 -1 0
Node B is the direct contact person of node A, so the shortest path of node A to node B is A->B, in the same manner node B arrive Shortest path between node A is B->A.
2. the first jump Mapper
There is the input of original state, the calculating of public contact person in the first jump and the shortest internuncial pathway can be carried out ?.Specific practice is as shown in the table:
Mapper state jumped by table 4 first
As shown in table 4, from HBase, in the Mapper stage, obtain the direct contact person of A, B, C node, with Section five " principle of skipping a grade " of description, calculates these direct contact persons successively to the distance of A, B, C node.Need exist for paying special attention to It is that -1 represents apart from unknown, so when skipping a grade, -1 still keeps constant.
Such as, the direct contact person of A has 3 points of E, F, B, and the distance vector due to A is (0,1, -1), then according to skipping a grade Principle Ea, Fa, the distance vector of Ba are all (1,2, -1).In the same manner, the direct contact person of B has A, D, then their distance vector is equal It is (2,1, -1).In the same manner, node C obtains the distance vector (- 1, -1,1) of Ec.
As shown in the table, in the process of Mapper state computation shortest path.
Mapper state (comprising path) jumped by table 5 first
The rule calculating shortest path with " principle of skipping a grade " is fairly simple:Add because jumping before path before skipping a grade The node that level is brought.Such as, node E is the direct contact person of A, and from node A is skipped a grade and can be brought node E (in figure is expressed as Ea), the path of A to B is A->B, then the path of E to B is E->A->B.Note special circumstances:If the path before skipping a grade Distance is -1, and the distance after skipping a grade is also -1, and corresponding path also is indicated as -1.
3. the first jump Reducer
As shown in table 6, simple utilization " principle of skipping a grade ", not only has more these new sections of D, E, F in distance matrix now Point record, also has more the situation of original node simultaneously.Such as, the distance vector of node A existing (0,1, -1), also have (2,1, - 1).For ease of distinguishing, the distance vector identical colour code of the same node of in figure.At this moment need to use Section of five description " combination principle ".
The output key in Mapper stage is nodename, and value is all distance vectors of this node.According to Map Reduce principle, the Reducer stage will get all distance vectors according to nodename polymerization.As can be seen that dexterously Using the operating mechanism of Map Reduce, greatly simplify our handling process.
The treatment effect of combination principle is as shown in the table:
Reducer state jumped by table 6 first
With " combination principle ", the comparative result between non-negative distance takes minimum, apart from unknown (i.e. -1) with apart from The comparative result known if being all -1, takes -1 known to taking.As shown in table 7 below, in Reducer state computation shortest path Process.
Reducer state (comprising path) jumped by table 7 first
The rule calculating shortest path with " combination principle " is as follows:
1) comparison between non-negative distance, takes the minimum path of distance, and such as the distance between A and Ab to A is respectively 0 He 2, result takes 0;
2) if the minimum of a value of non-negative distance have multiple, calculate path when preserve not going the same way in all minimum ranges Footpath;
3) apart from unknown and between known comparison, take apart from known path, such as between Ea and Ec to A away from From respectively 1 and -1, result takes E->A;
4) if distance is all -1, then path is also -1.
4. the second jump Map Reducer
According to the derivation description of first two steps, can very easily make the result of the second jump.By first time Map Reduce The distance matrix file of output (notices that this matrix does not have the relation between strict row and row it is possible to be distributed to multiple literary compositions Part stores) as second Map Reduce input, then repeat Mapper (principle of skipping a grade) and Reducer (combination principle) Process, can obtain public contact person and its internuncial pathway in double bounce.Shown in table 8-9 specific as follows:
Mapper and Reducer state jumped by table 8 second
Mapper and Reducer state (comprising path) jumped by table 9 second
5. K jumps Map Reduce
In the same manner, using the distance matrix file of the K-1 time MapReduce output as kth Map Reduce input, Then repeat Mapper (principle of skipping a grade) and Reducer (combination principle) process, the public contact person in K jump and its contact can be obtained Path.
6. result output
Observation table 8 the right it is found that in the relational network of complexity distance vector in double bounce for all nodes all calculate Go out, condition is met according to the node that " termination principle " only has distance no more than K (K=2 herein).Specific rules are as follows:
If finding the public contact person in K jumps, after K MapReduce calculates, in distance matrix, meet condition Node be Ni (D1, D2, D3 ..., Dm), wherein 0<Dj(0<j<=m)<=K.
All nodes meeting above rule are the public contact person in K jump, the path of each element in distance vector It is public contact person to the shortest internuncial pathway of destination node.
Embodiment 6, as shown in Fig. 5~6:
The validity of the method proposing for checking this patent, we simulate one and have with microblog users as background 40000000 nodes, per node on average generate the catenet on 200 sides at random, then do Contrast on effect with two methods.
First method, is named as " MR stream ", the computational methods based on MapReduce that is, this patent proposes, and starts many Secondary MapReduce process, the number of starts is equal to jumping figure K.Second method, is named as " multimachine multithreading ", by each two node Between public contact person calculate and distribute to an independent thread, combination of two destination node needs multiple threads, by these Thread is distributed to multiple stage machine, to reduce unit pressure, finally merges result and takes common factor.The machine that two methods are run is joined Put, quantity etc. is consistent.
Experiment one:Only calculate public contact person
Destination node colony is set as 2,4,8,16,32 successively, finds them public in three jump Contact person.
Can draw, when destination node colony is within the scope of general 10, the method for multimachine multithreading is better than the side of MR stream Method, increases with node number, and the method advantage of MR stream displays gradually, and increasing degree is substantially than the side of multimachine multithreading Method is low.
Experiment two:Calculate public contact person and path
Can draw, after adding path computing, the scheme of MR stream is better than the scheme of multimachine multithreading always, and MR stream Time loss increase basicly stable with node number, and the time loss of multimachine multithreading is persistently significantly increased.This enters one Step illustrates the validity of method proposed by the present invention.
The above, the only specific embodiment of the present invention, but protection scope of the present invention is not limited thereto, and any Those familiar with the art, in the technical scope of present disclosure, can readily occur in change or replacement, all should contain Cover within protection scope of the present invention.Therefore, protection scope of the present invention should be defined by scope of the claims.

Claims (9)

1. a kind of catenet in figure based on Map Reduce finds the computational methods in public contact person and its path, its feature It is:Described computational methods pass through Map Reduce operating mechanism, in conjunction with principle of skipping a grade (Mapper) and combination principle (Reducer), set up and find all public contact in some jumping figures for the target group in catenet figure or social networks People and its distributed computing framework of the shortest internuncial pathway, calculate described target group the shortest between described public contact person Path, is recorded and stored in database;
Described computational methods include step is calculated as below:
(1) original state:The newly-built one any name text with " .txt " as suffix, its content comprises three row data, Wherein first column data be target group's node colony, after each column data be target group's node between beeline, institute The direct correlation stated between target group is recorded in described text;
(2) first jump Mapper:Obtain the direct contact person of each target group's node from HBase, and it is described straight to calculate these Meet contact person the distance between to described target group's node;If certain node is to meet one of public contact person that K jumps, should The direct contact of node is to meet the public contact person that K+1 jumps per capita;
(3) first jump Reducer:Using " combination principle ", if there are multiple different distances in derivation to certain node Record, it should be merged, takes minimum of a value;
Specifically rule is:Comparison between non-negative distance, takes the minimum path of distance;If the minimum of a value of non-negative distance has many Individual, then calculate the different paths preserving in all minimum ranges during path;Apart from unknown and between known comparison, take away from From known path;If distance is all -1, then path is also -1;
(4) second jump Map Reduce:Using the distance matrix file of first time Map Reduce output as second Map The input of Reduce, then repeats described Mapper (principle of skipping a grade) and described Reduce (combination principle) process, obtains double bounce Interior public contact person and its internuncial pathway;
(5) K jump Map Reduce:Using the described distance matrix file of the K-1 time Map Reduce output as kth Map The input of Reduce, then repeats Mapper (principle of skipping a grade) and Reducer (combination principle) process, obtains public in K jump Contact person and its internuncial pathway;
(6) result output:According to " termination principle ", after action of repeatedly skipping a grade, finally can become destination node colony in K The node of public contact person in jump, necessarily meets a condition, that is, the distance arriving destination node colony all within 0-K, only The distance no more than node of K meets condition, and specific rules are:If finding the public contact person in K jumps, through K Map After Reduce calculates, the node meeting condition in distance matrix is Ni(D1,D2,D3,…,Dm), wherein 0<Dj(0<j<=m)< =K;All public contact persons meeting in the as K jump of this regular node, in distance vector, the path of each element is public affairs Contact person is to the shortest internuncial pathway of destination node altogether.
2. computational methods as claimed in claim 1 it is characterised in that:If described target group are doing of being arbitrarily designated;Right In any one of public contact person, described computational methods are used for calculating the arbitrary node of described target group to described public Shortest path between contact person.
3. computational methods as claimed in claim 1 it is characterised in that:The nodes of described catenet figure reach hundred million grades with On, and the external side of per node on average can be connected with nonoriented edge between node and node more than hundred ranks.
4. computational methods as claimed in claim 1 it is characterised in that:Between the node of described target group for single line connect or Multi-thread it is cross-linked relation.
5. the computational methods as described in claim 1 or 4 it is characterised in that:Directly mutual between described target group's node Relation, is stored hereof in the form of distance matrix, and the input calculating as follow-up Map Reduce.
6. computational methods as claimed in claim 5 it is characterised in that:Described database total storage capacity T rank or T rank with On.
7. the computational methods as described in claim 1 or 6 it is characterised in that:Described database adopts HBase columnar database, With the storage of Key-Value form, data is extracted as ID using row key, described ID is distributed with Map Reduce Computational frame seamless combination.
8. computational methods as claimed in claim 1 it is characterised in that:Described Map Reduce operating mechanism is a kind of distributed Batch data Computational frame, for the programming model of the concurrent operation of large-scale dataset.
9. computational methods as claimed in claim 1 it is characterised in that:Any two section in the artificial network chart of described public contact Point N1 and N2 K jump within all up to node.
CN201610777592.7A 2016-08-30 2016-08-30 Calculation method for searching public contact and path thereof in large-scale network graph based on Map Reduce Pending CN106447503A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610777592.7A CN106447503A (en) 2016-08-30 2016-08-30 Calculation method for searching public contact and path thereof in large-scale network graph based on Map Reduce

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610777592.7A CN106447503A (en) 2016-08-30 2016-08-30 Calculation method for searching public contact and path thereof in large-scale network graph based on Map Reduce

Publications (1)

Publication Number Publication Date
CN106447503A true CN106447503A (en) 2017-02-22

Family

ID=58090314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610777592.7A Pending CN106447503A (en) 2016-08-30 2016-08-30 Calculation method for searching public contact and path thereof in large-scale network graph based on Map Reduce

Country Status (1)

Country Link
CN (1) CN106447503A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110264649A1 (en) * 2008-04-28 2011-10-27 Ruey-Lung Hsiao Adaptive Knowledge Platform
US20150286702A1 (en) * 2014-04-08 2015-10-08 International Business Machines Corporation Adaptive variable selection for data clustering
CN105393265A (en) * 2013-07-12 2016-03-09 微软技术许可有限责任公司 Active featuring in computer-human interactive learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110264649A1 (en) * 2008-04-28 2011-10-27 Ruey-Lung Hsiao Adaptive Knowledge Platform
CN105393265A (en) * 2013-07-12 2016-03-09 微软技术许可有限责任公司 Active featuring in computer-human interactive learning
US20150286702A1 (en) * 2014-04-08 2015-10-08 International Business Machines Corporation Adaptive variable selection for data clustering

Similar Documents

Publication Publication Date Title
CN113254669B (en) Knowledge graph-based power distribution network CIM model information completion method and system
CN104158840B (en) A kind of method of Distributed Calculation node of graph similarity
CN111754345A (en) A Bitcoin Address Classification Method Based on Improved Random Forest
Díaz-Morales Cross-device tracking: Matching devices and cookies
Du et al. Structural balance in fully signed networks
CN110347881A (en) A kind of group&#39;s discovery method for recalling figure insertion based on path
US20180314951A1 (en) Reasoning system, reasoning method, and recording medium
CN102508874A (en) A method for generating navigation learning paths on knowledge maps
Csar et al. Winner determination in huge elections with mapreduce
CN117408298B (en) Information propagation prediction method based on prototype perception dual-channel graph neural network
RU2017102903A (en) SYSTEM AND METHOD FOR IDENTIFICATION OF RELEVANT INFORMATION FOR THE ENTERPRISE
CN113111193A (en) Data processing method and device of knowledge graph
CN111159420A (en) An Entity Optimization Method Based on Attribute Calculation and Knowledge Template
CN103744933A (en) Community discovery method based on parallelization modularity optimization
Guo et al. Feature selection based on Rough set and modified genetic algorithm for intrusion detection
Xuan et al. Node matching between complex networks
CN109492677A (en) Time-varying network link prediction method based on bayesian theory
Potter et al. Querying distributed RDF graphs: the effects of partitioning
CN106447503A (en) Calculation method for searching public contact and path thereof in large-scale network graph based on Map Reduce
Fang et al. Signed network label propagation algorithm with structural balance degree for community detection
Senapati et al. A method for scalable first-order rule learning on Twitter data
CN112579831B (en) Network community discovery method, device and storage medium based on SimRank global matrix smooth convergence
CN111967671B (en) Cross-border active user identification method and device based on support vector data domain description
CN109299849A (en) A Calculation Method of Group Needs Hierarchy in Social Networks
CN116975703A (en) Data processing method, apparatus, electronic device and computer program product

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20200721

AD01 Patent right deemed abandoned