Background
In a multivariate network, nodes and edges carry rich attribute information, such as name, age, gender, nationality, region, and occupation. In the field of visualization research, experts have designed various visualization methods for such multivariate network data, mapping attribute information by modifying the visual appearance (size, color, shape) of nodes in a force-directed layout, or combining the force-directed layout with auxiliary views (e.g., tables, parallel coordinates, radar charts) to cooperatively analyze topology and multidimensional attribute information. However, as the scale of multivariate networks continues to increase, simultaneous analysis of topology and attributes becomes a challenge: in force-directed layouts, visual clutter greatly hinders the visual perception of topology. Clustering is a more flexible strategy that allows users to classify nodes into different categories based on node attributes or network topology. Batagelj et al. (V. Batagelj, F. Brandenburg, W. Didimo, G. Liotta, P. Palladino, and M. Patrignani. Visual analysis of large graphs using (X,Y)-clustering and hybrid visualizations. IEEE Transactions on Visualization and Computer Graphics, 17:1587-1598, 2010. doi:10.1109/TVCG.2010.265) define topological properties of intra-class and inter-class graphs, allowing users to interactively explore graphs by expanding and contracting clusters. OnionGraph (L. Shi, Q. Liao, H. Tong, Y. Hu, Y. Zhao, and C. Lin. Hierarchical focus+context semantic network visualization. pp. 89-96, 2014. doi:10.1109/PacificVis.2014.44) takes both attributes and topology into account, clusters heterogeneous network nodes into different levels, and its top-down hierarchy provides more semantic information.
Clustering of multivariate networks falls into two categories: topology-based clustering (J. Abello, F. van Ham, and N. Krishnan. ASK-GraphView: A large scale graph visualization system. IEEE Transactions on Visualization and Computer Graphics, 12:669-676, 2006. doi:10.1109/TVCG.2006.120) and attribute-based clustering (M. Wattenberg. Visual exploration of multivariate graphs. pp. 811-819, 2006. doi:10.1145/1124772.1124891). However, in topology-based clustering methods, the attributes are randomly distributed within clusters, which makes it difficult to explain how the clusters form. In attribute-based clustering methods, the topology of the network is severely damaged, which makes the relationships between nodes difficult to explore and analyze. Therefore, integrating topology and attributes into the cluster-based visualization of large-scale multivariate networks remains a challenging task.
In addition, network representation learning methods can handle large networks with millions of nodes and edges by projecting the network nodes into a vector space that preserves their context features. Among them, learning algorithms based on random walks are simple to implement and easy to scale and parallelize, and various methods have been proposed, such as DeepWalk (B. Perozzi, R. Al-Rfou, and S. Skiena. DeepWalk: Online learning of social representations. pp. 701-710, 2014) and node2vec (A. Grover and J. Leskovec. node2vec: Scalable feature learning for networks. pp. 855-864, 2016. doi:10.1145/2939672.2939754), both of which maintain the high-order proximity between nodes well.
Disclosure of Invention
The invention aims to provide a novel clustering-based simplified visual analysis method for large-scale multivariate network data, addressing the problems described in the background art; the use of attribute information in network representation learning is studied with particular emphasis, and the large-scale multivariate network is explored through network representation learning.
In order to achieve the purpose, the invention adopts the following technical scheme:
Step (1): constructing an attribute-enhanced network representation learning model based on the original large-scale network data, and converting the nodes into high-dimensional vector representations embedding the topological structure and attribute information;
Step (2): constructing a multi-level clustering model using the attribute-enhanced network representation learning model, and dividing the nodes into hierarchical categories in the vectorized space according to structure compactness, attribute homogeneity, and the number of clusters;
Step (3): designing a simplified-representation visual analysis scheme and constructing a simplified visual analysis system for large-scale multivariate network data; the simplified visual analysis system forms its visual representation through a cluster view and a set of collaborative views.
Further, the step (1) is specifically:
(1-1) constructing a corpus based on attribute similarity:
quantify the attribute similarity between nodes: if two adjacent nodes have the same category, the similarity sim(v_i, v_{i,k'}) is set to 1, otherwise it is set to 0, where v_{i,k'} is the k'-th neighbor of the walk's initial node v_i, k' ∈ (1, K), and K is the number of neighboring nodes of v_i; the probability of jumping from v_i to a neighbor is proportional to this attribute similarity;
taking the node v_{i-w} located w positions before v_i as the starting point, an attribute-similarity-based walk rw(v_{i-w}) = (v_{i-w}, …, v_{i-1}, v_i, v_{i+1}, …, v_{i+w}) is generated by sampling among the neighbors of the last node on the current walk path until the maximum walk length L is reached, where L = 2w + 1;
walk paths starting from the other nodes are generated in the same way, so that every node serves as the initial node of an attribute-similarity-based walk.
(1-2) repeatedly traversing the nodes a fixed number of times T and generating a walk path of fixed length L starting from each node, thereby creating, for each attribute, a corpus consisting of all of its walk paths;
the corpora generated for the individual attributes are merged into a compound corpus, which is fed to the Skip-Gram model with the objective minimize_φ ( -log Pr({v_{i-w}, …, v_{i+w}} \ v_i | φ(v_i)) ); where φ(v_i) denotes the vector of v_i and Pr({v_{i-w}, …, v_{i+w}} \ v_i | φ(v_i)) denotes the probability that node v_i's context v_{i-w} to v_{i+w} appears; the Skip-Gram model maximizes the probability of inferring the context v_{i-w} to v_{i+w} from v_i, and the high-dimensional vector representation of each node is obtained.
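As a concrete illustration of step (1-1), the sketch below generates attribute-similarity-biased walks on a toy graph with a single categorical attribute. It is a minimal, hypothetical sketch rather than the patented implementation: with the 0/1 similarity defined above, it samples uniformly among the neighbors that share the current node's category and falls back to a uniform choice when no such neighbor exists; the function and variable names (attribute_walk, build_corpus) are illustrative.

```python
# Minimal sketch of attribute-similarity-biased walks (step 1-1), assuming a plain
# adjacency-list graph and one categorical attribute per node.
import random

def sim(attrs, u, v):
    """Attribute similarity: 1 if the two adjacent nodes share the same category, else 0."""
    return 1 if attrs[u] == attrs[v] else 0

def attribute_walk(adj, attrs, start, w):
    """Generate one walk of maximum length L = 2*w + 1 starting from `start`.

    At every step the next node is drawn from the neighbors of the last node,
    preferring neighbors with similarity 1; if none exists, sampling falls back
    to a uniform choice among all neighbors.
    """
    L = 2 * w + 1
    path = [start]
    while len(path) < L:
        current = path[-1]
        neighbors = adj[current]
        if not neighbors:
            break
        similar = [n for n in neighbors if sim(attrs, current, n) == 1]
        path.append(random.choice(similar if similar else neighbors))
    return path

def build_corpus(adj, attrs, T, w):
    """Corpus for one attribute: T walks starting from every node."""
    corpus = []
    for _ in range(T):
        for node in adj:
            corpus.append(attribute_walk(adj, attrs, node, w))
    return corpus

# Toy example: 4 nodes, one categorical attribute ("region").
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
attrs = {0: "EU", 1: "EU", 2: "US", 3: "EU"}
print(build_corpus(adj, attrs, T=2, w=2)[:3])
```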
Further, the step (2) is specifically:
(2-1) projecting the obtained high-dimensional node vectors into a two-dimensional space, applying k-means to the projected vectors to cluster the nodes, and generating a hierarchical clustering structure with a hierarchical clustering method, i.e. iteratively merging pairs of clusters until a single cluster remains;
(2-2) guiding the cutting of the hierarchy using three metrics: structure compactness, attribute homogeneity, and the number of clusters;
set the multi-objective function E(C) = w_1 E_sc(C) + w_2 E_ah(C) + w_3 E_str(C); where the set of clusters C = {C_1, C_2, …, C_k} generated from the multivariate network is the part of the hierarchy formed by a set of leaf nodes, C_{k''} is the k''-th leaf node, k'' = 1, 2, …, k, and w_1, w_2, w_3 are tuning parameters that balance the three metrics;
the structure-compactness term is E_sc(C) = 1 - density(C), where the cluster density density(C) represents the structural compactness of the clusters derived from C; the larger the cluster density, the smaller the structure-compactness term;
the attribute-homogeneity term E_ah(C) averages the information entropy of each attribute a_x within each cluster C_m, where N is the number of attributes; the lower the entropy, the more uniform the node attributes of a cluster, and the smaller the average entropy, the higher the probability that the cluster is kept as a leaf node;
the cluster-number term E_str(C) penalizes splitting the multivariate nodes into many small clusters; it is computed from dis(C_x, root), the distance in the hierarchy between cluster C_x and the root node, and from the average distance between C_x and its descendant leaf nodes, where p denotes one of the descendant leaf nodes of C_x.
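The sketch below shows one plausible way to evaluate the multi-objective score E(C) for a candidate set of clusters. Because the exact density and cluster-number formulas are not fully reproduced in this text, the internal edge density of each cluster and a simple cluster-count penalty are used here as stand-ins; only the overall weighted-sum structure E(C) = w_1 E_sc + w_2 E_ah + w_3 E_str follows the definition above.

```python
# Hypothetical sketch of the multi-objective cut score; the term definitions marked as
# "stand-in" are assumptions, not the invention's exact formulas.
import math
from collections import Counter

def internal_density(adj, cluster):
    """Fraction of possible intra-cluster edges that are present (stand-in for the cluster density)."""
    nodes = set(cluster)
    h = len(nodes)
    if h < 2:
        return 0.0
    y = sum(1 for u in nodes for v in adj[u] if v in nodes) / 2
    return 2 * y / (h * (h - 1))

def entropy(values):
    """Shannon entropy of a list of categorical attribute values."""
    counts = Counter(values)
    total = len(values)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def multi_objective(adj, attrs, clusters, w1=1.0, w2=5.0, w3=0.1):
    """E(C) = w1*Esc(C) + w2*Eah(C) + w3*Estr(C) for a candidate cut C given as a list of node lists."""
    k = len(clusters)
    attr_names = list(next(iter(attrs.values())).keys())
    e_sc = 1 - sum(internal_density(adj, c) for c in clusters) / k          # structure compactness
    e_ah = sum(entropy([attrs[v][a] for v in c])                            # attribute homogeneity
               for c in clusters for a in attr_names) / (k * len(attr_names))
    e_str = k / len(adj)                                                    # stand-in cluster-count penalty
    return w1 * e_sc + w2 * e_ah + w3 * e_str

# Toy usage: 4 nodes with one attribute, split into two clusters.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
attrs = {0: {"country": "US"}, 1: {"country": "US"}, 2: {"country": "DE"}, 3: {"country": "DE"}}
print(multi_objective(adj, attrs, [[0, 1], [2, 3]]))
```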
Further, in step (3), the cluster view provides a simplified representation of the multivariate graph using a glyph-embedded network graph: the leaf nodes are laid out with a force-directed model, the cluster corresponding to each leaf node is visualized with a glyph, and nodes with similar attributes are aggregated into the same cluster;
the collaborative views help the user intuitively explore the clusters of the multivariate graph; the high-dimensional vectors are projected into a two-dimensional space with the classical dimensionality-reduction model t-SNE; for each node, the neighboring nodes within a surrounding radius are found, the information entropy of these nodes over the selected attributes is computed, and the reciprocal of the entropy is used as the node's weight in the kernel density computation, which determines the final rendering of the heat map; the projection view is interactively linked with the tree graph so that clusters at different levels can be explored; and a river graph at the top visualizes how the multi-objective function values change with user interaction.
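As an illustration of the projection used by the collaborative views, the sketch below reduces the learned node vectors to two dimensions with an off-the-shelf t-SNE implementation from scikit-learn; the randomly generated node_vectors array is a placeholder for the embeddings produced in step (1), and the parameter values are illustrative defaults rather than the system's own settings.

```python
# Minimal sketch of the 2-D projection behind the projection view, assuming the node
# embeddings are available as a NumPy array (one row per node).
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
node_vectors = rng.normal(size=(500, 128))   # placeholder for the learned node embeddings

tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
xy = tsne.fit_transform(node_vectors)        # (500, 2) coordinates for the projection view
print(xy.shape)
```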
Compared with the prior art, the invention has the following beneficial effects: it provides an attribute-enhanced network representation learning model that extracts topology and attribute information simultaneously and represents the nodes as vectors preserving both topological connections and multi-attribute similarity; it defines a multi-objective energy function considering structure compactness, attribute homogeneity, and the number of clusters to obtain the optimal cut layer of the hierarchy, thereby generating multi-level clusters that let users explore a multivariate network hierarchically; and it develops a simplified visual analysis system for multivariate networks that visually expresses the attribute-enhanced network representation learning, the hierarchical clustering, and the cluster-based simplification, so that users can conveniently evaluate the node clusters and explore the semantic features of the original multivariate network.
Detailed Description
The simplified visual analysis method for large-scale multivariate network data of the present invention is further described below with reference to the accompanying drawings and examples.
The flow of this embodiment is shown in fig. 1, and includes the following steps:
Step (1): constructing an attribute-enhanced network representation learning model based on the original large-scale network data, and converting the nodes into high-dimensional vector representations embedding the topological structure and attribute information; the method comprises the following steps:
(1-1) constructing a corpus based on attribute similarity:
by increasing the probability that nodes with similar attribute values co-occur in the walk, the semantic relationship between the nodes is preserved. Suppose node v
iAs an initial point of walk, it is associated with K nodes v
i,1,v
i,2,…,v
i,KAdjacent to each other. Jump to node v in next round
i,k′Is dependent on v
i,KAnd v
iSimilarity of attributes therebetween.Quantifying the attribute similarity between nodes, and if the adjacent nodes have the same category, determining the similarity sim (v)
i,v
i,k′) Set to 1, otherwise set to 0, K' ∈ (1, K). When considering an attribute, the probability of a jump
By node viNode v of the front wi-wAs a starting point, vi-wMigration rw (v) based on attribute similarityi-w)=(vi-w,…,vi-1,vi,vi+1,…,vi+w) Uniformly sampling in the neighbor of the last node in the current walking path until the maximum walking step length L is reached, wherein L is 2w + 1; the wandering paths with other nodes as initial points are also generated in the same manner, wherein all nodes are used as initial points to perform the wandering based on the attribute similarity.
The original approach simply builds the corpus from purely random walks. The method of the invention instead biases the jump probability by attribute similarity, so that the walk preferentially jumps to nodes with similar attributes; the resulting corpus therefore gives a stronger characterization of nodes that are structurally associated and similar in attributes.
(1-2) Nodes often have multiple attributes, and the method designs a novel corpus generation strategy, as shown in fig. 2. For each attribute, walk paths are generated according to the attribute similarity between adjacent nodes. The nodes are repeatedly traversed a fixed number of times T, and the walk path starting from each node is generated with a fixed walk length L, thereby creating a corpus consisting of all walk paths for that attribute; the corpora for the other attributes are created in the same manner.
The corpora generated for the individual attributes are merged into a compound corpus, which is fed to the Skip-Gram model with the objective minimize_φ ( -log Pr({v_{i-w}, …, v_{i+w}} \ v_i | φ(v_i)) ); where φ(v_i) denotes the vector of v_i and Pr({v_{i-w}, …, v_{i+w}} \ v_i | φ(v_i)) denotes the probability that node v_i's context v_{i-w} to v_{i+w} appears. The Skip-Gram model maximizes the probability of inferring the context v_{i-w} to v_{i+w} from v_i, and the high-dimensional vector representation of each node is obtained.
Taking two adjacent nodes as an example, the probability of jumping from one node to the other is determined by the similarity of their multiple attributes. The more values they share across these attributes, the higher the probability that they appear in the same walk. When the co-occurrence probability between two nodes is very large, a pair of clearly similar node vectors is obtained; this constitutes the attribute-enhanced network representation learning model.
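For illustration, a minimal sketch of the Skip-Gram training step is shown below, assuming the walks from all attributes have already been merged into a compound corpus of node-identifier sequences; gensim's Word2Vec with sg=1 is used here as a stand-in Skip-Gram implementation, not as the invention's own code, and the toy corpus and parameter values are illustrative.

```python
# Minimal sketch: learn node vectors phi(v_i) from the compound walk corpus with Skip-Gram.
from gensim.models import Word2Vec

compound_corpus = [["v1", "v2", "v3", "v2", "v4"],
                   ["v3", "v1", "v5", "v1", "v2"]]   # toy walks; the real corpus is far larger

model = Word2Vec(
    sentences=compound_corpus,
    vector_size=128,   # dimensionality of phi(v_i)
    window=2,          # context window w
    sg=1,              # Skip-Gram: predict the context from the centre node
    min_count=0,
    epochs=5,
)
phi_v1 = model.wv["v1"]   # high-dimensional vector representation of node v1
print(phi_v1.shape)
```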
Step (2): constructing a multi-level clustering model using the attribute-enhanced network representation learning model, and dividing the nodes into hierarchical categories in the vectorized space according to structure compactness, attribute homogeneity, and the number of clusters; the method comprises the following steps:
(2-1) Project the obtained high-dimensional node vectors into a two-dimensional space and apply k-means to the projected vectors to cluster the nodes (in the present embodiment, k is initially set to 60); then generate a hierarchical clustering structure with a hierarchical clustering method, i.e. iteratively merge pairs of clusters until a single cluster remains.
(2-2) In order to obtain a meaningful multivariate network visualization, the hierarchical structure must be cut appropriately. In this method, three metrics are used to guide the cutting of the hierarchy: structure compactness, attribute homogeneity, and the number of clusters.
Set the multi-objective function E(C) = w_1 E_sc(C) + w_2 E_ah(C) + w_3 E_str(C), where the set of clusters C = {C_1, C_2, …, C_k} generated from the multivariate network is the part of the hierarchy formed by a set of leaf nodes, C_{k''} is the k''-th leaf node, k'' = 1, 2, …, k, and w_1, w_2, w_3 are tuning parameters that balance the three metrics.
The structure-compactness term is E_sc(C) = 1 - density(C), where the cluster density density(C) represents the structural compactness of the clusters derived from C; the larger the cluster density, the smaller the structure-compactness term.
The attribute-homogeneity term E_ah(C) averages the information entropy of each attribute a_x within each cluster C_m, where N is the number of attributes; the lower the entropy, the more uniform the node attributes of a cluster. Therefore, the smaller the average entropy, the higher the probability that the cluster becomes a leaf node.
The cluster-number term E_str(C) penalizes the behavior of splitting the multivariate nodes into many small clusters; it is computed from dis(C_x, root), the distance in the hierarchy between cluster C_x and the root node, and from the average distance between C_x and its descendant leaf nodes, where p denotes one of the descendant leaf nodes of C_x.
In this embodiment, the tuning parameters are initially set to w_1 = 1.0, w_2 = 5.0, w_3 = 0.1. To minimize the multi-objective function, a minimal-cut algorithm is applied to generate the required cut level of the hierarchy, so that the resulting set of leaf nodes balances the three metrics well. Fig. 3 is a schematic diagram of the multi-level clustering model, showing an optimal cut layer of the hierarchical clustering structure; the clusters are reconstructed by the optimal cut that minimizes the multi-objective function, so that nodes in the same cluster are closely connected and have similar attributes. Through the above steps, the optimal cut layer of the multi-level clustering is obtained.
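The sketch below illustrates one way steps (2-1) and (2-2) could be prototyped: k-means with k = 60 on the projected vectors, an agglomerative (Ward) hierarchy over the cluster centroids, and an exhaustive scan of candidate cut levels. The score_cut function is only a placeholder for the multi-objective function E(C), and the exhaustive scan stands in for the minimal-cut algorithm used by the invention.

```python
# Hypothetical prototype of the multi-level clustering step; the score function and
# the exhaustive cut scan are assumptions standing in for E(C) and the minimal cut.
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
projected = rng.normal(size=(1000, 2))            # placeholder 2-D node projections

kmeans = KMeans(n_clusters=60, n_init=10, random_state=0).fit(projected)
centroids = kmeans.cluster_centers_

Z = linkage(centroids, method="ward")             # iterative pairwise merging of clusters

def score_cut(labels, k):
    """Placeholder for E(C): within-group scatter of the centroids plus a cluster-count penalty."""
    scatter = sum(centroids[labels == g].var() for g in set(labels))
    return scatter + 0.1 * k

best_k, best_score = None, float("inf")
for k in range(2, 60):
    labels = fcluster(Z, t=k, criterion="maxclust")   # cut the hierarchy into k clusters
    s = score_cut(labels, k)
    if s < best_score:
        best_k, best_score = k, s
print("selected cut level:", best_k)
```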
Step (3): designing a simplified-representation visual analysis scheme and constructing a simplified visual analysis system for large-scale multivariate network data; the simplified visual analysis system forms its visual representation through a cluster view and a set of collaborative views; the method comprises the following steps:
(3-1) Cluster view: a glyph-embedded network graph is employed to provide a simplified representation of the multivariate graph. First, the leaf nodes are laid out using a force-directed model. The cluster corresponding to each leaf node is visualized by a glyph consisting of an inner circle and an outer ring. Specifically, the topological pattern is represented by the inner circle, whose radius encodes the number of nodes inside the cluster. The shading in the lower-left corner represents the clustering coefficient, measured as 2y/(h(h-1)), where h and y are the number of nodes and edges in the cluster. The outer ring conveys the distribution characteristics of the various attributes. As described above, nodes with similar attributes are aggregated into a cluster, in which only a subset of the attributes plays a significant role. For these homogeneous nodes in a cluster, the clustering effect of each attribute is important for helping the user identify the important attributes, and it can be measured by information entropy. Thus, the outer ring is divided into several sectors with equal central angles according to the user-specified attribute set. Each sector is colored with a unique color, and its height encodes the inverse of the information entropy on this attribute. FIG. 4-1 is the design of the clustering glyph in multi-attribute mode: the category colors encode the different attributes, while the radius of the outer sector encodes the inverse of the entropy value; the larger the height, the lower the information entropy and the stronger the aggregation effect. If the user specifies only one attribute, the sectors convey the detailed distribution of the attribute values so that the aggregation effect of this attribute can be compared across clusters. In this case, the height of each sector encodes the number of nodes having the corresponding attribute value, as shown in FIG. 4-2.
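A minimal sketch of the per-cluster quantities this glyph encodes is given below: the node count behind the inner-circle radius, the internal density 2y/(h(h-1)) behind the shading, and the per-attribute information entropy whose inverse drives the height of each outer sector. The helper name glyph_metrics and the square-root radius scaling are illustrative choices, not part of the invention.

```python
# Hypothetical computation of the glyph quantities for one cluster.
import math
from collections import Counter

def glyph_metrics(adj, attrs, cluster, attribute_names):
    nodes = set(cluster)
    h = len(nodes)
    y = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2   # intra-cluster edges
    density = 2 * y / (h * (h - 1)) if h > 1 else 0.0

    def entropy(values):
        counts = Counter(values)
        return -sum(c / h * math.log2(c / h) for c in counts.values())

    sector_heights = {}
    for a in attribute_names:
        e = entropy([attrs[v][a] for v in nodes])
        sector_heights[a] = 1.0 / e if e > 0 else float("inf")     # lower entropy -> taller sector
    return {"radius": math.sqrt(h), "density": density, "sectors": sector_heights}

# Toy usage with one attribute.
adj = {0: [1], 1: [0, 2], 2: [1]}
attrs = {0: {"country": "US"}, 1: {"country": "US"}, 2: {"country": "DE"}}
print(glyph_metrics(adj, attrs, [0, 1, 2], ["country"]))
```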
(3-2) Collaborative views: a set of collaborative views is provided to help users intuitively explore the clusters of the multivariate graph. The projection view displays the vectors obtained from the attribute-enhanced network representation learning model: the high-dimensional vectors are projected into a two-dimensional space using the classical dimensionality-reduction model t-SNE. Thanks to the properties of t-SNE, nodes with similar structure and attribute information are placed very close together in the vectorized space, so users can easily perceive the similarity between nodes. A heat map is also provided to enhance the representation of attribute homogeneity: for each node, the neighboring nodes within a surrounding radius are found, the information entropy of these nodes over multiple attributes is computed, and the reciprocal of the entropy is used as the point's weight in the kernel density computation, which determines the final rendering of the heat map. In addition, the projection view is interactively linked with the tree graph so that clusters at different levels can be explored. The tree graph visualizes the hierarchical clustering structure: the nodes of the currently selected cut level are highlighted, and the intermediate nodes represent the hierarchical clusters arranged from bottom to top. The leaf nodes are linked to rectangles at the bottom that show the characteristics of the attribute distribution; regions of different colors within a rectangle encode the information entropy of the multiple attributes, and if only one attribute is explored, the regions within the rectangle show the counts of its attribute values. Furthermore, a river graph at the top visualizes how the multi-objective function values change with user interaction; its three layers represent the structure compactness, the attribute homogeneity, and the energy variation of the tree structure. The user may adjust the parameters through a set of buttons listed at the top. When a cluster is selected in the tree view, the bounding box of the nodes contained in that cluster is highlighted in the projection view, and the detailed information of the multiple attributes of the cluster of interest is displayed in the attribute view. After a cluster is selected, the corresponding lines are highlighted in the parallel coordinates, so that the user can intuitively perceive the important attributes, easily evaluate the clustering effect, and gain a deeper understanding of the hidden semantics. To further indicate the proportion of the different attribute values of an attribute, bars are placed on the parallel axes, and their sizes encode the number of lines crossing them.
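A minimal sketch of the entropy-weighted density behind the heat map is shown below, under the assumption that a hand-rolled Gaussian kernel is acceptable for illustration: the weight of each projected point is the reciprocal of the information entropy of the attribute values found within its surrounding radius; the weight cap applied when the entropy is zero and the bandwidth value are arbitrary illustrative choices.

```python
# Hypothetical entropy-weighted kernel density for the heat map behind the projection view.
import math
import numpy as np
from collections import Counter

def entropy(values):
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def point_weights(xy, attr_values, radius):
    """Weight of each point = 1 / entropy of the attribute values inside its radius."""
    weights = np.empty(len(xy))
    for i, p in enumerate(xy):
        near = [attr_values[j] for j, q in enumerate(xy) if np.linalg.norm(p - q) <= radius]
        e = entropy(near)
        weights[i] = 1.0 / e if e > 0 else 2.0     # arbitrary cap for perfectly uniform neighborhoods
    return weights

def heat_value(grid_point, xy, weights, bandwidth=0.5):
    """Weighted Gaussian kernel density at one grid location of the heat map."""
    d2 = np.sum((xy - grid_point) ** 2, axis=1)
    return float(np.sum(weights * np.exp(-d2 / (2 * bandwidth ** 2))))

rng = np.random.default_rng(0)
xy = rng.normal(size=(200, 2))                      # placeholder t-SNE coordinates
attr_values = rng.choice(["A", "B", "C"], size=200) # one categorical attribute per node
w = point_weights(xy, attr_values, radius=0.5)
print(heat_value(np.array([0.0, 0.0]), xy, w))
```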
The effectiveness of the invention was evaluated as follows:
The attribute-enhanced network representation learning model was studied experimentally: combined with the multi-level clustering model (w_1 = 1.0, w_2 = 5.0, w_3 = 0.1), the clusters in the resulting optimal cut were compared with the clusters generated by three other methods:
(1) an attribute-based method (Attr-based), which treats the multivariate nodes as multidimensional vectors and ignores structural information; (2) DeepWalk, which embeds nodes into a low-dimensional space considering only structural information; (3) attribute-aware graph embedding (AAGE), which combines node attributes with the graph structure by quantifying the sum of the similarities over multiple attributes as edge weights and learning latent representations of the nodes of the weighted network. The default parameter values of our method are set as follows: the number of walk repetitions per node is T = 15 and the walk length is L = 20.
For the set of clusters C = {C_1, C_2, …, C_k} generated from the multivariate network by each of the above methods, the clustering quality is evaluated with the cluster density (formula 6) and the cluster entropy (formula 7). We first studied the different methods on a patent-network dataset and clustered the nodes using k-means. The patent network is a large network in which the nodes represent patents and the edges represent citations between patents; a connected subgraph containing 10,178 patents and 18,809 edges was extracted. Each patent has multiple attributes such as year of grant, country, patent category, technology category, and subcategory. The experimental results are as follows: as shown in fig. 5-1, the method outperforms AAGE, DeepWalk, and the attribute-based method in terms of cluster density for all values of k. Meanwhile, the entropy values of the graph clusters generated by the method are smaller than those of AAGE, and DeepWalk is readily observed to have the largest entropy, as shown in FIG. 5-2. This means that the graph clusters generated by the method and by AAGE have more uniform node attributes than those of DeepWalk, since both consider attributes as well as topology. In contrast, the attribute-based method has the smallest entropy and the smallest density values. The results show that for clusters that consider only attributes, attribute homogeneity is well maintained, but the structural compactness is poor.
The different network clustering methods described above were also studied using another dataset (the Vis network), and the experimental results are shown in FIGS. 6-1 and 6-2. The Vis network data was extracted from the ArnetMiner database and contains 2,592 papers published between 1990 and 2014 at three major conferences (SciVis, InfoVis, and VAST). There are three types of papers, namely C (journal recommendation corpus), M (monograph), and T (meeting record). The publication years are divided into five categories, each covering five years. There are 8,984 links in total, representing the citation relationships between these papers. It can be seen that the performance of DeepWalk is best when the number of clusters k is 3 (fig. 6-1). The performance of the method of the invention and of DeepWalk is similar for k = 5 and k = 7, and both have a slight advantage over AAGE. Whatever the value of k, the attribute-based method has the smallest entropy values and the smallest density values. The method of the invention and AAGE perform similarly in terms of attribute homogeneity (FIG. 6-2), and their entropy values are also significantly smaller than those of DeepWalk. The results show that, taking both the topological structure and the node attribute information into account, the attribute-enhanced network representation learning method produces clustering results of good quality with stable performance.