Disclosure of Invention
The invention provides a learning-based similarity measurement method. The method extracts two different types of information from a stereo model: view feature information and spatial structure information. As a result, the stereo model is described more comprehensively and its similarity is quantified more accurately. The details are as follows:
A learning-based similarity measurement method comprises the following steps:
given the views of the stereo models, selecting representative views and constructing a view-based hypergraph to represent the relationships between stereo objects;
extracting a spatial structure circular descriptor from each stereo model using the stereo model data, and generating a simple model-based graph from the pairwise distances between stereo models to explore the associations between them;
selecting a suitable learning framework and generating initial learning weights, taking the view-based hypergraph and the stereo-model-based graph as inputs to the learning framework, and learning the optimal combination weights of the two graphs through the joint learning framework, so as to estimate the correlation between stereo objects from both graphs.
Selecting the representative views and constructing the view-based hypergraph to represent the relationships between stereo objects specifically comprises:
1) calculating the Zernike-moment-based distance between any two views, and selecting the K views closest to each view as the representative views of each view cluster, forming a new view set;
2) constructing a view-based hypergraph from the new view set by star expansion, so as to express the relationships between the stereo models.
Further, the given views, which cover multiple viewing angles, are hierarchically clustered and divided into view clusters.
In the hypergraph, each vertex is an object and each edge is a view cluster. The weight of each edge is defined according to the similarity between any two views in the view cluster.
Extracting a spatial structure circular descriptor from each stereo model, and using the distances between stereo models to generate a simple model-based graph that explores the relevance between them, specifically comprises:
extracting the spatial structure circular descriptor (SSCD) of each stereo model as its feature. The SSCD represents the depth information of the model surface on the minimum bounding box of the 3D model's projection and generates a depth histogram as the feature of the 3D model;
performing bipartite graph matching to measure the distance between every two 3D models, i.e. $d_{SSCD}(O_i,O_j)$.
In a specific implementation, the weight of each edge of the stereo-model-based graph is defined from the distance between the two corresponding stereo models:
where $d_{SSCD}(v_i,v_j)$ is the distance between the 3D objects $O_i$ and $O_j$ represented by vertices $v_i$ and $v_j$, and $\sigma_S$ is set to the median of the distances between all pairs of stereo models.
Further, the optimal combination weights of the two graphs are learned through the joint learning framework, and the correlation between stereo objects is estimated from the view-based hypergraph and the stereo-model-based graph, as follows:
1) setting an initial learning framework in which the two modalities, the view-based hypergraph and the stereo-model-based graph, are given equal weights. The stereo model retrieval task is to learn the optimal pairwise object correlation under both sources of information;
2) learning the combination weights. Because the view information and the stereo model information influence the result differently, the optimal weights of the view information and the stereo model data are further learned;
the learned combination weights are then used to optimize over the stereo-model-based data and the view-based data simultaneously. This yields a relevance vector that gives the correlation of every stereo model in the dataset with the query model.
Further, constructing the view-based hypergraph by star expansion specifically comprises:
the hypergraph of the stereo models constructed by star expansion is denoted $G_H=(V_H,E_H,W_H)$. Here $V_H$ is the vertex set, $E_H$ the hyperedge set, $W_H$ the hyperedge weights, and $H$ the incidence matrix;
the weight $W_H$ of hypergraph $G_H$ is computed as:
where $v_c$ is the central view of the hyperedge, $v_x$ is one of the K views closest to $v_c$, $d(v_x,v_c)$ is the distance between $v_x$ and $v_c$, and $\sigma_H$ is empirically set to the median of the distances between all views;
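the incidence matrix H is generated by:
$$h(v_H,e_H)=\begin{cases}1, & v_H\in e_H\\ 0, & v_H\notin e_H\end{cases}$$
where $h(v_H,e_H)$ is the entry of H for vertex $v_H\in V_H$ and hyperedge $e_H\in E_H$.
The degree of vertex $v_H$ is defined as:
$$d(v_H)=\sum_{e_H\in E_H} w(e_H)\,h(v_H,e_H)$$
where $w(e_H)\in W_H$.
The degree of hyperedge $e_H$ is defined as:
$$\delta(e_H)=\sum_{v_H\in V_H} h(v_H,e_H)$$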
In a specific implementation, setting the initial learning framework, i.e. giving the view-based hypergraph and the stereo-model-based graph equal weights, specifically comprises:
setting an initial learning framework with equal weights for the two data sources, and expressing the learning process by the objective function
$$\arg\min_{f}\left\{\Omega_V(f)+\Omega_M(f)+\mu R(f)\right\}$$
In this formula, f is the relevance vector to be learned, $\Omega_V(f)$ is the regularization term on the view-based hypergraph structure, $\Omega_M(f)$ is the regularization term on the stereo-model-based graph structure, $R(f)$ is the empirical loss, and $\mu>0$ is a weighting parameter.
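This derivation sketches why the equal-weight objective has a closed-form minimizer. It rests on assumptions that are standard in graph-based ranking but not stated here. First, the regularizers are the quadratic forms $\Omega_V(f)=f^{\top}\Delta_H f$ and $\Omega_M(f)=f^{\top}\Delta_S f$, where $\Delta_H$ and $\Delta_S$ are the Laplacians of the two graphs. Second, the empirical loss is $R(f)=\lVert f-y\rVert^{2}$ for the initial label vector $y$. Setting the gradient to zero gives:

```latex
\begin{aligned}
f^{*} &= \arg\min_{f}\; f^{\top}\Delta_H f + f^{\top}\Delta_S f + \mu\lVert f-y\rVert^{2},\\
0 &= 2(\Delta_H+\Delta_S)f^{*} + 2\mu(f^{*}-y)
\;\Longrightarrow\;
f^{*} = \Bigl(I + \tfrac{1}{\mu}(\Delta_H+\Delta_S)\Bigr)^{-1} y .
\end{aligned}
```

Both Laplacians are positive semidefinite and $\mu>0$. The matrix being inverted is therefore positive definite, so the solution exists and is unique.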
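The objective function is further modified to:
$$\arg\min_{f,\alpha,\beta}\left\{\alpha\,\Omega_V(f)+\beta\,\Omega_M(f)+\mu R(f)+\eta\left(\alpha^{2}+\beta^{2}\right)\right\},\quad \text{s.t. } \alpha+\beta=1$$
where α and β are the combination weights of the view-based and stereo-model-based information, respectively.
In the calculation, let:
$$\Omega_V(f)=f^{\top}\Delta_H f,\qquad \Omega_M(f)=f^{\top}\Delta_S f,\qquad R(f)=\lVert f-y\rVert^{2}$$
$$\Theta_H=D_v^{-1/2}\,H\,W\,D_e^{-1}\,H^{\top}\,D_v^{-1/2}$$
then:
$$\Delta_H=I-\Theta_H,\qquad \Delta_S=I-\Theta_S$$
where $\Delta_H$ and $\Delta_S$ are the Laplacians of the hypergraph and of the stereo-model-based graph, respectively, with $\Theta_S$ the corresponding normalized matrix of the latter. The remaining symbols are as follows: y is the initial label vector; $\eta>0$ is a weighting parameter; H is the incidence matrix; W is the diagonal matrix of hyperedge weights; $D_v$ is the diagonal matrix of vertex degrees; and $D_e$ is the diagonal matrix of hyperedge degrees.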
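With f held fixed, the update of the combination weights also has a closed form. This sketch assumes the quadratic regularizers $\Omega_V(f)=f^{\top}\Delta_H f$ and $\Omega_M(f)=f^{\top}\Delta_S f$ and an $\ell_2$ penalty $\eta(\alpha^2+\beta^2)$ with $\alpha+\beta=1$. The term $\mu R(f)$ is constant in this step and drops out. Substituting $\beta=1-\alpha$:

```latex
\begin{aligned}
g(\alpha) &= \alpha a + (1-\alpha)\, b + \eta\bigl(\alpha^{2} + (1-\alpha)^{2}\bigr),
  \qquad a = f^{\top}\Delta_H f,\; b = f^{\top}\Delta_S f,\\
g'(\alpha) &= a - b + 2\eta\alpha - 2\eta(1-\alpha) = a - b + 4\eta\alpha - 2\eta = 0,\\
\alpha^{*} &= \frac{1}{2} + \frac{b - a}{4\eta}, \qquad
\beta^{*} = 1 - \alpha^{*} = \frac{1}{2} + \frac{a - b}{4\eta}.
\end{aligned}
```

Because $g''(\alpha)=4\eta>0$, this is the minimizer. The graph on which f is smoother (smaller quadratic form) receives the larger weight. The text does not say whether non-negative weights are required; if they are, $\alpha^{*}$ would be clipped to $[0,1]$, which remains optimal because $g$ is convex.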
The technical solution provided by the invention has the following beneficial effects:
1. by extracting both the view feature information and the spatial structure information of the stereo model, the stereo model is described more comprehensively and its similarity is quantified more accurately;
2. the similarity vector is computed with a learning-based model in which the combination weights are obtained by optimization rather than set by hand, which improves the flexibility and stability of the similarity measure;
3. representing each stereo model by selected representative views reduces the amount of computation and makes the similarity measurement efficient;
4. the invention is the first to jointly explore view-based and stereo-model-based correlations between stereo models within a graph-based framework;
5. the method avoids the incomplete extraction of stereo model information that results from using only a model-based or only a view-based method, ensuring that the similarity between stereo models is computed accurately.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
At present, most approaches use only model-based or only view-based information, which may represent the stereo model incompletely. This patent proposes joint learning from view-based and model-based information to characterize a stereo model. In the view-based part, representative views are first selected for each object and the view distances are calculated. Then, following the method of reference [9], a view-based hypergraph is constructed by view star expansion. In the model part, the spatial structure circular descriptor [10] is extracted, and a simple model-based graph is generated from the pairwise object distances. The view information and the model data are thus represented by two graphs. The two graphs are learned jointly to estimate the correlation between stereo objects, and the weight each graph carries in the learning is optimized at the same time. Comparisons with other methods are given at the end of this description; evaluation on three datasets shows high stereo object retrieval accuracy.
Example 1
The core of the learning-based similarity measurement method is the prior extraction of two different relation graphs of the models: a view-based hypergraph and a model-based graph.

For the model views, view clusters are constructed by HAC [11], and the view-based hypergraph is built from these clusters. HAC is one of the hierarchical clustering methods: each item (originally a single document) starts as its own class, and classes are merged step by step, reducing their number until a single class or the required number of classes remains.

For the model data, a depth histogram of each model is extracted with the spatial structure circular descriptor SSCD [12], and a model-based graph is then built. In SSCD, the spatial structure of a 3D model is described by a 2D image whose pixel attribute values encode 3D spatial information; SSCD preserves the global spatial structure of the 3D model and is invariant to rotation and scaling.

Finally, learning on the two relation graphs yields the final similarity quantization vector.
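The HAC step can be illustrated with a minimal sketch of average-linkage agglomerative clustering in pure Python. The sketch makes several assumptions not given in the text. Each view is already reduced to a feature vector (for example its Zernike moments), views are compared by Euclidean distance, and merging stops at a target number of clusters. The linkage criterion, the stopping rule and the name `hac_view_clusters` are illustrative.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two view feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hac_view_clusters(features: List[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Average-linkage agglomerative clustering of view feature vectors.

    Every view starts as its own cluster; the two closest clusters are merged
    repeatedly until `n_clusters` clusters remain. Returns lists of view indices.
    """
    if not 1 <= n_clusters <= len(features):
        raise ValueError("n_clusters must be between 1 and the number of views")
    clusters = [[i] for i in range(len(features))]
    # Pairwise view distances, computed once.
    dist = [[euclidean(a, b) for b in features] for a in features]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average distance over all cross-cluster view pairs.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    views = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]]
    print(hac_view_clusters(views, n_clusters=3))
```

The exhaustive search over cluster pairs keeps the sketch short. For hundreds of views per model, an optimized hierarchical clustering routine would be preferable.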
The method of this embodiment performs retrieval by jointly learning the relevance between stereo models from their views and from the model data. The specific steps are as follows:
101: given the views of the stereo models, selecting representative views and constructing a view-based hypergraph to represent the relationships between stereo objects;
102: extracting the SSCD from each stereo model using the stereo model data, and using the distance between each pair of stereo models to generate a simple model-based graph that explores the correlation between them;
103: selecting a suitable learning framework and generating initial learning weights, taking the view-based hypergraph and the stereo-model-based graph as inputs to the learning framework, and learning the optimal combination weights of the two graphs through the joint learning framework, so as to estimate the correlation between stereo objects from both graphs.
In step 101, selecting the representative views and constructing the view-based hypergraph to represent the relationships between stereo objects specifically comprises:
1) hierarchically clustering the given views, which cover multiple viewing angles, into view clusters. To reduce redundant data, the Zernike-moment-based distance between any two views is first calculated, and the K views closest to each view are selected as the representative views of each cluster, forming a new view set;
2) constructing a view-based hypergraph from the new view set by star expansion to express the relationships between the stereo models. In the hypergraph, each vertex is an object and each edge is a view cluster, so an edge connects multiple vertices. The weight of each edge is defined according to the similarity between any two views within the cluster.
In step 102, the stereo-model-based graph, which uses the distance between each pair of stereo models to explore their relationships, is constructed as follows:
the spatial structure circular descriptor SSCD of each stereo model is extracted as its feature. SSCD represents the depth information of the model surface on the minimum bounding box of the 3D model's projection and generates a depth histogram as the 3D model feature. After the SSCD is extracted, bipartite graph matching is performed to measure the distance between every two 3D models, $d_{SSCD}(O_i,O_j)$.
The relationships between objects are represented by a simple graph $G=(V,E,W)$, in which each vertex represents an object, so G has n vertices. From the two corresponding 3D objects $O_i$ and $O_j$, the weight of edge $e(i,j)$ in G is calculated as:
where $d_{SSCD}(v_i,v_j)$ is the distance between $O_i$ and $O_j$, and $\sigma_S$ is set to the median of the distances between all pairs of stereo models.
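The depth-histogram feature and the bipartite-matching distance can be illustrated with the following sketch. It rests on several assumptions not specified in the text:

- each model's SSCD has already been reduced to a small, equal-sized set of depth histograms;
- depth values are normalized to [0, 1];
- two histograms are compared with the L1 distance;
- the model distance is the minimum total cost over one-to-one pairings of histograms.

The names `depth_histogram` and `sscd_distance` are hypothetical. For brevity the sketch enumerates all pairings, which is only practical for a handful of histograms; a Hungarian-algorithm solver would be used at scale.

```python
from itertools import permutations
from typing import List, Sequence


def depth_histogram(depths: Sequence[float], bins: int = 8) -> List[float]:
    """Normalized histogram of depth values assumed to lie in [0, 1]."""
    counts = [0] * bins
    for d in depths:
        # Clamp so that d == 1.0 (or slight overshoot) falls into a valid bin.
        idx = min(max(int(d * bins), 0), bins - 1)
        counts[idx] += 1
    total = len(depths)
    return [c / total for c in counts]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two histograms (assumed matching cost)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sscd_distance(hists_i: List[Sequence[float]], hists_j: List[Sequence[float]]) -> float:
    """Minimum-cost one-to-one (bipartite) matching between two histogram sets."""
    if len(hists_i) != len(hists_j):
        raise ValueError("this sketch assumes equal-sized descriptor sets")
    best = float("inf")
    # Brute force over all assignments; fine for very small sets only.
    for perm in permutations(range(len(hists_j))):
        cost = sum(l1(hists_i[k], hists_j[perm[k]]) for k in range(len(hists_i)))
        best = min(best, cost)
    return best
```

Because the matching may pair histograms in any order, two models whose histogram sets differ only by a permutation get distance 0.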
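The stereo-model-based graph can be built from a precomputed matrix of SSCD distances, as in the sketch below. The edge-weight formula itself is not reproduced in this text. The sketch therefore assumes the Gaussian kernel $W(i,j)=\exp(-d_{SSCD}(O_i,O_j)^2/\sigma_S^2)$, a common choice consistent with $\sigma_S$ acting as a distance scale. It takes $\sigma_S$ as the median pairwise distance, as stated above. Zeroing the diagonal (no self-loops) and the name `model_graph_weights` are also assumptions.

```python
import numpy as np


def model_graph_weights(dist: np.ndarray) -> np.ndarray:
    """Dense weight matrix of the stereo-model-based graph G = (V, E, W).

    dist: symmetric (n, n) matrix of d_SSCD(O_i, O_j).
    """
    n = dist.shape[0]
    iu = np.triu_indices(n, k=1)             # each unordered model pair once
    sigma_s = np.median(dist[iu])            # sigma_S: median pairwise distance
    if sigma_s == 0:
        raise ValueError("all pairwise distances are zero")
    W = np.exp(-(dist ** 2) / sigma_s ** 2)  # assumed Gaussian kernel
    np.fill_diagonal(W, 0.0)                 # assumed: no self-loops
    return W
```

Closer models get weights near 1, and the median-based $\sigma_S$ adapts the kernel width to the scale of the dataset's distances.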
In step 103, the optimal combination weights of the two graphs are learned through the joint learning framework, and the correlation between stereo objects is estimated from the view-based hypergraph and the stereo-model-based graph, as follows:
1) An initial learning framework is set in which the view-based hypergraph and the stereo-model-based graph are given equal weights. Stereo model retrieval is formulated as a one-class classification task whose main aim is to learn the optimal pairwise object correlation under both sources of information.
Given the initial labeled data (here, the query model), an empirical loss term can be added to the learning process as a constraint.
2) The combination weights are learned. Because the view information and the stereo model information affect the result differently, their optimal weights are further learned and added to the learning framework. The objective of the learning process then consists of three parts, which together determine the final learning model:
- a structural regularizer over the view-based hypergraph and the stereo-model-based graph;
- an empirical loss term;
- a regularizer on the combination weights.
These regularizers are well known to those skilled in the art and are not described in detail in this embodiment.
Using the learned combination weights, the stereo-model-based data and the view-based data are optimized jointly to obtain the similarity vector f. This vector gives the correlation of every stereo model in the dataset with the query model: the larger a model's correlation value, the more similar it is to the query model.
In summary, through steps 101 to 103 this embodiment extracts two different types of information, view features and the spatial structure of the stereo model. The stereo model is therefore described more comprehensively, and the similarity is quantified more accurately.
Example 2
The scheme of Example 1 is described in further detail below with reference to specific formulas, Fig. 1 and Fig. 2:
Let $O=\{O_1,O_2,\ldots,O_n\}$ denote the n stereo models and $V_i=\{v_{i1},v_{i2},\ldots,v_{im}\}$ the views of the i-th stereo model, from which representative views are selected; assume the selected representative views are $V_i=\{v_{i1},v_{i2},\ldots,v_{im}\}$. The hypergraph of the stereo models is then constructed by star expansion and denoted $G_H=(V_H,E_H,W_H)$, where $V_H$ is the vertex set, $E_H$ the hyperedge set, $W_H$ the hyperedge weights, and H the incidence matrix.
Suppose the n stereo models have $n_r$ representative views in total. The distance between every two representative views is first calculated from their Zernike moments, and the top K nearest views are found for each representative view; K is set to 10 in this embodiment. The weight $W_H$ of hypergraph $G_H$ is calculated by:
where $v_c$ is the central view of the hyperedge, $v_x$ is one of the K views closest to $v_c$, $d(v_x,v_c)$ is the distance between $v_x$ and $v_c$, and $\sigma_H$ is empirically set to the median of the distances between all views.
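The star-expansion step can be sketched as follows. Each representative view acts as the centre $v_c$ of one hyperedge, which joins the object owning $v_c$ with the objects owning its K nearest views. The text fixes K = 10 and $\sigma_H$ as the median of all view distances. Its weight formula is not reproduced here, so the sketch assumes $w(e)=\sum_{v_x}\exp(-d(v_x,v_c)^2/\sigma_H^2)$ summed over the K neighbours. The use of Euclidean distance between precomputed Zernike-moment vectors and the name `star_expansion` are also assumptions.

```python
import numpy as np


def star_expansion(view_feats: np.ndarray, view_owner: np.ndarray, k: int = 10):
    """Return (hyperedges, weights) for a view-based hypergraph.

    view_feats: (m, d) features of the representative views (e.g. Zernike moments).
    view_owner: (m,) index of the stereo model each view belongs to.
    Each hyperedge is the set of model indices covered by a centre view and
    its k nearest views. Requires at least two views.
    """
    diff = view_feats[:, None, :] - view_feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))            # pairwise view distances
    m = dist.shape[0]
    sigma_h = np.median(dist[np.triu_indices(m, k=1)])  # sigma_H: median distance
    k = min(k, m - 1)
    hyperedges, weights = [], []
    for c in range(m):
        order = np.argsort(dist[c])
        neighbours = [x for x in order if x != c][:k]   # k nearest, excluding v_c
        hyperedges.append({int(view_owner[c])} | {int(view_owner[x]) for x in neighbours})
        # Assumed Gaussian weight, summed over the centre's neighbours.
        weights.append(float(np.exp(-dist[c, neighbours] ** 2 / sigma_h ** 2).sum()))
    return hyperedges, np.array(weights)
```

Two models that share many mutually close views end up together in many hyperedges, which matches the intuition that such models are highly correlated.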
The incidence matrix H can be generated by:
$$h(v_H,e_H)=\begin{cases}1, & v_H\in e_H\\ 0, & v_H\notin e_H\end{cases}$$
where $h(v_H,e_H)$ is the entry of H for vertex $v_H\in V_H$ and hyperedge $e_H\in E_H$.
Further, the degree of vertex $v_H\in V_H$ can be defined as:
$$d(v_H)=\sum_{e_H\in E_H} w(e_H)\,h(v_H,e_H)$$
where $w(e_H)\in W_H$.
Further, the degree of hyperedge $e_H\in E_H$ can be defined as:
$$\delta(e_H)=\sum_{v_H\in V_H} h(v_H,e_H)$$
In a concrete implementation, the vertex degrees and the hyperedge degrees are represented by two diagonal matrices $D_v$ and $D_e$.
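The incidence matrix and the two diagonal degree matrices can be built from a list of hyperedges. The sketch uses the standard hypergraph definitions:

- the binary incidence $h(v,e)=1$ if $v\in e$, else 0;
- the vertex degree $d(v)=\sum_e w(e)h(v,e)$;
- the edge degree $\delta(e)=\sum_v h(v,e)$.

The function name `incidence_and_degrees` is illustrative.

```python
import numpy as np


def incidence_and_degrees(n_vertices, hyperedges, weights):
    """Build H (|V| x |E|), D_v and D_e for a weighted hypergraph.

    hyperedges: list of sets of vertex (object) indices.
    weights: sequence of hyperedge weights w(e).
    """
    H = np.zeros((n_vertices, len(hyperedges)))
    for j, edge in enumerate(hyperedges):
        for v in edge:
            H[v, j] = 1.0                    # h(v, e) = 1 if v is in e
    w = np.asarray(weights, dtype=float)
    Dv = np.diag(H @ w)                      # d(v) = sum_e w(e) h(v, e)
    De = np.diag(H.sum(axis=0))              # delta(e) = sum_v h(v, e)
    return H, Dv, De
```

Keeping $D_v$ and $D_e$ as explicit diagonal matrices mirrors the notation of the text; in practice only their diagonals need to be stored.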
In the constructed hypergraph $G_H$, two stereo models that share more similar views are connected by hyperedges with higher weights, which indicates a high correlation between them.
Given the stereo model data of the stereo objects, the object relationships based on the stereo models are explored further. The spatial structure circular descriptor (SSCD) is used as the stereo model feature, and the generation of the model-based graph has been described above.
After the view-based hypergraph and the model-based graph are obtained, an initial learning framework is set with equal weights for the two data sources. The learning process can be expressed as:
$$\arg\min_{f}\left\{\Omega_V(f)+\Omega_M(f)+\mu R(f)\right\}$$
In this formula, f is the relevance vector to be learned, $\Omega_V(f)$ is the regularization term on the view-based hypergraph structure, $\Omega_M(f)$ is the regularization term on the stereo-model-based graph structure, $R(f)$ is the empirical loss, and $\mu>0$ is a weighting parameter.
This objective function jointly minimizes the structural regularization terms on the stereo-model-based graph and the view-based hypergraph together with the empirical loss. Its minimizer is the optimal relevance vector f for retrieval.
The vector f gives the relevance of every object in the dataset to the query object; a larger relevance value means the object is more similar to the query. With f, all objects in the dataset can be sorted in descending order of relevance.
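Ranking by the learned vector is a descending sort of object indices by their scores. In this minimal sketch, `f` is a list of relevance scores indexed by object. Keeping tied objects in index order is an assumption, since the text does not say how ties are broken.

```python
from typing import List, Sequence


def rank_objects(f: Sequence[float]) -> List[int]:
    """Object indices sorted by decreasing relevance to the query."""
    # sorted() is stable, so objects with equal scores keep their index order.
    return sorted(range(len(f)), key=lambda i: -f[i])


print(rank_objects([0.2, 0.9, 0.5, 0.9]))  # [1, 3, 2, 0]
```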
Learning the combination weights: the view information and the stereo model information may not contribute equally to the object representation. In some cases the view information is more important, while in others the stereo model data plays the larger role. The optimal weights of the view information and the stereo model data are therefore learned as well.
Let α and β denote the combination weights of the view-based and stereo-model-based information, respectively, with α + β = 1. After an $\ell_2$ norm on the combination weights is added, the objective function becomes:
$$\arg\min_{f,\alpha,\beta}\left\{\alpha\,\Omega_V(f)+\beta\,\Omega_M(f)+\mu R(f)+\eta\left(\alpha^{2}+\beta^{2}\right)\right\},\quad \text{s.t. } \alpha+\beta=1$$
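In the calculation, let:
$$\Omega_V(f)=f^{\top}\Delta_H f,\qquad \Omega_M(f)=f^{\top}\Delta_S f,\qquad R(f)=\lVert f-y\rVert^{2}$$
$$\Theta_H=D_v^{-1/2}\,H\,W\,D_e^{-1}\,H^{\top}\,D_v^{-1/2}$$
then:
$$\Delta_H=I-\Theta_H,\qquad \Delta_S=I-\Theta_S$$
where $\Delta_H$ and $\Delta_S$ can be regarded as the Laplacians of the hypergraph and of the stereo-model-based graph, with $\Theta_S$ the corresponding normalized matrix of the stereo-model-based graph; y is the initial label vector; and $\eta>0$ is a weighting parameter.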
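Both Laplacians can be computed as in the sketch below. It uses the standard normalized hypergraph form $\Theta_H = D_v^{-1/2} H W D_e^{-1} H^{\top} D_v^{-1/2}$. The text does not spell out $\Theta_S$. For the simple graph, the sketch therefore assumes the usual normalization $\Theta_S = D^{-1/2} A D^{-1/2}$, where $A$ is the model-graph weight matrix and $D$ its row-sum degree matrix. The function names are illustrative.

```python
import numpy as np


def hypergraph_laplacian(H, w):
    """Delta_H = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}."""
    w = np.asarray(w, dtype=float)
    dv = H @ w                               # vertex degrees d(v)
    de = H.sum(axis=0)                       # edge degrees delta(e)
    dv_is = np.diag(1.0 / np.sqrt(dv))       # assumes every vertex lies in some edge
    theta = dv_is @ H @ np.diag(w) @ np.diag(1.0 / de) @ H.T @ dv_is
    return np.eye(H.shape[0]) - theta


def graph_laplacian(A):
    """Delta_S = I - D^{-1/2} A D^{-1/2} for a simple weighted graph (assumed form)."""
    d = A.sum(axis=1)                        # weighted vertex degrees
    d_is = np.diag(1.0 / np.sqrt(d))         # assumes no isolated vertices
    return np.eye(A.shape[0]) - d_is @ A @ d_is
```

Both matrices are symmetric and positive semidefinite. This is what makes $f^{\top}\Delta f$ a valid smoothness penalty.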
The objective can be solved by alternating optimization:
1. with α and β fixed, compute the optimal f;
2. with f fixed, update α and β;
3. repeat the two steps until convergence.
The resulting f serves as the object similarity measure. With the learned combination weights, the stereo-model-based and view-based data are thus explored jointly, yielding the relevance vector f.
In summary, the above steps make the stereo model representation more expressive. They also eliminate the bias that a single type of stereo model feature introduces into the similarity result, improving the accuracy of stereo model retrieval. Selecting representative views additionally reduces the amount of computation and improves retrieval efficiency.
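The alternating optimization can be sketched as follows. The sketch assumes $\Omega_V(f)=f^{\top}\Delta_H f$, $\Omega_M(f)=f^{\top}\Delta_S f$ and $R(f)=\lVert f-y\rVert^2$. Each step then has a closed form:

- with α and β fixed, setting the gradient to zero gives $f=(I+(\alpha\Delta_H+\beta\Delta_S)/\mu)^{-1}y$;
- with f fixed, minimizing $\alpha a+\beta b+\eta(\alpha^2+\beta^2)$ subject to $\alpha+\beta=1$ gives $\alpha=1/2+(b-a)/(4\eta)$, where $a=f^{\top}\Delta_H f$ and $b=f^{\top}\Delta_S f$.

Clipping α to [0, 1], the default μ and η, the iteration limit and the name `joint_learning` are assumptions, not values from the text.

```python
import numpy as np


def joint_learning(delta_h, delta_s, y, mu=1.0, eta=1.0, n_iter=20, tol=1e-8):
    """Alternately update the relevance vector f and the weights (alpha, beta)."""
    y = np.asarray(y, dtype=float)
    I = np.eye(len(y))
    alpha, beta = 0.5, 0.5                   # initial framework: equal weights
    f = y.copy()
    for _ in range(n_iter):
        # f-step: minimise alpha*f'Dh f + beta*f'Ds f + mu*||f - y||^2.
        f = np.linalg.solve(I + (alpha * delta_h + beta * delta_s) / mu, y)
        # weight-step: closed-form minimiser on the line alpha + beta = 1,
        # clipped to [0, 1] (assumed non-negativity constraint).
        a = f @ delta_h @ f
        b = f @ delta_s @ f
        new_alpha = float(np.clip(0.5 + (b - a) / (4.0 * eta), 0.0, 1.0))
        converged = abs(new_alpha - alpha) < tol
        alpha, beta = new_alpha, 1.0 - new_alpha
        if converged:
            break
    return f, alpha, beta
```

Here `y` would mark the query model (for example 1 at its index and 0 elsewhere). Sorting the returned `f` in descending order gives the retrieval ranking.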
Example 3
The following experiments demonstrate the feasibility of the schemes in Examples 1 and 2:
The experiments in this embodiment are conducted on the NTU, PSB [7] and SHREC databases. The stereo models were either built by modelers with 3D modeling software such as 3DMax or collected from various websites. They are stored in different formats, including obj and off. The datasets are as follows:
- the representative NTU549 database, which contains 47 classes of stereo models;
- the PSB database, which contains 1,814 models in 161 classes;
- the SHREC database, which contains 800 models in 40 classes.
Some of the models used in this embodiment are shown in Fig. 2; the models used are in off format.
Examples from the stereo model datasets are shown in Fig. 2. Two evaluation measures are used:
- the F-measure, which considers the top 20 returned results for each query;
- ANMRR (Average Normalized Modified Retrieval Rank), which evaluates ranking performance by taking the ranking order into account. A lower ANMRR indicates more accurate returned results.
The results are shown in Fig. 3 to Fig. 5, where VMJR denotes the proposed method. On the NTU, SHREC and PSB datasets, VMJR achieves a lower ANMRR and a higher F-measure than ED [13], ERD [14], QVS [15], HL [9] and DC [15], demonstrating the superiority of the method.
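The F-measure for a single query can be computed as in the following sketch. It assumes the usual definitions: relevant means same class as the query, precision and recall are taken at the cutoff of 20 results, and F = 2PR/(P + R). The name `f_measure_at_k` is hypothetical. Dividing by the list length when fewer than k results are returned is also an assumption. The caller is assumed to exclude the query object from its own ranked list.

```python
from typing import Sequence


def f_measure_at_k(ranked_labels: Sequence[str], query_label: str,
                   n_relevant: int, k: int = 20) -> float:
    """F-measure of the top-k retrieved objects for a single query.

    ranked_labels: class labels of the retrieved objects, best first.
    n_relevant: number of database objects sharing the query's class.
    """
    hits = sum(1 for lab in ranked_labels[:k] if lab == query_label)
    if hits == 0:
        return 0.0
    # Precision over the results actually returned, capped at the cutoff k.
    precision = hits / min(k, len(ranked_labels))
    recall = hits / n_relevant
    return 2 * precision * recall / (precision + recall)
```

The dataset-level score would be the mean of this value over all queries.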
Precision-recall (PR) curves are also compared: the larger the area enclosed by the PR curve and the coordinate axes, the better the retrieval performance. Fig. 6 to Fig. 8 show the PR curves on the NTU, SHREC and PSB datasets. On all three, the PR curve of VMJR lies above those of the other five methods and encloses the largest area. This verifies the feasibility of the method and indicates its suitability for practical applications.
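The points on a PR curve can be computed from one ranked result list. This sketch records recall and precision each time a relevant object is retrieved. It omits the averaging over queries and the interpolation at fixed recall levels that plotted curves typically use. The function name `pr_points` is illustrative.

```python
from typing import List, Sequence, Tuple


def pr_points(relevant_flags: Sequence[bool], n_relevant: int) -> List[Tuple[float, float]]:
    """(recall, precision) after each relevant hit in a ranked result list."""
    points, hits = [], 0
    for rank, is_rel in enumerate(relevant_flags, start=1):
        if is_rel:
            hits += 1
            points.append((hits / n_relevant, hits / rank))
    return points


print(pr_points([True, False, True, False], n_relevant=2))
# [(0.5, 1.0), (1.0, 0.6666666666666666)]
```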
References
[1] J. W. H. Tangelder and R. C. Veltkamp, “A survey of content based 3D shape retrieval methods,” Multimedia Tools and Applications, vol. 39, pp. 441–471, 2008.
[2] Y. Yang, H. Lin, and Y. Zhang, “Content-based 3-D model retrieval: A survey,” IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 37, pp. 1081–1098, 2007.
[3] K. Lu, Q. Wang, J. Xue, and W. Pan, “3D model retrieval and classification by semi-supervised learning with content-based similarity,” Information Sciences, vol. 281, pp. 703–713, 2014.
[4] K. Lü, N. He, and J. Xue, “Content-based similarity for 3D model retrieval and classification,” Progress in Natural Science, vol. 19, no. 4, pp. 495–499, 2009.
[5] A. E. Johnson and M. Hebert, “Using spin images for efficient object recognition in cluttered 3D scenes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 5, pp. 433–449, 1999.
[6] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin, “Shape distributions,” ACM Transactions on Graphics, vol. 21, no. 4, pp. 807–832, 2002.
[7] E. Paquet and M. Rioux, “Nefertiti: A query by content system for three-dimensional model and image databases management,” Image and Vision Computing, vol. 17, pp. 157–166, 1999.
[8] B. Leng and Z. Xiong, “ModelSeek: An effective 3D model retrieval system,” Multimedia Tools and Applications, 2010.
[9] Y. Gao, M. Wang, D. Tao, et al., “3-D object retrieval and recognition with hypergraph analysis,” IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 4290–4303, 2012.
[10] Y. Gao, Q. Dai, and N. Y. Zhang, “3D model comparison using spatial structure circular descriptor,” Pattern Recognition, vol. 43, no. 3, pp. 1142–1151, 2010.
[11] M. Steinbach, G. Karypis, and V. Kumar, “A comparison of document clustering techniques,” in KDD Workshop on Text Mining, 2000.
[12] Y. Gao, Q. Dai, and N. Y. Zhang, “3D model comparison using spatial structure circular descriptor,” Pattern Recognition, vol. 43, no. 3, pp. 1142–1151, 2010.
[13] J. L. Shih, C. H. Lee, and J. T. Wang, “A new 3D model retrieval approach based on the elevation descriptor,” Pattern Recognition, vol. 40, no. 1, pp. 283–295, 2007.
[14] D. V. Vranic, “An improvement of rotation invariant 3D-shape based on functions on concentric spheres,” in Proc. IEEE International Conference on Image Processing (ICIP 2003), vol. 3, pp. III-757, 2003.
[15] Y. Gao, M. Wang, Z. J. Zha, et al., “Less is more: Efficient 3-D object retrieval with query view selection,” IEEE Transactions on Multimedia, vol. 13, no. 5, pp. 1007–1018, 2011.
Unless the model of a device is specifically stated, the embodiments of the invention do not limit the device models, as long as the devices can perform the functions described above.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments. The serial numbers of the embodiments above are for description only and do not indicate their relative merits.
The above description covers only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall fall within its scope of protection.