CN114625952B - Information recommendation method and system based on VSM and AMMK-means - Google Patents
- Publication number
- CN114625952B (application number CN202011432407.3A)
- Authority
- CN
- China
- Prior art keywords
- information
- user
- item
- browsed
- clustering
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/353—Clustering; Classification into predefined classes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The invention discloses an information recommendation method and system based on VSM and AMMK-means. The method comprises: obtaining an item portrait of each piece of candidate information; substituting the item portraits of the candidate information into a pre-constructed interest model to obtain the similarity between each piece of candidate information and the user portrait; and recommending the candidate information with the highest similarity to the user, wherein the interest model is constructed based on the VSM, AMMK-means, and the item portraits of the information the user has browsed. Because the interest model is built from the VSM, AMMK-means, and the item portraits of the information the user has browsed, it is in effect customized to the user's own interests. The invention thereby avoids the deviation between the information categories recommended under an editor's classification standard and the categories the user is actually interested in, and improves recommendation accuracy compared with the traditional collaborative filtering algorithm.
Description
Technical Field
The invention relates to the field of information retrieval, in particular to an information recommendation method and system based on VSM and AMMK-means.
Background
Situation assessment refers to the analysis, reasoning, and judgment of multi-source information, on the basis of object assessment, in order to understand the battlefield and support command-level decisions. Because battlefield information is large in volume and varied in type, each seat (that is, each user) has difficulty selecting information, so a decision-support system is needed to recommend information of interest to different seats according to each seat's browsing records and the characteristics of the battlefield information.
Recommendation algorithms are typically based on collaborative filtering and make recommendations from user browsing records or feedback records. In addition to browsing or feedback records, some studies take information category into account when generating recommendations, but the information categories are classified manually by information editors, so the classification standard can only represent the editors' opinions, which causes the recommendation effect to deviate.
Disclosure of Invention
In order to solve the defects in the prior art, the invention provides an information recommendation method and system based on VSM and AMMK-means.
In a first aspect, an information recommendation method based on VSM and AMMK-means is provided, including:
acquiring an item portrait of each piece of candidate information;
substituting the item portraits of the candidate information into a pre-constructed interest model to obtain the similarity between each piece of candidate information and the user portrait;
recommending the candidate information with the highest similarity to the user;
wherein the interest model is constructed based on the VSM, AMMK-means, and the item portraits of the information that the user has browsed.
Preferably, the construction of the interest model includes:
acquiring the item portraits of the information that the user has browsed, and characterizing those item portraits by using the VSM;
clustering the item portraits of the information browsed by the user through AMMK-means, and taking the clustering result as the information categories of interest to the user;
calculating the weight of each information category from the number of pieces of browsed information contained in that category and the total number of pieces of information the user has browsed;
generating a user portrait based on the information categories of interest to the user and the weights of the information categories;
and calculating the similarity between the item portrait of the candidate information and the item portrait in the user portrait.
Further, clustering the item portraits of the information browsed by the user through AMMK-means and taking the clustering result as the information categories of interest to the user comprises:
generating a dataset based on the item portraits of the information that the user has browsed;
determining the cluster centers and the number of cluster centers for the samples in the dataset by using the maximum and minimum distance clustering algorithm;
taking the number of cluster centers as the K value in the K-means algorithm, and taking all the obtained cluster centers as the initial cluster centers of the K-means clustering algorithm;
assigning samples based on the distance between each sample in the dataset and each cluster center, and obtaining the clustering result once the set constraint condition is met;
and taking the clustering result as the information categories of interest to the user.
Further, determining the cluster centers and the number of cluster centers for the samples in the dataset by using the maximum and minimum distance clustering algorithm includes:
calculating the mean of the sample attributes, calculating the distance between each sample and the mean, and taking the sample with the smallest distance as the first cluster center $C_1$;
selecting the sample farthest from $C_1$ as the second cluster center $C_2$;
calculating the distances $D_{i1}$ and $D_{i2}$ from each remaining sample to $C_1$ and $C_2$; if $D_l = \max\{\min(D_{i1}, D_{i2}),\ i = 1, 2, \ldots, n\}$ and $D_l > \theta D_{12}$, where $\theta$ is a given value and $D_{12}$ is the distance between $C_1$ and $C_2$, taking $x_l$ as the third cluster center $C_3$;
if $C_3$ exists, calculating $D_j = \max\{\min(D_{i1}, D_{i2}, D_{i3}),\ i = 1, 2, \ldots, n\}$; if $D_j > \theta D_{12}$, establishing a fourth cluster center;
and so on, until the maximum of the minimum distances is no greater than $\theta D_{12}$, at which point the search for cluster centers ends and the cluster centers and their number are obtained.
Preferably, the expression of the interest model is as follows:
$$V_{seat} = (w_1 T_1, w_2 T_2, \ldots, w_m T_m)^{T}$$
where $V_{seat}$ represents the user portrait, $w_m$ represents the weight of the m-th information category, and $T_m$ represents the feature vector of the m-th information category.
Preferably, the similarity is calculated as the cosine similarity:
$$sim(seat, d_i) = \frac{(w_i T_i)^{T} \, \vec{d_i}}{\lVert w_i T_i \rVert \, \lVert \vec{d_i} \rVert}$$
where seat is the item portrait in the user portrait, $w_i$ is the weight of the information category to which the candidate information $d_i$ belongs, $T_i$ is the feature vector of that information category, and $\vec{d_i}$ is the feature vector of $d_i$.
In a second aspect, there is provided an information recommendation system based on VSM and AMMK-means, comprising:
an acquisition module, used for acquiring an item portrait of each piece of candidate information;
a similarity calculation module, used for substituting the item portraits of the candidate information into a pre-constructed interest model to obtain the similarity between each piece of candidate information and the user portrait;
a recommendation module, used for recommending the candidate information with the highest similarity to the user;
wherein the interest model is constructed based on the VSM, AMMK-means, and the item portraits of the information that the user has browsed.
Preferably, the system further comprises an interest model construction module, which comprises:
a first construction unit, used for acquiring the item portraits of the information that the user has browsed and characterizing those item portraits by using the VSM;
an information category construction unit, used for clustering the item portraits of the information browsed by the user through AMMK-means and taking the clustering result as the information categories of interest to the user;
a weight calculation unit, used for calculating the weight of each information category from the number of pieces of browsed information contained in that category and the total number of pieces of information the user has browsed;
a user portrait construction unit, used for generating a user portrait based on the information categories of interest to the user and the weights of the information categories;
and a calculation unit, used for calculating the similarity between the item portrait of the candidate information and the item portrait in the user portrait.
In a third aspect, a storage device is provided, in which a plurality of program codes are stored, the program codes are adapted to be loaded and executed by a processor to perform the information recommendation method based on VSM and AMMK-means according to any of the above technical solutions.
In a fourth aspect, a control device is provided, including a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and executed by the processor to perform the information recommendation method based on VSM and AMMK-means according to any of the above-mentioned aspects.
The technical scheme provided by the invention has at least one or more of the following beneficial effects:
In this embodiment, an item portrait of each piece of candidate information is first acquired; the item portraits are then substituted into a pre-constructed interest model to obtain the similarity between each piece of candidate information and the user portrait; and the candidate information with the highest similarity is recommended to the user. Because the interest model in this embodiment is constructed based on the VSM, AMMK-means, and the item portraits of the information the user has browsed, it is in effect customized to the user's own interests. This avoids the deviation between the information categories recommended under an editor's classification standard and the categories the user is actually interested in, and improves recommendation accuracy compared with the traditional collaborative filtering algorithm.
Drawings
FIG. 1 is a flow chart illustrating the main steps of a method for information recommendation based on VSM and AMMK-means according to an embodiment of the present invention;
FIG. 2 is a flowchart of the main steps of the modified algorithm AMMK-means according to the embodiment of the present invention;
FIG. 3 is a main block diagram of an information recommendation method based on VSM and AMMK-means according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present invention, reference is made to the following description, drawings and examples.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
To address the problems of existing information recommendation models, this embodiment provides an information recommendation method based on VSM and AMMK-means, where AMMK-means is a K-means clustering algorithm improved with the maximum and minimum distance clustering algorithm. Because both information content and information category strongly influence user interest, the method builds an interest model that represents information content while also classifying the quantized information. In this process, the VSM is first used to represent the text of each piece of information; the AMMK-means algorithm is then used to cluster the information and build the user's interest model; finally, the similarity between the user's interest model and the candidate information is calculated, and the information of interest is recommended to the user. The embodiment retains the interpretability of existing content-based recommendation algorithms while using the improved clustering algorithm to avoid the problems of editorial classification. Experiments show that the prediction accuracy of the method provided by this embodiment is higher than that of the traditional collaborative filtering algorithm.
Referring to FIG. 1, which is a flowchart of the information recommendation method based on VSM and AMMK-means according to an embodiment of the invention, the method mainly comprises the following steps:
S1, acquiring an item portrait of each piece of candidate information;
S2, substituting the item portraits of the candidate information into a pre-constructed interest model to obtain the similarity between each piece of candidate information and the user portrait;
S3, recommending the candidate information with the highest similarity to the user;
wherein the interest model is constructed based on the VSM, AMMK-means, and the item portraits of the information that the user has browsed.
In this embodiment, the information fusion is also called data fusion or multi-sensor fusion, and may be defined as an information processing process that uses computer technology to automatically analyze and integrate several sensor observation information obtained in time sequence under a certain criterion so as to complete task decision and evaluation.
In this embodiment, an information keyword is one of the most representative words in a piece of information; it characterizes what is distinctive and unique about the information and is generally extracted by a text processing algorithm.
In this embodiment, because the content of the information is text, it is usually represented by a vector $d = \{w_1, w_2, \ldots, w_i, \ldots, w_n\}$; the result of vectorizing the information text is called the information feature vector.
In one embodiment, the construction of the interest model in S2 includes:
acquiring the item portraits of the information that the user has browsed, and characterizing those item portraits by using the VSM;
clustering the item portraits of the information browsed by the user through AMMK-means, and taking the clustering result as the information categories of interest to the user;
calculating the weight of each information category from the number of pieces of browsed information contained in that category and the total number of pieces of information the user has browsed;
generating a user portrait based on the information categories of interest to the user and the weights of the information categories;
and calculating the similarity between the item portrait of the candidate information and the item portrait in the user portrait.
In this embodiment, characterizing the item portraits of the information that the user has browsed by using the VSM includes the following.
In the vector space model (VSM), each document is represented by a feature vector that captures the multidimensional information in the document. This feature vector is the item portrait. Constructing an item portrait of the information both reflects the high dimensionality of the information and provides a convenient input for clustering when building the user interest model.
This embodiment uses the vector space model to represent information feature vectors. Given an information set $X = \{X_1, X_2, \ldots, X_i, \ldots, X_n\}$, the vectorization of each piece of information is represented as:
$$X_i = (w_{i1}, w_{i2}, \ldots, w_{im}), \quad i = 1, 2, \ldots, n$$
where $w_{ij}$ represents the weight of keyword $j$ in information $i$, and $m$ is the dimension of the keyword set.
In the VSM construction process, the dimension $m$ of the keyword set is determined first. The keywords represent the characteristics of the document, and as the number of keywords grows, the time complexity grows with $m$. To reduce time overhead while preserving the quality of the representation, this embodiment extracts the top 5 keywords of each piece of information to characterize it (generally, 3 to 5 keywords give the best effect), then uses the TF-IDF algorithm to obtain the dimension $m$ of the keyword set over the information set and to calculate the weights $w_{ij}$.
The TF-IDF calculation of the weight $w_{ij}$ consists of two parts, term frequency (TF) and inverse document frequency (IDF); their product determines the weight of a keyword in a document.
The TF is calculated as:
$$TF(i,j) = \frac{count(i,j)}{size(i)}$$
where $count(i,j)$ is the number of occurrences of keyword $j$ in information document $i$, and $size(i)$ is the total number of words in information $i$.
The IDF is calculated as:
$$IDF(j) = \log\frac{N}{n(j)}$$
where $N$ is the total number of pieces of information in the information set, and $n(j)$ is the number of pieces of information in which keyword $j$ appears.
The weight is calculated from TF and IDF as:
$$w_{ij} = TF(i,j) \cdot IDF(j)$$
The weights $w_{ij}$ are then normalized.
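The TF-IDF weighting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `tfidf_vectors`, the natural-log form of the IDF, and the L2 (unit-length) normalization are assumptions, since the text does not spell out the logarithm base or the normalization used.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """Return one normalized TF-IDF vector per document.

    docs:       list of token lists, one per piece of information
    vocabulary: list of keywords; its length is the dimension m
    """
    n_docs = len(docs)
    # Number of documents in which each keyword appears.
    doc_freq = {kw: sum(1 for d in docs if kw in d) for kw in vocabulary}

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        size = len(tokens) or 1  # guard against empty documents
        row = []
        for kw in vocabulary:
            tf = counts[kw] / size
            # Assumed IDF form: natural log of N / n; 0 if the keyword never appears.
            idf = math.log(n_docs / doc_freq[kw]) if doc_freq[kw] else 0.0
            row.append(tf * idf)
        # Assumed normalization: scale each vector to unit Euclidean length.
        norm = math.sqrt(sum(w * w for w in row))
        vectors.append([w / norm for w in row] if norm > 0 else row)
    return vectors
```

Each returned row plays the role of one item portrait; the rows can be passed directly to the clustering stage.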
In one embodiment, the inventors have found that the conventional recommendation method based on VSM and K-means has the following drawbacks:
1) Recommendation algorithms are typically based on collaborative filtering and make recommendations from user browsing records or feedback records. Some methods additionally take information category into account, but the information categories are classified manually by information editors, so the classification standard represents only the editors' opinions and the recommendation effect deviates.
2) Clustering with the K-means algorithm has two limitations. First, the number of clusters must be preset, but an accurate number is difficult to give in practice; there is no reference for choosing the number of clusters for different datasets, so a large number of training experiments are required. Second, the initial cluster centers are obtained at random; if the initial center positions are poorly chosen, the amount of computation is likely to increase and the global optimal solution may not be obtained.
The maximum and minimum distance clustering algorithm, which was first used in the field of pattern recognition, addresses these limitations. By examining the Euclidean distances between samples, it selects samples that are as far apart as possible as cluster centers, which effectively avoids the poor clustering results caused by initial centers that are too close together. Once the initial cluster centers have been selected, the number of clusters follows naturally, which overcomes the need to know the number of classes in advance in K-means clustering.
Specifically, clustering the item portraits of the information browsed by the user through AMMK-means and taking the clustering result as the information categories of interest to the user comprises:
generating a dataset based on the item portraits of the information that the user has browsed;
determining the cluster centers and the number of cluster centers for the samples in the dataset by using the maximum and minimum distance clustering algorithm;
taking the number of cluster centers as the K value in the K-means algorithm, and taking all the obtained cluster centers as the initial cluster centers of the K-means clustering algorithm;
assigning samples based on the distance between each sample in the dataset and each cluster center, and obtaining the clustering result once the set constraint condition is met;
and taking the clustering result as the information categories of interest to the user.
Specifically, determining the cluster centers and the number of cluster centers for the samples in the dataset by using the maximum and minimum distance clustering algorithm includes:
calculating the mean of the sample attributes, calculating the distance between each sample and the mean, and taking the sample with the smallest distance as the first cluster center $C_1$;
selecting the sample farthest from $C_1$ as the second cluster center $C_2$;
calculating the distances $D_{i1}$ and $D_{i2}$ from each remaining sample to $C_1$ and $C_2$; if $D_l = \max\{\min(D_{i1}, D_{i2}),\ i = 1, 2, \ldots, n\}$ and $D_l > \theta D_{12}$, where $\theta$ is a given value and $D_{12}$ is the distance between $C_1$ and $C_2$, taking $x_l$ as the third cluster center $C_3$;
if $C_3$ exists, calculating $D_j = \max\{\min(D_{i1}, D_{i2}, D_{i3}),\ i = 1, 2, \ldots, n\}$; if $D_j > \theta D_{12}$, establishing a fourth cluster center;
and so on, until the maximum of the minimum distances is no greater than $\theta D_{12}$, at which point the search for cluster centers ends and the cluster centers and their number are obtained.
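The center-selection procedure above can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the function name `max_min_centers` and the default `theta=0.5` are hypothetical choices, Euclidean distance is used as in the surrounding text, and the early return for identical samples is an added guard that the text does not discuss.

```python
import numpy as np


def max_min_centers(X, theta=0.5):
    """Select initial cluster centers by the max-min distance rule.

    X:     array-like of shape (n_samples, n_features)
    theta: threshold factor, assumed to satisfy 0 < theta < 1
    Returns an array containing the selected center samples.
    """
    X = np.asarray(X, dtype=float)

    # First center: the sample closest to the attribute mean.
    mean = X.mean(axis=0)
    first = int(np.argmin(np.linalg.norm(X - mean, axis=1)))
    centers = [first]

    # Second center: the sample farthest from the first center.
    d_to_first = np.linalg.norm(X - X[first], axis=1)
    second = int(np.argmax(d_to_first))
    d12 = d_to_first[second]
    if d12 == 0:
        return X[centers]  # all samples coincide; one center is enough
    centers.append(second)

    # For every sample, track its distance to the nearest chosen center.
    min_dist = np.minimum(d_to_first, np.linalg.norm(X - X[second], axis=1))
    while True:
        candidate = int(np.argmax(min_dist))
        # Stop once the largest of the minimum distances is no greater than theta * D12.
        if min_dist[candidate] <= theta * d12:
            break
        centers.append(candidate)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[candidate], axis=1))

    return X[centers]
```

The number of rows returned is the K value, and the rows themselves are the initial centers handed to K-means. A smaller `theta` tends to produce more centers, a larger one fewer.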
This clustering algorithm does not require the number of clusters to be set in advance or estimated through a large number of experiments, and its amount of computation is small. It also optimizes the selection of the initial cluster centers, avoiding the local optimum that the conventional algorithm can fall into when its initial cluster centers are chosen at random.
In one embodiment, the improved algorithm AMMK-means is shown in FIG. 2, which is a flowchart of the main steps of the improved algorithm AMMK-means according to an embodiment of the present invention. The specific steps of the improved algorithm AMMK-means are as follows:
For a given dataset $X = \{X_1, X_2, \ldots, X_n\}$:
Step 1, calculate the mean of the sample attributes, calculate the distance between each sample and the mean, and take the sample with the smallest distance as the first cluster center $C_1$;
Step 2, give $\theta$, where $0 < \theta < 1$;
Step 3, select the sample farthest from $C_1$, that is, the sample with the largest $D_{i1}$, as the second cluster center $C_2$;
Step 4, calculate the distances $D_{i1}$ and $D_{i2}$ from each remaining sample to $C_1$ and $C_2$; if $D_l = \max\{\min(D_{i1}, D_{i2}),\ i = 1, 2, \ldots, n\}$ and $D_l > \theta D_{12}$, where $D_{12}$ is the distance between $C_1$ and $C_2$, take $x_l$ as the third cluster center $C_3$;
Step 5, if $C_3$ exists, calculate $D_j = \max\{\min(D_{i1}, D_{i2}, D_{i3}),\ i = 1, 2, \ldots, n\}$; if $D_j > \theta D_{12}$, establish a fourth cluster center, and so on, continuing to find cluster centers until the maximum of the minimum distances is no greater than $\theta D_{12}$;
Step 6, take the number of cluster centers obtained in the steps above as the K value in the K-means algorithm, and take all the obtained cluster centers as the initial cluster centers of the K-means clustering algorithm;
Step 7, calculate the distance between each sample in the sample set and each cluster center, assign each sample to the cluster with the nearest centroid, and update the centroid of each cluster;
Step 8, repeat Step 7 until the sum-of-squared-errors criterion is met, that is, the objective function is minimized, and obtain the clustering result.
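The objective function in Step 8 above is as follows:
$$J_c = \sum_{i=1}^{K} \sum_{p \in \text{cluster } i} \lVert p - C_i \rVert^2$$
where $p$ represents a data object assigned to cluster $i$, $C_i$ represents the centroid of cluster $i$, and $J_c$ represents the sum of the squared errors of all objects in the dataset.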
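Steps 6 to 8 amount to running K-means from given initial centers and monitoring the sum of squared errors. The sketch below illustrates that loop under stated assumptions: the function name `seeded_kmeans`, the iteration cap `max_iter`, and the tolerance `tol` used as the stopping test are hypothetical, since the text only says the loop stops when the error criterion is met.

```python
import numpy as np


def seeded_kmeans(X, init_centers, max_iter=100, tol=1e-6):
    """K-means started from caller-supplied centers.

    Returns (labels, centers, sse), where sse is the sum of squared
    distances from each sample to the centroid of its cluster.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    prev_sse = np.inf

    for _ in range(max_iter):
        # Assign every sample to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Update each centroid to the mean of its members (skip empty clusters).
        for k in range(len(centers)):
            members = X[labels == k]
            if len(members):
                centers[k] = members.mean(axis=0)

        # Sum of squared errors for the current assignment and updated centroids.
        sse = float(((X - centers[labels]) ** 2).sum())
        if prev_sse - sse <= tol:  # assumed stopping rule: no meaningful improvement
            break
        prev_sse = sse

    return labels, centers, sse
```

Passing the centers produced by the max-min selection stage as `init_centers` fixes both K and the starting positions, which is the point of the combined algorithm.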
When clustering, the AMMK-means algorithm in the method provided by this embodiment does not need the number of clusters to be preset or estimated in advance through a large number of experiments, and its amount of computation is small. Because the initial cluster centers are properly selected, it avoids the increased computation and the failure to reach the global optimal solution that result from choosing the initial cluster centers at random.
In existing clustering based on the Chameleon algorithm, two difficulties arise. First, constructing the K-nearest-neighbor graph ($G_k$ graph) requires calculating the similarity between every pair of data points and keeping the top k values in descending order of similarity; the k value of the K-nearest-neighbor graph and the threshold of the similarity function must be given manually, and choosing these parameters requires a large amount of prior knowledge and is difficult. The clustering algorithm provided by this embodiment does not require the k value of the K-nearest-neighbor graph or the threshold of the similarity function to be given manually, which removes the need for that prior knowledge and reduces the difficulty. Second, when the $G_k$ graph is partitioned into disconnected subgraphs that serve as the initial clusters, it is split into two approximately equal subgraphs according to the minimum cut principle, the resulting subgraphs are taken as initial clusters, and the process is repeated until the partitioning criterion is reached. This partitioning technique increases the complexity of the algorithm, and the minimum bisection it relies on is difficult to select. The clustering method provided by this embodiment reduces the algorithm complexity and avoids the difficulty of minimum bisection selection.
In one embodiment, a user interest model is built from the information the user has browsed. The first-layer node of the model is the user, the second-layer nodes are the information categories, and the third-layer nodes are the battlefield information browsed by the user.
If the user has browsed $m$ different categories of battlefield information, the user interest model may be expressed as:
$$seat = \{(T_1, w_1, n_1), \ldots, (T_m, w_m, n_m)\}$$
where $T_i$ represents the feature vector of the i-th information category, $w_i$ represents the weight of the i-th information category, and $n_i$ represents the number of pieces of browsed information contained in the i-th information category.
The feature vector of an information category is the weighted average of the feature vectors of all browsed information contained in that category.
The feature vector $T_i$ of the i-th information category is calculated as:
$$T_i = \frac{\sum_{e_j \in E_i} I_j \, e_j}{\sum_{e_j \in E_i} I_j}$$
where $E_i$ is the set of information the user has browsed in information category $i$, $e_j$ is the feature vector of the j-th piece of information in the category, and $I_j$ is the user's degree of interest in that piece of information. Because browsing a piece of information already indicates the user's interest in it, $I_j$ can be set to 1, which simplifies the formula to:
$$T_i = \frac{1}{n_i} \sum_{e_j \in E_i} e_j$$
Further, $w_i$ is calculated as the ratio of the number of pieces of information the user has browsed in the i-th information category to the total number of pieces of information browsed:
$$w_i = \frac{n_i}{\sum_{k=1}^{m} n_k}$$
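Given the cluster labels of the browsed information, the category feature vectors and weights can be computed as in the sketch below. It assumes every interest level is 1, so each category vector is a plain mean; the function name `build_interest_model` and the tuple layout `(T_i, w_i, n_i)` mirroring the seat model are illustrative choices.

```python
import numpy as np


def build_interest_model(vectors, labels):
    """Build a list of (T_i, w_i, n_i) tuples, one per information category.

    vectors: item portraits of the browsed information, shape (n, m)
    labels:  cluster label of each browsed item, length n
    """
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    total = len(labels)

    model = []
    for k in np.unique(labels):
        members = vectors[labels == k]
        n_k = len(members)            # browsed items in this category
        T_k = members.mean(axis=0)    # category feature vector (all interest levels = 1)
        w_k = n_k / total             # share of all browsed items
        model.append((T_k, w_k, n_k))
    return model
```

By construction the weights sum to 1, so a category's weight reads directly as the fraction of the user's browsing that fell into it.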
For calculation, the user interest model is expressed as:
$$V_{seat} = (w_1 T_1, w_2 T_2, \ldots, w_m T_m)^{T}$$
Finally, cosine similarity is used to calculate the similarity between the candidate battlefield information $d_i$ and the user:
$$sim(seat, d_i) = \frac{(w_i T_i)^{T} \, \vec{d_i}}{\lVert w_i T_i \rVert \, \lVert \vec{d_i} \rVert}$$
where $w_i T_i$ is the weighted feature vector of the battlefield information category to which the candidate information $d_i$ belongs, and $\vec{d_i}$ is the feature vector of $d_i$.
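The scoring and recommendation step can be sketched as follows. Several points here are assumptions rather than the patent's stated method: the function names `cosine` and `recommend` are hypothetical, and because the text does not say how a candidate's category is determined, the sketch takes it to be the category that gives the highest cosine similarity.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def recommend(model, candidates):
    """Return (index, score) of the candidate most similar to the user model.

    model:      list of (T_i, w_i, n_i) tuples, one per information category
    candidates: list of candidate feature vectors
    """
    best_index, best_score = -1, -1.0
    for index, d in enumerate(candidates):
        # Assumption: the candidate's category is the one with the highest similarity.
        score = max(cosine(w * np.asarray(T, dtype=float), d) for T, w, _ in model)
        if score > best_score:
            best_index, best_score = index, score
    return best_index, best_score
```

Under a pure cosine measure, a positive weight scales the category vector without changing its direction, so in this sketch the weights do not alter the ranking. An implementation that wants the weights to influence the ranking would need a different score, which the text does not specify.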
The method and system provided by this embodiment express the user's interests accurately and improve the recommendation effect. They overcome a shortcoming of collaborative-filtering-based recommendation from browsing or feedback records: the information categories are classified manually by information editors, so the classification standard can only represent the editors' opinions.
In a specific embodiment, the feature attributes of the information are acquired first; the information browsed by the user is then analyzed to generate a user portrait; the feature similarity between the user portrait and each piece of candidate information is calculated; and finally the information with high similarity is recommended to the user.
The content-based recommendation method generally includes three steps: item portrait construction, user portrait construction, and recommendation generation.
An item portrait represents an item by its feature information. The attributes describing an item include both structured and unstructured data, and the unstructured data must be converted into structured data before it can be used in the model.
A commonly used item representation method is the vector space model (VSM) with TF-IDF weights.
The VSM converts text documents into vectors in a vector space, and TF-IDF is used to calculate the weight of each keyword in each document. Because synonymy and polysemy exist among the words in a document, the robustness and accuracy of the recommendation model are reduced. To improve the model's generalization in the face of polysemy and synonymy, semantic analysis and knowledge graphs are applied in the recommendation system.
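As a rough illustration of the VSM with TF-IDF weights, the sketch below turns tokenized documents into vectors. The source does not state which TF-IDF variant is used, so the choice here (term frequency normalized by document length, and `log(N / df)` as inverse document frequency) is an assumption, as are the function name `tfidf_vectors` and the example tokens in the usage note.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def tfidf_vectors(
    docs: Sequence[Sequence[str]],
) -> Tuple[List[str], List[List[float]]]:
    """Map tokenized documents to TF-IDF vectors over a shared vocabulary."""
    # The sorted vocabulary fixes the meaning of each vector dimension.
    vocab = sorted({token for doc in docs for token in doc})
    n_docs = len(docs)

    # Document frequency: in how many documents each token appears.
    df = Counter(token for doc in docs for token in set(doc))
    idf = {token: math.log(n_docs / df[token]) for token in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # guard against empty documents
        vectors.append([counts[token] / length * idf[token] for token in vocab])
    return vocab, vectors
```

For example, `tfidf_vectors([["radar", "target", "radar"], ["target", "route"]])` gives weight zero to `"target"`, which appears in every document, and positive weights to `"radar"` and `"route"`.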
The user portrait is a user interest model constructed from the features of the items that the user has previously browsed or rated. Building it mainly involves two parts: text classification and construction of a hierarchical user interest model. First, the item portraits of the information browsed by the user are clustered to obtain the information categories and their corresponding features (that is, item portraits); second, the weight of each information category is calculated; finally, the number of user-browsed information items contained in each information category is counted. Traditional text classification models include nearest-neighbor algorithms, the Rocchio algorithm, decision trees, linear classifiers, Bayesian classifiers, and the like. The hierarchical user interest model may use a three-layer user-category-object structure or a three-layer user-interest-item structure. Recommendation generation compares the feature similarity between the user portrait and the candidate items and recommends to the user the set of items with the highest relevance; two common similarity measures are the Pearson correlation coefficient and cosine similarity.
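The two similarity measures named above are closely related: the Pearson correlation coefficient equals the cosine similarity of the two vectors after each has been mean-centered. The following sketch shows both; the function names are illustrative and not taken from the source.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation, computed as the cosine of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)
```

The practical difference is that Pearson correlation ignores a constant offset in the feature values, whereas cosine similarity does not.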
In this embodiment, the item portrait is a series of labels attached to each item. One use of the item portrait is as an item feature in the recommendation model. In a recommendation system, the item portrait is the basis of the user portrait: item portrait + user behavior = user portrait.
It should be noted that, although the foregoing embodiments describe the steps in a specific order, it will be understood by those skilled in the art that, in order to achieve the effects of the present invention, the steps are not necessarily performed in such an order, and may be performed simultaneously (in parallel) or in other orders, and these variations are within the scope of the present invention.
Based on the same inventive concept, as shown in fig. 3, the present invention further provides an information recommendation system based on VSM and AMMK-means, including:
an acquisition module, used for acquiring the item portrait of each piece of candidate information;
a similarity calculation module, used for substituting the item portraits of the candidate information into a pre-constructed interest model to obtain the similarity between the candidate information and the user portrait;
a recommending module, used for recommending the candidate information with the highest similarity to the user;
wherein the interest model is constructed based on the VSM, AMMK-means, and the item portraits of the information that the user has browsed.
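The way the three modules fit together can be sketched as a small ranking function. This is only an illustration: the name `rank_candidates` and the data layout are hypothetical, and because the text does not say how a candidate's information category is determined, the sketch assumes it is the category whose weighted feature vector is most similar to the candidate.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    candidates: Dict[str, Vector],          # acquisition: id -> item portrait
    model: Sequence[Tuple[Vector, float]],  # interest model: (T_i, w_i) pairs
    top_n: int = 1,
) -> List[str]:
    """Return the ids of the top_n candidates most similar to the user portrait."""
    scored = []
    for candidate_id, vector in candidates.items():
        # Similarity calculation. Assumption: the candidate's category is the
        # one whose weighted feature vector w_i * T_i it matches best.
        similarity = max(
            cosine(vector, [w_i * t for t in t_i]) for t_i, w_i in model
        )
        scored.append((similarity, candidate_id))
    # Recommendation: highest similarity first, ties broken by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [candidate_id for _, candidate_id in scored[:top_n]]
```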
In an embodiment, the system further comprises an interest model building module, wherein the interest model building module comprises:
a first construction unit, used for acquiring the item portraits of the information that the user has browsed and characterizing those item portraits by using the VSM;
an information category construction unit, used for clustering the item portraits of the information browsed by the user through AMMK-means and taking the clustering result as the information categories of interest to the user;
a weight calculation unit, used for calculating the weight of each information category according to the number of user-browsed information items contained in that category and the total number of information items browsed by the user;
a user portrait construction unit, used for generating a user portrait based on the information categories of interest to the user and the weights of the information categories;
and a calculating unit, used for calculating the similarity between the item portrait of the candidate information and the item portrait in the user portrait.
The maximum-minimum distance method adopted in this embodiment is based on Euclidean distance and takes objects that are as far apart as possible as cluster centers. This avoids the situation, possible when the K-means method selects its initial values, in which the initial cluster centers lie too close to one another. It also determines the number of initial cluster centers intelligently and improves the efficiency of partitioning the initial data set.
The information category construction unit in the embodiment is specifically configured to:
generating a data set based on the item portraits of the information that the user has browsed;
determining the cluster centers and the number of cluster centers for the samples in the data set by using the maximum-minimum distance clustering algorithm;
taking the number of cluster centers as the K value of the K-means algorithm, and taking all the obtained cluster centers as the initial cluster centers of the K-means clustering algorithm;
obtaining a clustering result, based on the distance between each sample in the data set and each initial cluster center, when the set constraint condition is met;
and taking the clustering result as the information categories of interest to the user.
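The K-means stage of the steps above, seeded with externally supplied initial centers, might look like the following sketch. The text does not spell out the "set constraint condition", so the sketch assumes a common choice: stop when no center moves by more than `tol`, or after `max_iter` iterations. The function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(
    samples: Sequence[Vector],
    initial_centers: Sequence[Vector],  # K is implied by the number of centers
    max_iter: int = 100,                # assumed stopping rule
    tol: float = 1e-9,                  # assumed stopping rule
) -> Tuple[List[List[float]], List[int]]:
    """Return (centers, labels) after Lloyd-style iterations."""
    centers = [list(center) for center in initial_centers]
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: euclidean(sample, centers[k]))
            for sample in samples
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k, old_center in enumerate(centers):
            members = [s for s, label in zip(samples, labels) if label == k]
            if members:
                new_centers.append(
                    [sum(column) / len(members) for column in zip(*members)]
                )
            else:
                new_centers.append(old_center)  # keep an empty cluster's center
        shift = max(euclidean(o, n) for o, n in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels
```

Because the number of centers is taken from the seeding step, the caller never has to choose K by hand.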
In an embodiment, determining the cluster center and the number of cluster centers for the samples in the dataset by using a maximum-minimum distance clustering algorithm includes:
calculating the mean of the sample attributes, calculating the distance between each sample and the mean, and taking the sample with the minimum distance as the first cluster center $C_1$;
selecting the sample farthest from $C_1$ as the second cluster center $C_2$;
calculating the distances $D_{i1}$ and $D_{i2}$ from every remaining sample to $C_1$ and $C_2$; if $D_l = \max\{\min(D_{i1}, D_{i2}),\ i = 1, 2, \ldots, n\}$ and $D_l > \theta D_{12}$, where $\theta$ is a given value and $D_{12}$ is the distance between $C_1$ and $C_2$, taking the corresponding sample $x_l$ as the third cluster center $C_3$;
if $C_3$ exists, calculating $D_j = \max\{\min(D_{i1}, D_{i2}, D_{i3}),\ i = 1, 2, \ldots, n\}$; if $D_j > \theta D_{12}$, establishing a fourth cluster center;
and so on, until the maximum-minimum distance is no longer greater than $\theta D_{12}$, at which point the search for cluster centers ends and the cluster centers and their number are obtained.
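The center-selection procedure above can be sketched as follows. This is an illustrative reading of the steps, not the patented code: the function name `max_min_centers` and the default `theta=0.5` are assumptions, and the handling of the degenerate case where all samples coincide is added here for safety.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_centers(samples: Sequence[Vector], theta: float = 0.5) -> List[Vector]:
    """Pick initial cluster centers with the maximum-minimum distance rule."""
    n = len(samples)
    mean = [sum(column) / n for column in zip(*samples)]

    # C1: the sample closest to the attribute mean.
    c1 = min(range(n), key=lambda i: euclidean(samples[i], mean))
    # C2: the sample farthest from C1.
    c2 = max(range(n), key=lambda i: euclidean(samples[i], samples[c1]))
    d12 = euclidean(samples[c1], samples[c2])
    if d12 == 0:  # all samples coincide: a single center is enough
        return [samples[c1]]

    centers = [c1, c2]
    while True:
        remaining = [i for i in range(n) if i not in centers]
        if not remaining:
            break
        # For each remaining sample, the distance to its nearest center;
        # the candidate is the sample for which that distance is largest.
        candidate, distance = max(
            (
                (i, min(euclidean(samples[i], samples[c]) for c in centers))
                for i in remaining
            ),
            key=lambda pair: pair[1],
        )
        if distance <= theta * d12:
            break  # no remaining sample is far enough from every center
        centers.append(candidate)
    return [samples[i] for i in centers]
```

The number of centers returned serves as K, and the centers themselves as the initial centers, for the subsequent K-means run.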
In an embodiment, the expression of the interest model is as follows:
$$V_{seat} = (w_1 T_1, w_2 T_2, \ldots, w_m T_m)^{T}$$
where $V_{seat}$ represents the user portrait, $w_m$ represents the weight of the m-th information category, and $T_m$ represents the feature vector of the m-th information category.
In an embodiment, the similarity is calculated as follows:
$$sim(d_i, seat) = \frac{(w_i T_i^{T}) \cdot \vec{d_i}}{\lVert w_i T_i^{T} \rVert \, \lVert \vec{d_i} \rVert}$$
where $seat$ is the item portrait in the user portrait, $w_i$ is the weight of the information category to which the candidate information $d_i$ belongs, $T_i^{T}$ is the feature vector of that information category, $w_i T_i^{T}$ is the weighted feature vector of that information category, and $\vec{d_i}$ is the feature vector of $d_i$.
It will be appreciated by those skilled in the art that all or part of the flow of the above-described method embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
Further, the invention also provides a storage device. In one storage device embodiment according to the present invention, the storage device may be configured to store a program for performing the VSM and AMMK-means based information recommendation method of the above method embodiment; the program may be loaded and executed by a processor to implement that method. For convenience of explanation, only the portions relevant to the embodiments of the present invention are shown; for specific technical details not disclosed here, refer to the method portion of the embodiments of the present invention. The storage device may be formed of various electronic devices; optionally, in an embodiment of the present invention, the storage device is a non-transitory computer-readable storage medium.
Further, the invention also provides a control device. In one control device embodiment according to the present invention, the control device includes a processor and a storage device. The storage device may be configured to store a program for executing the VSM and AMMK-means based information recommendation method of the above method embodiment, and the processor may be configured to execute the programs in the storage device, including, but not limited to, the program for executing that method. For convenience of explanation, only the portions relevant to the embodiments of the present invention are shown; for specific technical details not disclosed here, refer to the method portion of the embodiments of the present invention. The control device may be formed of various electronic devices.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the specific embodiments of the present invention without departing from the spirit and scope of the present invention, and any modifications and equivalents are intended to be included in the scope of the claims of the present invention.
Claims (4)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011432407.3A CN114625952B (en) | 2020-12-10 | 2020-12-10 | Information recommendation method and system based on VSM and AMMK-means |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011432407.3A CN114625952B (en) | 2020-12-10 | 2020-12-10 | Information recommendation method and system based on VSM and AMMK-means |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114625952A (en) | 2022-06-14 |
| CN114625952B (en) | 2025-07-18 |
Family
ID=81896053
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202011432407.3A Active CN114625952B (en) | 2020-12-10 | 2020-12-10 | Information recommendation method and system based on VSM and AMMK-means |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114625952B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118016249B (en) * | 2024-02-19 | 2024-09-10 | 中山大学孙逸仙纪念医院 | Preoperative anxiety relieving method and system based on virtual reality technology |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110532306A (en) * | 2019-05-27 | 2019-12-03 | 浙江工业大学 | A kind of Library User's portrait model building method dividing k-means based on multi-angle of view two |
| CN110781963A (en) * | 2019-10-28 | 2020-02-11 | 西安电子科技大学 | K-means clustering-based aerial target clustering method |
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104376057A (en) * | 2014-11-06 | 2015-02-25 | 南京邮电大学 | Self-adaptation clustering method based on maximum distance, minimum distance and K-means |
| CN105678607B (en) * | 2016-01-07 | 2019-05-31 | 合肥工业大学 | A kind of Order Batch method based on improved K-Means algorithm |
| CN107645393A (en) * | 2016-07-20 | 2018-01-30 | 中兴通讯股份有限公司 | Determine the method, apparatus and system of the black-box system input and output degree of association |
| WO2020232616A1 (en) * | 2019-05-20 | 2020-11-26 | 深圳市欢太科技有限公司 | Information recommendation method and apparatus, and electronic device and storage medium |
Also Published As
| Publication number | Publication date |
|---|---|
| CN114625952A (en) | 2022-06-14 |
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| CN110162706B (en) | Personalized recommendation method and system based on interactive data clustering | |
| US20230039496A1 (en) | Question-and-answer processing method, electronic device and computer readable medium | |
| CN109829104B (en) | Semantic similarity based pseudo-correlation feedback model information retrieval method and system | |
| USRE47340E1 (en) | Image retrieval apparatus | |
| CA2786727C (en) | Joint embedding for item association | |
| CN111581354A (en) | A method and system for calculating similarity of FAQ questions | |
| CN101996191B (en) | Method and system for searching for two-dimensional cross-media element | |
| TWI396105B (en) | Digital data processing method for personalized information retrieval and computer readable storage medium and information retrieval system thereof | |
| CN114298020B (en) | Keyword vectorization method based on topic semantic information and application thereof | |
| CN110083764A (en) | A kind of collaborative filtering cold start-up way to solve the problem | |
| CN113704617A (en) | Article recommendation method, system, electronic device and storage medium | |
| CN114936278A (en) | Text recommendation method, apparatus, computer equipment and storage medium | |
| Niu | Music Emotion Recognition Model Using Gated Recurrent Unit Networks and Multi‐Feature Extraction | |
| CN114625952B (en) | Information recommendation method and system based on VSM and AMMK-means | |
| CN118069814B (en) | Text processing method, device, electronic equipment and storage medium | |
| TW201243627A (en) | Multi-label text categorization based on fuzzy similarity and k nearest neighbors | |
| Eyjolfsdottir et al. | Moviegen: A movie recommendation system | |
| Spiegel et al. | Pattern recognition in multivariate time series: dissertation proposal | |
| CN117972359A (en) | Intelligent data analysis method based on multi-mode data | |
| Wang | Application of E-Commerce Recommendation Algorithm in Consumer Preference Prediction | |
| CN113505223A (en) | Network water army identification method and system | |
| Ha et al. | Ordered Clustering-Based Semantic Music Recommender System Using Deep Learning Selection. | |
| Tu | Online Text Retrieval Method Based on Convolution Neural Network. | |
| CN119646191B (en) | Automatic labeling method, device and equipment based on large model and clustering algorithm | |
| CN111581164B (en) | Multimedia file processing method, device, server and storage medium |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||