CN114625952B

CN114625952B - Information recommendation method and system based on VSM and AMMK-means

Info

Publication number: CN114625952B
Application number: CN202011432407.3A
Authority: CN
Inventors: 彭石宝; 曹郁; 焦峰; 王炜华
Original assignee: 93216 Troops of Chinese PLA
Current assignee: 93216 Troops of Chinese PLA
Priority date: 2020-12-10
Filing date: 2020-12-10
Publication date: 2025-07-18
Anticipated expiration: 2040-12-10
Also published as: CN114625952A

Abstract

The invention discloses an information recommendation method and system based on VSM and AMMK-means. The method comprises: obtaining an item portrait of each piece of candidate information; substituting the item portraits of the candidate information into a pre-constructed interest model to obtain the similarity between each piece of candidate information and the user portrait; and recommending the candidate information with the highest similarity to the user, wherein the interest model is constructed based on VSM, AMMK-means and the item portraits of the information the user has browsed. Because the interest model is built from the item portraits of the information the user has browsed, it is in effect customized to the user's own interests. The invention thereby avoids the deviation between the information categories recommended under an editor-defined classification standard and the categories the user is actually interested in, and improves recommendation accuracy compared with the traditional collaborative filtering algorithm.

Description

Information recommendation method and system based on VSM and AMMK-means

Technical Field

The invention relates to the field of information retrieval, in particular to an information recommendation method and system based on VSM and AMMK-means.

Background

Situation assessment refers to the analysis, reasoning and judgment of multi-source information, building on object assessment, in order to understand the relationships on the battlefield; it is used to support command-level decisions. Because battlefield information is large in volume and varied in type, each seat (that is, each user) has difficulty selecting information, and an auxiliary decision-making system is needed to recommend information of interest to different seats according to each seat's browsing records and the characteristics of the battlefield information.

Recommendation algorithms, typically based on collaborative filtering, make recommendations from user browsing records or feedback records. Beyond these records, some studies also take information categories into account when generating recommendations, but the categories are assigned manually by information editors, so the classification standard represents only the editors' opinions and biases the recommendation results.

Disclosure of Invention

In order to solve the defects in the prior art, the invention provides an information recommendation method and system based on VSM and AMMK-means.

In a first aspect, an information recommendation method based on VSM and AMMK-means is provided, including:

acquiring an item portrait of each piece of candidate information;

substituting the item portraits of the candidate information into a pre-constructed interest model to obtain the similarity between each piece of candidate information and the user portrait;

recommending the candidate information with the highest similarity to the user;

wherein the interest model is constructed based on VSM, AMMK-means and the item portraits of the information that the user has browsed.

Preferably, the construction of the interest model includes:

acquiring the item portraits of the information that the user has browsed, and characterizing these item portraits using the VSM;

clustering the item portraits of the information browsed by the user through AMMK-means, and taking the clustering result as the information categories of interest to the user;

calculating the weight of each information category from the number of pieces of information browsed by the user in that category and the total number of pieces of information browsed by the user;

generating a user portrait based on the information categories of interest to the user and the weights of the information categories;

and calculating the similarity between the item portrait of the candidate information and the item portrait in the user portrait.

Further, clustering the item portraits of the information browsed by the user through AMMK-means and taking the clustering result as the information categories of interest to the user comprises:

generating a data set based on the item portraits of the information that the user has browsed;

determining the cluster centers and the number of cluster centers for the samples in the data set using the max-min distance clustering algorithm;

taking the number of cluster centers as the K value in the K-means algorithm, and taking all the obtained cluster centers as the initial cluster centers in the K-means clustering algorithm;

clustering the samples based on the distance between each sample in the data set and each initial cluster center, and obtaining the clustering result when the set constraint condition is met;

and taking the clustering result as the information categories of interest to the user.

Further, the determining the clustering center and the number of the clustering centers for the samples in the dataset by using a maximum and minimum distance clustering algorithm includes:

calculating the mean of the sample attributes, calculating the distance between each sample and the mean, and taking the sample with the minimum distance as the first cluster center C₁;

selecting the sample farthest from C₁ as the second cluster center C₂;

calculating the distances D_i1 and D_i2 from each remaining sample to C₁ and C₂; if D_l = max{min(D_i1, D_i2), i = 1, 2, ..., n} and D_l > θD₁₂, where θ is a given value and D₁₂ is the distance between C₁ and C₂, taking x_l as the third cluster center C₃;

if C₃ exists, calculating D_j = max{min(D_i1, D_i2, D_i3), i = 1, 2, ..., n}; if D_j > θD₁₂, establishing a fourth cluster center;

and so on, ending the search for cluster centers when the max-min distance is no greater than θD₁₂, thereby obtaining the cluster centers and the number of cluster centers.
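The center-selection rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names `euclidean` and `max_min_centers`, the default `theta=0.5`, and the use of plain Euclidean distance on tuples are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_min_centers(samples, theta=0.5):
    """Select initial cluster centers with the max-min distance rule.

    samples: list of equal-length numeric tuples (hypothetical input format).
    theta:   given value, assumed to satisfy 0 < theta < 1.
    """
    dims = len(samples[0])
    mean = [sum(s[d] for s in samples) / len(samples) for d in range(dims)]

    # C1: the sample closest to the attribute mean.
    c1 = min(range(len(samples)), key=lambda i: euclidean(samples[i], mean))
    # C2: the sample farthest from C1.
    c2 = max(range(len(samples)), key=lambda i: euclidean(samples[i], samples[c1]))

    d12 = euclidean(samples[c1], samples[c2])
    if d12 == 0:  # all samples coincide, so a single center is enough
        return [samples[c1]]

    centers = [c1, c2]
    while True:
        # Distance from every sample to its nearest already-chosen center.
        nearest = [min(euclidean(s, samples[c]) for c in centers) for s in samples]
        candidate = max(range(len(samples)), key=lambda i: nearest[i])
        # Stop once the max-min distance is no greater than theta * D12.
        if nearest[candidate] <= theta * d12:
            break
        centers.append(candidate)
    return [samples[i] for i in centers]
```

The number of returned centers then plays the role of K, and the centers themselves serve as the initial centers for K-means.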

Preferably, the expression of the interest model is as follows:

V_seat＝(w₁*T₁,w₂*T₂,...,w_m*T_m)^T

where V_seat represents the user portrait, w_m represents the weight of the m-th information category, and T_m represents the feature vector of the m-th information category.

Preferably, the similarity is calculated as follows:

$$\mathrm{sim}(seat, d_i) = \frac{w_i T_i^{T} \cdot \vec{d_i}}{\lVert w_i T_i^{T} \rVert \, \lVert \vec{d_i} \rVert}$$

where seat is the item portrait in the user portrait, w_i is the weight of the information category to which the candidate information d_i belongs, T_i^T is the feature vector of that information category, and $\vec{d_i}$ is the feature vector of d_i.

In a second aspect, there is provided an information recommendation system based on VSM and AMMK-means, comprising:

an acquisition module, used for acquiring the item portrait of each piece of candidate information;

a similarity calculation module, used for substituting the item portraits of the candidate information into a pre-constructed interest model to obtain the similarity between each piece of candidate information and the user portrait;

a recommendation module, used for recommending the candidate information with the highest similarity to the user.

Preferably, the system further comprises a construction module of the interest model, wherein the construction module of the interest model comprises:

a first construction unit, used for acquiring the item portraits of the information that the user has browsed and characterizing these item portraits using the VSM;

an information category construction unit, used for clustering the item portraits of the information browsed by the user through AMMK-means and taking the clustering result as the information categories of interest to the user;

a weight calculation unit, used for calculating the weight of each information category from the number of pieces of information browsed by the user in that category and the total number of pieces of information browsed by the user;

a user portrait construction unit, used for generating a user portrait based on the information categories of interest to the user and the weights of the information categories;

and a calculation unit, used for calculating the similarity between the item portrait of the candidate information and the item portrait in the user portrait.

In a third aspect, a storage device is provided, in which a plurality of program codes are stored, the program codes are adapted to be loaded and executed by a processor to perform the information recommendation method based on VSM and AMMK-means according to any of the above technical solutions.

In a fourth aspect, a control device is provided, including a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and executed by the processor to perform the information recommendation method based on VSM and AMMK-means according to any of the above-mentioned aspects.

The technical scheme provided by the invention has at least one or more of the following beneficial effects:

In this embodiment, an item portrait of each piece of candidate information is first acquired; the item portraits are then substituted into a pre-constructed interest model to obtain the similarity between each piece of candidate information and the user portrait; and the candidate information with the highest similarity is recommended to the user. Because the interest model in this embodiment is constructed based on VSM, AMMK-means and the item portraits of the information the user has browsed, it is in effect customized to the user's own interests. This avoids the deviation between the information categories recommended under an editor-defined classification standard and the categories the user is actually interested in, and improves recommendation accuracy compared with the traditional collaborative filtering algorithm.

Drawings

FIG. 1 is a flow chart illustrating the main steps of a method for information recommendation based on VSM and AMMK-means according to an embodiment of the present invention;

FIG. 2 is a flowchart of the main steps of the improved algorithm AMMK-means according to an embodiment of the present invention;

FIG. 3 is a main block diagram of an information recommendation system based on VSM and AMMK-means according to an embodiment of the present invention.

Detailed Description

For a better understanding of the present invention, reference is made to the following description, drawings and examples.

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

To solve the problems of existing information recommendation models, this embodiment provides an information recommendation method based on VSM and AMMK-means, where AMMK-means improves the K-means clustering algorithm by means of the max-min distance clustering algorithm. Because information content and information category both strongly influence user interest, an interest model characterizing information content is built and the quantized information is classified at the same time. In this process, the VSM is first used to characterize the text of the information; the AMMK-means algorithm is then used to cluster the information and build the user's interest model; finally, the similarity between the user's interest model and the candidate information is calculated and the information of interest is recommended to the user. This embodiment inherits the interpretability of existing content-based recommendation algorithms while using the improved clustering algorithm to avoid the problems of editor classification. Experiments show that the method provided by this embodiment has higher prediction accuracy than the traditional collaborative filtering algorithm.

In the embodiment of the invention, referring to fig. 1, fig. 1 is a flowchart of an information recommendation method based on VSM and AMMK-means. As shown in FIG. 1, the information recommendation method based on VSM and AMMK-means in the embodiment of the invention mainly comprises the following steps:

S1, acquiring an item portrait of each piece of candidate information;

S2, substituting the item portraits of the candidate information into a pre-constructed interest model to obtain the similarity between each piece of candidate information and the user portrait;

S3, recommending the candidate information with the highest similarity to the user.

In this embodiment, the information fusion is also called data fusion or multi-sensor fusion, and may be defined as an information processing process that uses computer technology to automatically analyze and integrate several sensor observation information obtained in time sequence under a certain criterion so as to complete task decision and evaluation.

In this embodiment, an information keyword is the most representative word in a piece of information; it characterizes what is distinctive about that information and is generally extracted by a text processing algorithm.

In this embodiment, because the content of a piece of information is text, the result of vectorizing the information text is called the information feature vector, usually represented as d = {w₁, w₂, ..., w_i, ..., w_n}.

In one embodiment, the construction of the interest model in S2 includes:

In this embodiment, the VSM may be used to characterize the item portraits of the information that the user has browsed, including:

In the vector space model (VSM), each document uses a feature vector to represent the multidimensional information in the document, and this feature vector is the item portrait. Constructing item portraits of the information both reflects the high dimensionality of the information and provides convenient input for clustering when building the user interest model.

This embodiment adopts the vector space model to represent information feature vectors. Given an information set X = {X₁, X₂, ..., X_i, ..., X_n}, the vectorization of information X_i is expressed as:

$$X_i = (w_{i1}, w_{i2}, \ldots, w_{ij}, \ldots, w_{im})$$

where w_ij represents the weight of keyword j in information i.

In constructing the VSM, the dimension m of the keyword set is determined first. Keywords characterize the features of a document, and the time complexity grows with m as the number of keywords increases. To reduce time overhead while preserving the characterization effect, this embodiment extracts the top 5 keywords of each piece of information to characterize it (taking 3 to 5 keywords generally gives the best effect), obtains the dimension m of the keyword set over the information set using the TF-IDF algorithm, and calculates the weights w_ij with the TF-IDF algorithm.

The TF-IDF calculation is divided into a term frequency (TF) part and an inverse document frequency (IDF) part, and the product of the two determines the weight of a word in a document.

The TF is calculated as:

$$TF(i,j) = \frac{count(i,j)}{size(j)}$$

where count(i, j) represents the number of occurrences of keyword i in information document j, and size(j) represents the total number of words in information j.

The IDF is calculated as:

$$IDF(i) = \log\frac{n}{N(i)}$$

where n represents the total number of pieces of information in the information set, and N(i) represents the number of pieces of information in which keyword i appears.

The weights are calculated by TF and IDF as:

w_ij=TF(i,j)*IDF(i)

After the weights are computed, w_ij is normalized.
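As an illustration of the weighting described above, the following sketch computes TF-IDF weights for a small tokenized corpus. It is not taken from the patent: the function name `tfidf_vectors`, the token-list input format, the natural logarithm without smoothing, and the choice of L2 normalization are all assumptions made for the example, since the text does not specify the normalization used.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """Return one normalized TF-IDF vector per document.

    docs:       list of token lists (hypothetical pre-tokenized input).
    vocabulary: ordered list of keywords defining the m vector dimensions.
    """
    n_docs = len(docs)
    # Number of documents in which each keyword appears.
    doc_freq = {k: sum(1 for d in docs if k in d) for k in vocabulary}

    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        row = []
        for k in vocabulary:
            tf = counts[k] / len(tokens) if tokens else 0.0
            # Assumption: unsmoothed natural-log IDF; 0 if the keyword never appears.
            idf = math.log(n_docs / doc_freq[k]) if doc_freq[k] else 0.0
            row.append(tf * idf)
        # Assumption: L2 normalization, so every non-zero vector has unit length.
        norm = math.sqrt(sum(w * w for w in row))
        vectors.append([w / norm for w in row] if norm else row)
    return vectors
```

For example, with the hypothetical corpus `[["radar", "target", "radar"], ["target", "track"], ["radar", "jamming"]]`, the first document receives a larger weight for "radar" than for "target", because both keywords have the same document frequency but "radar" occurs twice.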

In one embodiment, the inventors found that the conventional recommendation method based on VSM and K-means has the following drawbacks:

1) Recommendation algorithms, typically based on collaborative filtering, make recommendations from user browsing records or feedback records. Some methods additionally consider information category factors, but the categories are assigned manually by information editors, so the classification standard represents only the editors' opinions and biases the recommendation results.

2) Clustering with the K-means algorithm has two limitations. First, the number of clusters must be preset, but in practice an accurate number is hard to give; there is no reference for choosing it for different data sets, so a large number of training experiments are needed. Second, the initial cluster centers are obtained randomly; if the initial centers are poorly placed, the amount of computation may increase and the global optimal solution may not be reached.

This embodiment therefore uses the max-min distance clustering algorithm, which was first used in the field of pattern recognition. By probing the Euclidean distances between samples, it selects samples that are as far apart as possible as cluster centers, which effectively avoids the poor clustering results caused by initial centers that are too close together. The number of clusters emerges naturally once the selection of initial cluster centers is complete, which overcomes the problem of the unknown number of classes in K-means clustering.

Specifically, clustering the item portraits of the information browsed by the user through AMMK-means, and taking the clustering result as the information category of interest of the user, wherein the clustering result comprises the following steps:

Specifically, the determining the clustering center and the number of the clustering centers for the samples in the dataset by using a maximum and minimum distance clustering algorithm includes:

selecting the sample farthest from C₁ as the second cluster center C₂;

The clustering algorithm does not need the number of clusters to be set, nor to be estimated in advance through a large number of experiments. Its computational cost is small, and it optimizes the selection of the initial cluster centers, avoiding the local optimum that the conventional algorithm may reach when its initial cluster centers are obtained randomly.

In one embodiment, the improved algorithm AMMK-means is shown in FIG. 2, which is a flowchart of the main steps of the improved algorithm AMMK-means according to an embodiment of the present invention. The specific steps of the improved algorithm AMMK-means are as follows:

For a given data set X = {X₁, X₂, ..., X_n}:

Step 1, calculating the mean of the sample attributes, calculating the distance between each sample and the mean, and taking the sample with the minimum distance as the first cluster center C₁;

Step 2, giving θ, where 0 < θ < 1;

Step 3, selecting the sample point farthest from C₁ (that is, the one with the largest D_i1) as the second cluster center C₂;

Step 4, calculating the distances D_i1 and D_i2 from each sample to C₁ and C₂; if D_l = max{min(D_i1, D_i2), i = 1, 2, ..., n} and D_l > θD₁₂, where D₁₂ is the distance between C₁ and C₂, taking x_l as the third cluster center C₃;

Step 5, if C₃ exists, calculating D_j = max{min(D_i1, D_i2, D_i3), i = 1, 2, ..., n}; if D_j > θD₁₂, establishing a fourth cluster center; and so on, continuing to find and establish cluster centers until the max-min distance is no greater than θD₁₂;

Step 6, taking the number of cluster centers obtained in steps 3, 4 and 5 as the K value in the K-means algorithm, and taking all the obtained cluster centers as the initial cluster centers in the K-means clustering algorithm;

Step 7, calculating the distance between each sample in the sample set and each cluster center, assigning each remaining sample to the cluster whose centroid is nearest, and updating the centroid of each cluster;

Step 8, repeating step 7 until the sum-of-squared-errors criterion is met, that is, the objective function is minimized, at which point clustering is complete and the clustering result is obtained.

The objective function in step 8 above is as follows:

$$J_c = \sum_{i=1}^{K} \sum_{p \in \text{cluster } i} \lVert p - C_i \rVert^2$$

where p represents a data object, C_i represents the centroid of the i-th cluster, and J_c represents the sum of the squared errors of all objects in the data set.
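The assignment and update loop of steps 7 and 8 can be sketched as follows, starting from initial centers supplied by the caller instead of random ones. This is an illustrative sketch only: the name `kmeans_from_centers`, the `max_iter` and `tol` parameters, and the stopping test (the sum of squared errors no longer decreasing) are assumptions standing in for the "set constraint condition" mentioned in the text.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_from_centers(samples, centers, max_iter=100, tol=1e-9):
    """K-means refinement seeded with given initial centers.

    Returns (labels, centers, sse), where sse is the objective J_c.
    """
    centers = [list(c) for c in centers]
    labels, sse, prev_sse = [], math.inf, math.inf
    for _ in range(max_iter):
        # Step 7a: assign every sample to its nearest centroid.
        labels = [
            min(range(len(centers)), key=lambda k: sq_dist(s, centers[k]))
            for s in samples
        ]
        # Step 7b: move each centroid to the mean of its members.
        for k in range(len(centers)):
            members = [s for s, lab in zip(samples, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        # Step 8: evaluate the sum of squared errors and stop when it stalls.
        sse = sum(sq_dist(s, centers[lab]) for s, lab in zip(samples, labels))
        if prev_sse - sse <= tol:
            break
        prev_sse = sse
    return labels, centers, sse
```

Passing the centers produced by the max-min distance rule as `centers` fixes both K and the starting positions, which is the combination the steps above describe.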

The AMMK-means algorithm in the method provided by this embodiment does not need the number of clusters to be preset, nor to be estimated in advance through a large number of experiments. Its computational cost is small and its initial cluster centers are appropriately chosen, which avoids the increased computation and the failure to reach the global optimal solution that result from obtaining the initial cluster centers randomly.

In the existing clustering process based on the Chameleon algorithm, on the one hand, constructing the K-nearest-neighbor graph (G_k graph) requires calculating the similarity between every pair of data points and keeping the top K values in descending order of similarity. The K value of the K-nearest-neighbor graph and the threshold of the similarity function must be given manually, which requires a large amount of prior knowledge and is difficult. The clustering algorithm provided by this embodiment does not require these parameters to be given manually, so it avoids the prior knowledge they demand and reduces the difficulty. On the other hand, to partition the G_k graph into disconnected subgraphs that serve as the initial clusters, the G_k graph is divided into two approximately equal subgraphs according to the minimum-cut principle, and each resulting subgraph is divided again in the same way until the partitioning criterion is reached. This partitioning technique increases the complexity of the algorithm, and the minimum bisection it relies on is difficult to select. The clustering method provided by this embodiment reduces the algorithm complexity and avoids the difficulty of selecting the minimum bisection.

In one embodiment, a user interest model is built from the information the user has browsed. The first layer of nodes in the model is the user, the second layer is the information categories, and the third layer is the battlefield information browsed by the user.

If the user has browsed m different categories of battlefield information, the user interest model may be expressed as:

seat = {(T₁, w₁, n₁), ..., (T_m, w_m, n_m)}

where T_i represents the feature vector of the i-th information category, w_i represents the weight of the i-th information category, and n_i represents the number of pieces of information the user has browsed in the i-th information category.

The feature vector of a certain information category is obtained from a weighted average of all browsed information feature vectors contained in the category.

The feature vector T_i of the i-th information category is calculated as:

$$T_i = \frac{\sum_{e_j \in E_i} I_j \, e_j}{\sum_{e_j \in E_i} I_j}$$

where E_i represents the set of information browsed by the user in information category i, e_j represents an information feature vector, and I_j represents the user's interest level in the j-th piece of information in the category. Since browsing a piece of information indicates the user's interest in it, I_j is set to 1 and the formula simplifies to:

$$T_i = \frac{1}{n_i} \sum_{e_j \in E_i} e_j$$

Further, the value of w_i is calculated as the ratio of the number of pieces of information the user has browsed in the i-th information category to the total number of pieces of information browsed:

$$w_i = \frac{n_i}{\sum_{k=1}^{m} n_k}$$

At the time of calculation, the user interest model is expressed as:

V_seat＝(w₁*T₁,w₂*T₂,...,w_m*T_m)^T
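To make the construction of the user interest model concrete, the sketch below derives the triples (T_i, w_i, n_i) and the weighted vectors w_i*T_i from clustered item feature vectors, with every interest level I_j set to 1. The function names `build_user_portrait` and `weighted_portrait`, the dictionary keys, and the list-based input format are hypothetical choices for the example.

```python
def build_user_portrait(vectors, labels):
    """Build one (T, w, n) entry per information category.

    vectors: feature vectors of the information the user has browsed.
    labels:  cluster index of each vector (for example, from AMMK-means).
    """
    total = len(vectors)
    portrait = []
    for k in sorted(set(labels)):
        members = [v for v, lab in zip(vectors, labels) if lab == k]
        n_k = len(members)
        # T_k: mean of the member vectors (weighted average with I_j = 1).
        t_k = [sum(col) / n_k for col in zip(*members)]
        # w_k: share of the browsed information that falls in this category.
        w_k = n_k / total
        portrait.append({"T": t_k, "w": w_k, "n": n_k})
    return portrait


def weighted_portrait(portrait):
    """Return the list of weighted category vectors w_k * T_k."""
    return [[c["w"] * x for x in c["T"]] for c in portrait]
```

With three browsed items of which two fall in the first category, the first category receives weight 2/3 and the second 1/3, so the weights always sum to 1.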

Finally, cosine similarity is used to calculate the similarity between the candidate battlefield information d_i and the user:

$$\mathrm{sim}(seat, d_i) = \frac{w_i T_i^{T} \cdot \vec{d_i}}{\lVert w_i T_i^{T} \rVert \, \lVert \vec{d_i} \rVert}$$

where w_i*T_i^T is the feature vector of the battlefield information category to which the candidate information d_i belongs, and $\vec{d_i}$ is the feature vector of d_i.
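The scoring and ranking step might look like the sketch below. It is an illustration under stated assumptions rather than the method itself: the text does not say how the category "to which the candidate belongs" is chosen, so the example simply assumes it is the category giving the highest cosine similarity, and the names `cosine` and `recommend` and the `top_n` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(candidates, categories, top_n=1):
    """Rank candidate information by similarity to the user portrait.

    candidates: dict mapping a candidate id to its feature vector.
    categories: list of (w, T) pairs taken from the user interest model.
    """
    scores = {}
    for cid, vec in candidates.items():
        # Assumption: a candidate belongs to its best-matching category.
        scores[cid] = max(
            cosine([w * t for t in T], vec) for w, T in categories
        )
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n], scores
```

Calling `recommend` with `top_n=1` returns the single candidate with the highest similarity, matching step S3.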

The method and device of this embodiment can express the user's interests accurately and improve the recommendation effect. They overcome the defect that, when a collaborative-filtering-based recommendation algorithm makes recommendations from user browsing records or feedback records, the information categories are assigned manually by information editors, so the classification standard represents only the editors' opinions.

In a specific embodiment, the characteristic attribute of the information is acquired first, the information browsed by the user is analyzed to generate a user portrait, the characteristic similarity between the user portrait and the candidate information is calculated, and finally the information with high similarity is recommended to the user according to the similarity.

The content-based recommendation method generally includes three steps of item portraits, user portraits, and recommendation generation.

An item portrait represents an item by its feature information. The attributes describing an item include both structured and unstructured data, and unstructured data must be converted into structured data before it can be used in the model.

The item representation method in common use today is the vector space model (VSM) based on TF-IDF weights.

The VSM converts text documents into space vectors, and TF-IDF is used to calculate the keyword weights of each document. Because synonymy, polysemy and similar phenomena exist among the words in a document, the robustness and accuracy of the recommendation model are reduced. To improve the model's generalization over polysemy and synonymy, semantic analysis and knowledge graphs are applied to recommendation systems.

A user portrait builds the user interest model from the features of what the user has previously browsed or rated. The model mainly comprises two parts, text classification and construction of a hierarchical user interest model: first, the item portraits of the information browsed by the user are clustered to obtain the information categories and their corresponding features (that is, item portraits); second, the weights of the information categories are calculated; finally, the number of pieces of information browsed by the user in each information category is counted. Traditional text classification models include nearest neighbor algorithms, the Rocchio algorithm, decision tree methods, linear classification methods, Bayesian classifiers, and the like. The hierarchical user interest model may be a three-layer user-category-object model or a three-layer user-interest-item model. Recommendation generation compares the feature similarity between the user portrait and the candidate items and recommends to the user the set of items with the highest relevance; common similarity measures include the Pearson correlation coefficient and cosine similarity.

The item portrait in this embodiment is a series of labels attached to each item. An item portrait may be used as an item feature in the recommendation model. In a recommendation system, the item portrait is the basis of the user portrait: item portrait + user behavior = user portrait.

It should be noted that, although the foregoing embodiments describe the steps in a specific order, it will be understood by those skilled in the art that, in order to achieve the effects of the present invention, the steps are not necessarily performed in such an order, and may be performed simultaneously (in parallel) or in other orders, and these variations are within the scope of the present invention.

Based on the same inventive concept, as shown in fig. 3, the present invention further provides an information recommendation system based on VSM and AMMK-means, including:

In an embodiment, the system further comprises a building module of an interest model, wherein the building module of the interest model comprises:

The maximum and minimum distance method adopted by the embodiment is based on Euclidean distance, and objects which are as far as possible are taken as clustering centers, so that the situation that the clustering centers are too close to each other possibly occurring during the initial value selection of the K-means method is avoided, the number of initial clustering centers is intelligently determined, and the efficiency of dividing the initial data set is improved.

The information category construction unit in the embodiment is specifically configured to:

In an embodiment, determining the cluster center and the number of cluster centers for the samples in the dataset by using a maximum-minimum distance clustering algorithm includes:

selecting the sample farthest from C₁ as the second cluster center C₂;

In an embodiment, the expression of the interest model is as follows:

V_seat＝(w₁*T₁,w₂*T₂,...,w_m*T_m)^T

In an embodiment, the similarity is calculated as follows:

$$\mathrm{sim}(seat, d_i) = \frac{w_i T_i^{T} \cdot \vec{d_i}}{\lVert w_i T_i^{T} \rVert \, \lVert \vec{d_i} \rVert}$$

where seat is the item portrait in the user portrait, w_i is the weight of the information category to which the candidate information d_i belongs, T_i^T is the feature vector of that information category, w_i*T_i^T is the weighted feature vector of that information category, and $\vec{d_i}$ is the feature vector of d_i.

It will be appreciated by those skilled in the art that all or part of the flow of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.

Further, the invention also provides a storage device. In one storage device embodiment according to the present invention, the storage device may be configured to store a program for performing the information recommendation method based on VSM and AMMK-means of the above method embodiment, and the program may be loaded and executed by a processor to implement that method. For convenience of explanation, only the portions relevant to this embodiment are shown; for specific technical details not disclosed here, refer to the method portions of the embodiments of the present invention. The storage device may be a storage device formed of various electronic devices; optionally, the storage device in an embodiment of the present invention is a non-transitory computer-readable storage medium.

Further, the invention also provides a control device. In one control device embodiment according to the present invention, the control device includes a processor and a storage device. The storage device may be configured to store a program for executing the information recommendation method based on VSM and AMMK-means of the above method embodiment, and the processor may be configured to execute the programs in the storage device, including, but not limited to, the program for executing that method. For convenience of explanation, only the portions relevant to this embodiment are shown; for specific technical details not disclosed here, refer to the method portions of the embodiments of the present invention. The control device may be a control device formed of various electronic devices.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the specific embodiments of the present invention without departing from the spirit and scope of the present invention, and any modifications and equivalents are intended to be included in the scope of the claims of the present invention.

Claims

1. An information recommendation method based on VSM and AMMK-means, characterized by comprising:

Obtaining item portraits of each candidate information;

Substituting the item portrait of each candidate information into a pre-built interest model to obtain the similarity between each candidate information and the user portrait;

Recommending the candidate information with the highest similarity to the user;

The interest model is constructed based on VSM and AMMK-means and the item portraits of the information that the user has browsed;

The construction of the interest model includes:

Obtain the item portraits of the information that the user has browsed, and use VSM to represent the item portraits of the information that the user has browsed;

Clustering the item portraits of the information that the user has browsed through AMMK-means, and using the clustering results as the information category that the user is interested in;

The weight of each information category is calculated according to the number of information browsed by the user contained in each information category and the total number of information browsed by the user;

Generate a user portrait based on the information categories that the user is interested in and the weights of the information categories;

Calculate the similarity between the item portrait of the candidate information and the item portrait in the user portrait;

The clustering of the item portraits of the information that the user has browsed by AMMK-means and taking the clustering results as the information category that the user is interested in includes:

Generate a dataset based on the item portraits of the information that the user has browsed;

Determine the cluster center and the number of cluster centers by using the maximum and minimum distance clustering algorithm for the samples in the data set;

The number of cluster centers is used as the K value in the K-means algorithm, and all the obtained cluster centers are used as the initial cluster centers in the K-means clustering algorithm;

Based on the distance between each sample in the data set and each initial cluster center, the clustering result is obtained when the set constraints are met;

The clustering results are used as information categories that users are interested in;

Determining the cluster centers and the number of cluster centers by using the maximum and minimum distance clustering algorithm for the samples in the data set includes:

Calculate the average value of the sample attributes, calculate the distance between each sample and the average value, and take the sample corresponding to the minimum distance as the first cluster center C₁;

Select the sample farthest from C₁ as the second cluster center C₂;

Calculate the distances D_i1 and D_i2 from each remaining sample to C₁ and C₂. If D_l = max{min(D_i1, D_i2), i = 1, 2, …, n} and D_l > θD₁₂, where θ is a given value and D₁₂ is the distance between C₁ and C₂, then take x_l as the third cluster center C₃;

If C₃ exists, calculate D_j = max{min(D_i1, D_i2, D_i3), i = 1, 2, …, n}; if D_j > θD₁₂, establish the fourth cluster center;

And so on, until the max-min distance is no greater than θD₁₂, at which point the search for cluster centers ends and the cluster centers and the number of cluster centers are obtained;

The expression of the interest model is shown in the following formula:

V_seat = (w₁*T₁, w₂*T₂, …, w_m*T_m)^T

Where: V_seat represents the user portrait; w_m represents the weight of the mth information category; T_m represents the feature vector of the mth information category;

The similarity is calculated as follows:

Where: seat is the item portrait in the user portrait, w_i is the weight of the information category to which the candidate information d_i belongs, T_i^T is the feature vector of the information category to which the candidate information d_i belongs, and the final term is the feature vector of d_i.

2. An information recommendation system based on VSM and AMMK-means, characterized by comprising:

An acquisition module, used to obtain an item portrait of each candidate information;

A similarity calculation module is used to substitute the item portrait of each candidate information into a pre-built interest model to obtain the similarity between each candidate information and the user portrait;

A recommendation module, used for recommending the candidate information with the highest similarity to the user;

The system further includes a construction module of an interest model; the construction module of the interest model includes:

The first construction unit is used to obtain the item portraits of the information that the user has browsed, and use VSM to represent the item portraits of the information that the user has browsed;

An information category construction unit, used for clustering the item portraits of the information that the user has browsed through AMMK-means, and using the clustering results as the information category that the user is interested in;

A weight calculation unit, used to calculate the weight of each information category according to the number of information browsed by the user contained in each information category and the total number of information browsed by the user;

A user portrait building unit, used to generate a user portrait based on the information categories that the user is interested in and the weights of the information categories;

A calculation unit, used to calculate the similarity between the item portrait of the candidate information and the item portrait in the user portrait;

The information category construction unit specifically includes:

The information category construction unit determines the cluster centers and the number of cluster centers by using the maximum and minimum distance clustering algorithm for the samples in the data set, specifically including:

Select the sample farthest from C₁ as the second cluster center C₂;

The expression of the interest model is shown in the following formula:

V_seat = (w₁*T₁, w₂*T₂, …, w_m*T_m)^T

The similarity is calculated as follows:

Where: seat is the item portrait in the user portrait, w_i is the weight of the information category to which the candidate information d_i belongs, T_i^T is the feature vector of the information category to which the candidate information d_i belongs, and the final term is the feature vector of d_i.

3. A storage device storing a plurality of program codes, characterized in that the program codes are suitable for being loaded and run by a processor to execute the information recommendation method based on VSM and AMMK-means described in claim 1.

4. A control device, comprising a processor and a storage device, wherein the storage device is suitable for storing multiple program codes, characterized in that the program codes are suitable for being loaded and run by the processor to execute the information recommendation method based on VSM and AMMK-means described in claim 1.