Disclosure of Invention
The invention aims to provide a scientific and technological achievement management system for enterprises, which solves the technical problems that:
The aim of the invention can be achieved by the following technical scheme:
A system for managing a scientific achievement of an enterprise, comprising:
a data acquisition module, configured to acquire, within a calibration period T, the browsed text set and the unbrowsed text set resulting from the user's current search, and to acquire, for any browsed text, the number of user interactions J and the viewing time T, wherein user interactions comprise clicking and sliding;
a text recommendation module, configured to determine an interest value G for any browsed text from the viewing time T and the number of user interactions J for that text; to calculate a threshold G′ from the mean viewing time $\bar{T}$ and the mean number of user interactions $\bar{J}$; to mark the text as a target text when G ≥ G′ and as a non-target text when G < G′; to generate a comprehensive text from all target texts; to preprocess the comprehensive text and any unbrowsed text separately to obtain a comprehensive text word vector set and a word vector set for that unbrowsed text; and to aggregate each word vector set to obtain a comprehensive text overall vector A and an unbrowsed text overall vector D_n, where n denotes any unbrowsed text;
and to calculate an included-angle cosine value Q_n from the comprehensive text overall vector A and any unbrowsed text overall vector D_n, obtain the feature value X_n of every unbrowsed text from Q_n, sort all unbrowsed texts by feature value in descending order to generate a sorting result, and recommend unbrowsed texts to the user according to the sorting result.
The data acquisition module is further configured to screen a text database according to the search terms entered by the user to obtain search data, and to sort the search data by the publication order of the texts and by the relevance of the article titles to the search terms.
In the data processing module, the preprocessing comprises dividing the text into sentences, and performing word segmentation, stop-word removal, and normalization on the divided sentences.
In the text recommendation module, the specific process of aggregating the word vector sets comprises the following steps:
Among the word vectors of all target texts and the word vectors of any unbrowsed text, all word vectors of differing dimensions are embedded into the highest-dimensional representation, and the overall vectors are obtained according to the calculation formulas
$$A = \frac{1}{b}\sum_{a=1}^{b} C_a, \qquad D_n = \frac{1}{r}\sum_{f=1}^{r} C_f$$
where C_a is the a-th word vector in all target texts, b is the total number of word vectors in all target texts, C_f is the f-th word vector in any unbrowsed text, and r is the total number of word vectors in that unbrowsed text.
In the text recommendation module, the specific acquisition process of the target text comprises the following steps:
The interest value G for any browsed text is determined from the user's viewing time T and number of user interactions J for that text, and the threshold G′ is calculated from the mean viewing time $\bar{T}$ and the mean number of user interactions $\bar{J}$, as follows:
$$\bar{T} = \frac{1}{N}\sum_{i=1}^{N} T_i; \qquad \bar{J} = \frac{1}{N}\sum_{i=1}^{N} J_i;$$
$$G = T_i + J_i / y; \qquad G' = \bar{T} + \bar{J} / y;$$
where i denotes any browsed text, N is the total number of browsed texts, and y is a preset coefficient; when G ≥ G′, the text is marked as a target text, and when G < G′, the text is marked as a non-target text.
In the text recommendation module, the text cosine similarity has the following calculation formula:
$$Q_n = \frac{A \cdot D_n}{|A|\,|D_n|}$$
where A · D_n denotes the dot product of vectors A and D_n, and |A| and |D_n| denote the moduli of vector A and vector D_n.
In the text recommendation module, the feature value X_n is obtained from the included-angle cosine value Q_n according to the calculation formula X_n = k · Q_n, where k is a preset coefficient with k > 0, and X_n represents the user's feature value for the n-th unbrowsed text.
In the data acquisition module, if the search at the starting time of the calibration period is not the first search, all historical comprehensive text overall vectors from the historical calibration periods are acquired; the cosine similarity Q′ between the current comprehensive text overall vector A and each historical comprehensive text overall vector is calculated; the historical comprehensive text overall vectors with Q′ ≥ 0.7 are screened out and marked as similar vectors; the historical comprehensive text word vector sets corresponding to the similar vectors and the current comprehensive text word vector set are acquired and compared; the repeated word vectors are screened out to generate a repeated word vector set; the repeated word vector set is aggregated to obtain an overall correction vector; an included-angle cosine value Q_1 is calculated from the overall correction vector and any unbrowsed text overall vector D_n; a new feature value X_xn is calculated from Q_1 and the current user's feature value X_n; all unbrowsed texts are sorted by the new feature value in descending order to generate a sorting result; and the corresponding texts are recommended to the user according to the sorting result. The new feature value is calculated as follows:
;
wherein p and m are preset coefficients, and p > m >0;
if the starting time of the calibration period is the first search, no correction is performed.
The beneficial effects of the invention are as follows: based on the current search scenario, the browsed text set and the unbrowsed text set from the user's current search are acquired within a calibration period; the browsed texts are divided into target texts and non-target texts according to behavior data; cosine similarity is calculated between the target texts and the unbrowsed texts to determine the user's feature value for each unbrowsed text; and the unbrowsed texts are recommended to the user in order of feature value. If the search at the starting time of the calibration period is not the first search, the correction module is started, the current feature values are corrected, new feature values are calculated, and the unbrowsed texts are recommended to the user in order of the new feature values. Screening for texts of interest to the user improves the user's reading efficiency and thereby the efficiency of managing and converting scientific and technological achievements.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention addresses the scenario after a user has searched by keyword, and performs a secondary screening of the texts returned by the search that the user has not yet browsed. Based on the current search scenario, the browsed text set and the unbrowsed text set from the user's search are acquired within a calibration period; the browsed texts are divided into target texts and non-target texts according to behavior data; cosine similarity is calculated between the target texts and the unbrowsed texts to determine the user's feature value for each unbrowsed text; and the unbrowsed texts are sorted by feature value and recommended to the user. If the search at the starting time of the calibration period is not the first search, the correction module is started to correct the current feature values, new feature values are calculated, and the unbrowsed texts are recommended to the user in order of the new feature values. Screening for texts of interest to the user improves the user's reading efficiency and thereby the efficiency of managing and converting scientific and technological achievements.
Referring to fig. 1, the present invention is a system for managing scientific achievements of enterprises, comprising:
a data acquisition module, configured to acquire, within a calibration period T, the browsed text set and the unbrowsed text set resulting from the user's current search, and to acquire, for any browsed text, the number of user interactions J and the viewing time T, wherein user interactions comprise clicking and sliding;
a text recommendation module, configured to determine an interest value G for any browsed text from the viewing time T and the number of user interactions J for that text; to calculate a threshold G′ from the mean viewing time $\bar{T}$ and the mean number of user interactions $\bar{J}$; to mark the text as a target text when G ≥ G′ and as a non-target text when G < G′; to generate a comprehensive text from all target texts; to preprocess the comprehensive text and any unbrowsed text separately to obtain a comprehensive text word vector set and a word vector set for that unbrowsed text; and to aggregate each word vector set to obtain a comprehensive text overall vector A and an unbrowsed text overall vector D_n, where n denotes any unbrowsed text;
and to calculate an included-angle cosine value Q_n from the comprehensive text overall vector A and any unbrowsed text overall vector D_n, obtain the feature value X_n of every unbrowsed text from Q_n, sort all unbrowsed texts by feature value in descending order to generate a sorting result, and recommend unbrowsed texts to the user according to the sorting result.
It will be appreciated that the logic of judging the user's interest in a text from the viewing time T and the number of user interactions J is that, as the user's interest in a text increases, both the interaction with that text and the viewing time increase.
It can be understood that the texts of interest to the user are combined into a comprehensive text whose overall vector A serves as the center vector, and the cosine similarity between each remaining unbrowsed text and the comprehensive text is then calculated to find texts similar to those of interest to the user. When the search in the current calibration period is not the first search, the historical comprehensive text vectors are acquired and, with the current comprehensive text overall vector A as the center vector, the highly similar historical comprehensive texts are screened out; the repeated word vectors are then screened out from the word vectors of all similar comprehensive texts and the word vectors of the current comprehensive text and aggregated into an overall correction vector; finally, the cosine similarity between each remaining unbrowsed text and the overall correction vector is calculated to correct the current user's feature values.
In a preferred embodiment of the present invention, the data acquisition module is further configured to screen the text database according to the search terms entered by the user to obtain search data, and to sort the search data by the publication order of the texts and by the relevance of the article titles to the search terms.
It can be understood that the invention addresses the scenario after the user has searched by keyword; screening the text database according to the search terms entered by the user ensures that the search data obtained is highly relevant to the user's needs, which improves the relevance and accuracy of the information.
In a preferred embodiment of the present invention, in the data processing module, the preprocessing includes sentence division for the text, word segmentation for the divided text sentence, stop word removal, and normalization.
It can be appreciated that dividing the text into sentences allows the text structure to be analyzed and understood more accurately, which improves the accuracy of subsequent processing. Word segmentation and stop-word removal extract the key information in the text and reduce interference from irrelevant content, making subsequent data analysis more efficient. Normalization unifies different forms of a word into a standard format and reduces the complexity caused by word diversity, which facilitates subsequent comparison and analysis.
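The preprocessing steps above can be sketched as follows. This is only an illustrative sketch: it assumes English text, a deliberately tiny hypothetical stop-word list (`STOP_WORDS`), and lower-casing as the sole normalization step; the function name `preprocess` is not taken from the source.

```python
import re

# Hypothetical, deliberately small stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "in", "to"}


def preprocess(text: str) -> list[list[str]]:
    """Split text into sentences, then segment, normalize and filter each one."""
    # Sentence division: split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    result = []
    for sentence in sentences:
        # Normalization (lower-casing) and word segmentation in one pass.
        tokens = re.findall(r"[a-z0-9]+", sentence.lower())
        # Stop-word removal.
        tokens = [t for t in tokens if t not in STOP_WORDS]
        if tokens:
            result.append(tokens)
    return result
```

For example, `preprocess("The system ranks texts. It is fast!")` yields one token list per sentence with the stop words removed. A production system for Chinese text would need a dedicated word segmenter in place of the regular expression used here.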
In a preferred embodiment of the present invention, in the text recommendation module, the specific process of aggregating a set of word vectors:
Among the word vectors of all target texts and the word vectors of any unbrowsed text, all word vectors of differing dimensions are embedded into the highest-dimensional representation, and the overall vectors are obtained according to the calculation formulas
$$A = \frac{1}{b}\sum_{a=1}^{b} C_a, \qquad D_n = \frac{1}{r}\sum_{f=1}^{r} C_f$$
where C_a is the a-th word vector in all target texts, b is the total number of word vectors in all target texts, C_f is the f-th word vector in any unbrowsed text, and r is the total number of word vectors in that unbrowsed text.
It can be understood that embedding word vectors of different dimensions into a unified high-dimensional representation ensures data consistency and facilitates subsequent calculation and comparison. The aggregation step reduces computational complexity, so that the system can generate recommendations more quickly when processing large volumes of text.
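A minimal sketch of the aggregation step is given below. Two points are assumptions rather than statements of the source: lifting a vector to the highest dimension is done here by zero-padding, and the word vectors are aggregated by taking their arithmetic mean. The helper names `pad_to_dim` and `aggregate` are hypothetical.

```python
def pad_to_dim(vec, dim):
    """Zero-pad a word vector to the target dimension (assumed embedding rule)."""
    return list(vec) + [0.0] * (dim - len(vec))


def aggregate(word_vectors, dim=None):
    """Aggregate a set of word vectors into one overall vector by averaging.

    If dim is None, the highest dimension found in word_vectors is used.
    Pass dim explicitly when two sets must share a common dimension.
    """
    if not word_vectors:
        raise ValueError("at least one word vector is required")
    if dim is None:
        dim = max(len(v) for v in word_vectors)
    padded = [pad_to_dim(v, dim) for v in word_vectors]
    count = len(padded)
    # Component-wise mean across all padded word vectors.
    return [sum(column) / count for column in zip(*padded)]
```

To keep A and D_n comparable, the same `dim` (the highest dimension across both word vector sets) would be passed to both calls. Because cosine similarity is scale-invariant, summing instead of averaging would give the same ranking.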
In a preferred embodiment of the present invention, in the text recommendation module, a specific acquisition process of the target text:
The interest value G for any browsed text is determined from the user's viewing time T and number of user interactions J for that text, and the threshold G′ is calculated from the mean viewing time $\bar{T}$ and the mean number of user interactions $\bar{J}$, as follows:
$$\bar{T} = \frac{1}{N}\sum_{i=1}^{N} T_i; \qquad \bar{J} = \frac{1}{N}\sum_{i=1}^{N} J_i;$$
$$G = T_i + J_i / y; \qquad G' = \bar{T} + \bar{J} / y;$$
where i denotes any browsed text, N is the total number of browsed texts, and y is a preset coefficient; when G ≥ G′, the text is marked as a target text, and when G < G′, the text is marked as a non-target text.
It can be appreciated that the interest value is calculated through the watching time and the interaction times, so that the real interest of the user to the text can be reflected more accurately, and the individuation of the recommendation is enhanced. The dynamic threshold G' is calculated by using the mean value, so that the recommendation system can adapt to the behavior modes and preferences of different users, and has higher flexibility and adaptability. By focusing on the content which is really interested by the user, the interference of irrelevant information can be reduced, and the satisfaction and the use experience of the user are improved.
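The target-text screening can be illustrated with the sketch below. It assumes the formula is read as G = T_i + (J_i / y), that the means are taken over all browsed texts, and that behavior data arrives as a mapping from a text identifier to a (viewing time, interaction count) pair; the function name `split_by_interest` and that data layout are hypothetical.

```python
def split_by_interest(records, y):
    """Split browsed texts into target and non-target texts.

    records: dict mapping text id -> (viewing_time T_i, interaction_count J_i)
    y:       preset coefficient (must be non-zero)
    Returns (target_ids, non_target_ids, threshold).
    """
    if not records:
        raise ValueError("no browsed texts")
    if y == 0:
        raise ValueError("y must be non-zero")
    count = len(records)
    mean_t = sum(t for t, _ in records.values()) / count
    mean_j = sum(j for _, j in records.values()) / count
    threshold = mean_t + mean_j / y  # G' = mean(T) + mean(J) / y

    targets, non_targets = [], []
    for text_id, (t, j) in records.items():
        interest = t + j / y  # G = T_i + J_i / y
        if interest >= threshold:
            targets.append(text_id)
        else:
            non_targets.append(text_id)
    return targets, non_targets, threshold
```

With `records = {"a": (30.0, 4), "b": (10.0, 2), "c": (20.0, 6)}` and `y = 2`, the threshold is 22.0, so texts "a" (G = 32.0) and "c" (G = 23.0) are marked as target texts and "b" (G = 11.0) as a non-target text.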
In a preferred embodiment of the present invention, in the text recommendation module, a calculation formula of the text cosine similarity is as follows:
$$Q_n = \frac{A \cdot D_n}{|A|\,|D_n|}$$
where A · D_n denotes the dot product of vectors A and D_n, and |A| and |D_n| denote the moduli of vector A and vector D_n.
In a preferred embodiment of the present invention, in the text recommendation module, the feature value X_n is obtained from the included-angle cosine value Q_n according to the calculation formula X_n = k · Q_n, where k is a preset coefficient with k > 0, and X_n represents the user's feature value for the n-th unbrowsed text.
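The cosine similarity and feature-value ranking can be sketched as follows. The sketch assumes that A and every D_n already share the same dimension and that a zero vector is given a similarity of 0; the names `cosine` and `rank_unbrowsed` are hypothetical.

```python
import math


def cosine(u, v):
    """Included-angle cosine of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Assumption: a zero-length vector is treated as having similarity 0.
    return dot / norm if norm else 0.0


def rank_unbrowsed(overall_a, unbrowsed, k=1.0):
    """Rank unbrowsed texts by feature value X_n = k * Q_n, largest first.

    overall_a: comprehensive text overall vector A
    unbrowsed: dict mapping text id n -> overall vector D_n
    k:         preset coefficient, k > 0
    """
    if k <= 0:
        raise ValueError("k must be positive")
    scores = {n: k * cosine(overall_a, d_n) for n, d_n in unbrowsed.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Since k is a positive constant, it scales every feature value equally and does not change the order of the ranking on its own.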
In a preferred embodiment of the present invention, in the data acquisition module, if the search at the starting time of the calibration period is not the first search, all historical comprehensive text overall vectors from the historical calibration periods are acquired; the cosine similarity Q′ between the current comprehensive text overall vector A and each historical comprehensive text overall vector is calculated; the historical comprehensive text overall vectors with Q′ ≥ 0.7 are screened out and marked as similar vectors; the historical comprehensive text word vector sets corresponding to the similar vectors and the current comprehensive text word vector set are acquired and compared; the repeated word vectors are screened out to generate a repeated word vector set; the repeated word vector set is aggregated to obtain an overall correction vector; an included-angle cosine value Q_1 is calculated from the overall correction vector and any unbrowsed text overall vector D_n; a new feature value X_xn is calculated from Q_1 and the current user's feature value X_n; all unbrowsed texts are sorted by the new feature value in descending order to generate a sorting result; and the corresponding texts are recommended to the user according to the sorting result. The new feature value is calculated as follows:
;
wherein p and m are preset coefficients, and p > m >0;
if the starting time of the calibration period is the first search, no correction is performed.
It can be understood that acquiring the historical comprehensive text vectors allows existing data to be used to improve the accuracy of the current recommendation and strengthens the learning ability of the system. Screening similar texts by the cosine similarity Q′ identifies historical target texts that are semantically similar to the current target texts, which improves the relevance of the recommendation. Comparing the word vector sets, screening out the repeated word vectors, and generating an overall correction vector reduces redundant information, so that the final recommendation result is more refined and effective.
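The construction of the overall correction vector can be sketched as follows. The sketch makes several assumptions: "repeated" word vectors are those that appear with exactly equal components in both the current set and a similar historical set, the repeated vectors are aggregated by their arithmetic mean, and history is stored as (overall vector, word vector set) pairs. The function names are hypothetical, and the sketch stops at the correction vector; it does not combine Q_1 and X_n into the new feature value.

```python
import math


def cosine(u, v):
    """Included-angle cosine of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def correction_vector(current_vec, current_words, history, threshold=0.7):
    """Build the overall correction vector from similar historical searches.

    current_vec:   current comprehensive text overall vector A
    current_words: word vectors of the current comprehensive text
    history:       list of (overall_vector, word_vectors) from earlier periods
    Returns None when there is no history or no repeated word vector,
    in which case no correction is applied.
    """
    current_set = {tuple(w) for w in current_words}
    repeated = set()
    for hist_vec, hist_words in history:
        # Keep only historical vectors with Q' >= threshold (0.7 in the text).
        if cosine(current_vec, hist_vec) >= threshold:
            repeated |= current_set & {tuple(w) for w in hist_words}
    if not repeated:
        return None
    ordered = sorted(repeated)  # fixed order for reproducible summation
    return [sum(column) / len(ordered) for column in zip(*ordered)]
```

The returned vector would then be compared with each D_n by cosine similarity to obtain Q_1. In practice, exact equality of floating-point word vectors is fragile, so matching on the underlying word or using a tolerance may be a more robust choice.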
The foregoing describes one embodiment of the present invention in detail, but the description is only a preferred embodiment of the present invention and should not be construed as limiting the scope of the invention. All equivalent changes and modifications within the scope of the present invention are intended to be covered by the present invention.