CN105677695B - A content-based method for calculating mobile application similarity - Google Patents
- Publication number
- CN105677695B (application CN201510776878.9A)
- Authority
- CN
- China
- Prior art keywords
- app
- keyword
- weight
- checked
- content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/243—Natural language query formulation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a content-based method for calculating mobile application similarity. The method comprises the following steps: after a large amount of mobile application information is obtained, the relevant fields are extracted, including the application name, application type, application description, and application size; the description information is segmented into words; the segmented content is split into two copies, one of which is merged to serve as the training corpus for a word2vec model, while the other is kept as a document set for TF-IDF calculation, with the results then stored in an HBase data warehouse; app similarity queries and calculations are then carried out. The method has the following advantages: it responds quickly to app similarity queries; the content-based app features and description information represent an app well; accuracy is high; and the accuracy of app search and recommendation can be improved.
Description
Technical field
The present invention relates to the field of data information retrieval and recommender systems, and in particular to a method, carried out by means of information retrieval, for calculating mobile application similarity based on features.
Background
With the growing prosperity of the mobile Internet and the introduction of "Internet+", the convenience and efficiency of the mobile Internet have become increasingly well known. The introduction of the O2O (Online To Offline) concept and of various online-to-offline applications has not only rapidly promoted the buying and selling of goods but has also greatly enriched people's lives.
In everyday "Internet+" life, the vast number of mobile applications (abbreviated as apps) occupies a pivotal position. The major domestic mobile application markets provide strong support for the public's demand for apps. In a mobile application market, users often search for the apps they need. With such a vast number of apps, however, ordinary users, who are not professionals, often find that many search results are not what they need. A method is therefore urgently needed that can offer users similar apps alongside the apps relevant to their query, so as to accommodate queries that may be imprecise. Likewise, in a recommender system, proactively recommending mobile applications that are similar to the apps installed on the user's device, that is, recommending according to the user's preferences, can improve recommendation accuracy.
Existing similarity calculations for applications include those based on underlying code and interfaces. Such code-level similarity calculations cannot directly reflect the semantic needs of ordinary users. Moreover, a finished mobile app is a complete .apk file whose underlying code details cannot be obtained, so these methods do not meet users' current needs.
There are also similarity calculation methods based on app content. Most of them rely on the app's description information, because the description is relatively authoritative data about the app itself. However, existing methods for processing description information are generally based on the bag-of-words model. The bag-of-words model does not consider word order and therefore ignores much of the context of each word. When the similarity between vectors is calculated, two near-synonyms, for example, are treated as different words, which is likely to lower the similarity and introduce large errors.
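The near-synonym problem described above can be illustrated with a minimal sketch. The two token lists are hypothetical, and cosine similarity over raw word counts is only one common way to compare bag-of-words vectors; the text does not say which measure the existing methods use.

```python
import math
from collections import Counter

def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity between two token lists under a bag-of-words model."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two hypothetical descriptions that mean nearly the same thing
# but share no surface words.
desc_1 = ["picture", "editor"]
desc_2 = ["photo", "retouching"]

print(bow_cosine(desc_1, desc_2))                 # 0.0: no shared words
print(bow_cosine(desc_1, ["picture", "viewer"]))  # about 0.5: one shared word
```

Because "picture" and "photo" are different tokens, the first pair scores zero even though the descriptions are close in meaning.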
Meanwhile similitude is calculated in application, existing most methods are not by other such as titles, classification of app and greatly
The information such as small are taken into account.And the comment information of such as app is also added thereto by some methods.According to us it has been observed that app
Comment information quality it is excessively poor, be generally unable to respond the true content of app.
Therefore, in view of the above defects in the existing technology, it is necessary to study them and provide a solution that overcomes these defects, so that the similarity calculation relies more deeply on the characteristic information of the app.
Summary of the invention
The purpose of the present invention is to provide a similarity calculation method for mobile apps that better finds the apps most similar to a given app in a massive app library, thereby improving app search accuracy and recommendation success rate.
To achieve the above object, the technical solution of the present invention is as follows:
A content-based method for calculating mobile application similarity comprises the following steps:
S10. Crawl a large amount of app data and organize its features; save the organized features to a database, establishing a feature database for querying;
S20. According to the characteristic information of the app to be queried, perform queries and calculations in the feature database to find apps similar to the app to be queried; the characteristic information of the app to be queried is either provided by the user or obtained by querying the feature database.
Further, step S10 comprises the following steps:
S101. Crawl a large amount of app data and, after structuring it, store it in a database;
S102. Put the description information of each app in the database into its own file, then segment each file into words;
S103. Make two copies of the segmented data: merge one copy into a complete corpus and train word2vec on it; keep the other copy in its original file structure and calculate TF-IDF across the documents, obtaining the weight of every keyword in each document;
S104. Write the calculated keywords and their weights into HBase, where each row corresponds to an app package name, the columns correspond to the keywords, and the values are the keyword weights, thereby establishing the feature database for querying;
S105. Calculate the similarity of four features of the app (name, type, description, and size) and combine them using their respective weights to give the final similarity of the algorithm.
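The TF-IDF part of step S103 can be sketched as follows. This is a minimal illustration, assuming tf = count / document length and idf = log(N / df); the text does not state which TF-IDF variant is used, and the package names and tokens are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: {package_name: [token, ...]} -> {package_name: {token: weight}}.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # count each token once per document
    weights = {}
    for pkg, tokens in docs.items():
        counts = Counter(tokens)
        weights[pkg] = {
            t: (c / len(tokens)) * math.log(n_docs / df[t])
            for t, c in counts.items()
        }
    return weights

# Hypothetical segmented descriptions, one document per app.
docs = {
    "com.example.camera": ["photo", "filter", "photo", "share"],
    "com.example.gallery": ["photo", "album", "share"],
    "com.example.notes": ["note", "text", "share"],
}
features = tf_idf(docs)
for pkg, row in features.items():
    print(pkg, row)
```

A word such as "share" that appears in every document gets weight zero, so only words that distinguish one description from the others act as keywords.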
Further, step S20 comprises the following steps:
S201. Obtain the package name of the app to be queried as its unique name;
S202. In the feature database in HBase, query by row using the app's package name to find all keywords of the app;
S203. For each keyword, use word2vec to find its top K near-synonyms and expand the keyword with them;
S204. Integrate the weights of the expanded keywords, and select the top N keywords as the absolute keywords of the app;
S205. Using the absolute keywords, query the feature database in HBase by column to retrieve all apps corresponding to the absolute keywords, and integrate the weights of these apps;
S206. Calculate the similarity values of name, category, and size between these apps and the app to be queried; then combine the similarity values of description information, name, category, and size according to their respective weights to obtain the similarity between these apps and the app to be queried;
S207. Sort the integrated apps in descending order of weight to establish a similarity ranking of apps; the greater the weight, the more similar the app.
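Steps S202 to S205 can be sketched with toy data. Everything named here is hypothetical: `FEATURES` stands in for the HBase feature database, `NEAR_SYNONYMS` stands in for a trained word2vec model, and the rule that an expanded word inherits the original weight multiplied by its similarity is an assumption, since the text only says the weights are integrated.

```python
from collections import defaultdict

# Hypothetical feature database: package name -> {keyword: TF-IDF weight}.
FEATURES = {
    "com.example.camera": {"photo": 0.6, "filter": 0.4},
    "com.example.gallery": {"picture": 0.7, "album": 0.3},
    "com.example.notes": {"note": 0.8, "text": 0.2},
}

# Hypothetical near-synonym table standing in for a trained word2vec model:
# keyword -> [(near-synonym, similarity), ...] sorted by similarity.
NEAR_SYNONYMS = {
    "photo": [("picture", 0.9), ("image", 0.8)],
    "filter": [("effect", 0.7)],
}

def absolute_keywords(pkg, top_k=2, top_n=3):
    """S202-S204: read the row, expand each keyword, keep the top N."""
    merged = defaultdict(float)
    for word, weight in FEATURES[pkg].items():
        merged[word] += weight
        for synonym, sim in NEAR_SYNONYMS.get(word, [])[:top_k]:
            # Assumed rule: an expanded word inherits weight * similarity.
            merged[synonym] += weight * sim
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])

def similar_by_description(pkg, **kwargs):
    """S205: find every other app that has an absolute keyword as a column."""
    scores = defaultdict(float)
    for word, q_weight in absolute_keywords(pkg, **kwargs).items():
        for other, row in FEATURES.items():
            if other != pkg and word in row:
                scores[other] += q_weight * row[word]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(absolute_keywords("com.example.camera"))
print(similar_by_description("com.example.camera"))
```

In this toy data the camera app and the gallery app share no literal keyword; they are matched only because "photo" is expanded to "picture".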
The beneficial effects of the present invention are as follows: it provides a similarity calculation method for mobile apps that better finds the apps most similar to a given app in a massive app library, thereby improving app search accuracy and recommendation success rate. Specifically:
1) The description information of the app is used, and near-synonyms are computed with word2vec; this not only reflects the specific semantic content of the app well, but also better mines near-synonym features by drawing on the context within the description information;
2) The name, type, size, and description information of the app are combined, making full use of the app's features, while low-quality information such as app reviews is excluded, so the results are more accurate and comprehensive;
3) HBase is used as the data warehouse for queries, so massive app data can be processed faster.
Brief description of the drawings
Fig. 1 is a flow diagram of an embodiment of the content-based method for calculating mobile application similarity according to the present invention.
Detailed description of the embodiments
For a further understanding of the present invention, preferred embodiments are described below. It should be understood, however, that these descriptions only further explain the features and advantages of the present invention and do not limit its claims.
The present invention provides a content-based method for calculating mobile application similarity that relies on features such as an app's name, description information, type, and size to find the apps most similar to it. The method comprises the following steps:
S1. Crawl a large amount of app data and organize its features; save the organized features to a database, establishing a feature database for querying;
S2. According to the characteristic information of the app to be queried, perform queries and calculations in the feature database to find apps similar to the app to be queried; the characteristic information of the app to be queried is either provided by the user or obtained by querying the feature database.
The above is described in further detail below with reference to an embodiment.
Step S10: crawl the relevant information of a large number of apps from the network, including each app's name, category, size, and description information, and save this information in a relational database.
Step S20: extract the description information of all apps and make two copies of it, which are processed separately, comprising the following steps:
S201 and S202: obtain all app data from the database and read out the name, type, size, and description information;
S203: split the app description information into individual documents; segment the content of each document into words, having first removed stop words and added reserved words; then calculate TF-IDF (Term Frequency-Inverse Document Frequency) values over the entire document set to obtain the keywords of each document and their weights;
S204: merge all of the segmented app description information into one large document and use it as the training corpus to train word2vec;
S205: store the results of step S203 in the HBase data warehouse to support data retrieval based on app description information. The app package name corresponding to each document serves as the HBase rowkey, and the keywords serve as the HBase columns. When the processed description information of an app is stored, the package name is the rowkey, each keyword is a column, and the keyword's weight is the value of that column. This not only allows the information for a given app to be looked up quickly, but also allows its keywords to be extended dynamically, which makes searching convenient;
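The storage layout described in S205 can be modelled in memory as follows. This is a sketch of the data model only, not the HBase client API: the class `FeatureTable` and its methods are hypothetical, and the second dictionary merely imitates a lookup by column.

```python
from collections import defaultdict

class FeatureTable:
    """In-memory stand-in for the HBase table (not the HBase client API)."""

    def __init__(self):
        self.rows = defaultdict(dict)     # rowkey -> {keyword: weight}
        self.columns = defaultdict(dict)  # keyword -> {rowkey: weight}

    def put(self, package_name, keyword, weight):
        """Store one cell; new keyword columns can be added at any time."""
        self.rows[package_name][keyword] = weight
        self.columns[keyword][package_name] = weight

    def get_row(self, package_name):
        """Row lookup: all keywords and weights of one app."""
        return dict(self.rows.get(package_name, {}))

    def get_column(self, keyword):
        """Column lookup: all apps whose description has this keyword."""
        return dict(self.columns.get(keyword, {}))

table = FeatureTable()
table.put("com.example.camera", "photo", 0.6)
table.put("com.example.camera", "filter", 0.4)
table.put("com.example.gallery", "photo", 0.5)

print(table.get_row("com.example.camera"))  # {'photo': 0.6, 'filter': 0.4}
print(table.get_column("photo"))
```

The row lookup serves the "what are this app's keywords" query, while the column lookup serves the "which apps have this keyword" query used later during retrieval.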
Step S30: find a method for adjusting the weights according to the similarity results; combine the name, type, size, and description information of the app and, subject to still obtaining the best similar apps, use multiple groups of test cases to calculate the optimal weights for combining these four attributes.
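The text does not say how the optimal weights are found. One possible approach, shown here purely as an assumption, is a grid search over weight combinations that sum to 1, keeping the combination that ranks the expected app first in the most test cases. The case data and all function names are hypothetical.

```python
from itertools import product

def weight_grid(step=0.25):
    """Yield (l1, l2, l3, l4) on a grid with l1 + l2 + l3 + l4 = 1."""
    n = round(1 / step)
    for i, j, k in product(range(n + 1), repeat=3):
        if i + j + k <= n:
            yield (i / n, j / n, k / n, (n - i - j - k) / n)

def combined(weights, sims):
    """Weighted sum of (name, category, size, description) similarities."""
    return sum(w * s for w, s in zip(weights, sims))

def best_weights(cases, step=0.25):
    """cases: list of (candidates, expected), where candidates maps an app to
    its four similarities and expected is the app judged most similar.
    Returns the first grid point with the most correct top-1 results."""
    def hits(weights):
        return sum(
            max(cands, key=lambda app: combined(weights, cands[app])) == expected
            for cands, expected in cases
        )
    return max(weight_grid(step), key=hits)

# Hypothetical labelled cases.
cases = [
    ({"app_a": (0.9, 1.0, 0.5, 0.2), "app_b": (0.1, 1.0, 0.5, 0.8)}, "app_b"),
    ({"app_c": (0.8, 0.0, 0.9, 0.1), "app_d": (0.2, 1.0, 0.4, 0.7)}, "app_d"),
]
weights = best_weights(cases)
print(weights, sum(weights))
```

With so few cases many weight combinations tie, so a realistic tuning run would need far more labelled cases and a finer grid.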
Step S40: once the data has been prepared, the retrieval of similar apps can begin, which further comprises the following steps:
S401: obtain the package name of the app to be retrieved;
S402: using the package name of the app to be retrieved as the rowkey, retrieve the corresponding row in HBase and read from it all of the app's keywords and their weights;
S403: expand every keyword with synonyms using the word2vec training result, calculate a weight for each expanded word, then merge identical words and sum their weights;
S404: based on the expanded keywords and their weights, normalize the weights in each keyword column, then search the HBase data warehouse by column for the corresponding apps. Each word corresponds to multiple apps; calculate the weight of each app, integrate the weights, sort in descending order, and filter out the apps most similar according to description information;
S405: for the apps obtained in S404, extract information such as their name, type, and size;
S406: use the edit distance method to calculate the similarity between each app's name and the name of the app to be retrieved;
S407: use case-by-case analysis to calculate the similarity between each app's type and the type of the app to be retrieved;
S408: calculate the similarity between each app's size and the size of the app to be retrieved, using a formula in which $a$ is the app to be retrieved, $x$ is each app found to be similar on the basis of description information in S404, $size_{max}$ is the size of the app occupying the most space among these similar apps, and $size_{min}$ is the size of the app occupying the least space among them.
S409: combine the similarities calculated for the app's name, type, size, and description information in a weighted sum, according to the weight of each attribute, to obtain a final similarity value, namely:
$$Similarity = \lambda_1 Sim_{name} + \lambda_2 Sim_{category} + \lambda_3 Sim_{size} + \lambda_4 Sim_{description}$$
where name refers to the app name, category to the app type, size to the app size, and description to the app's description information; $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are the weights of the name, type, size, and description similarities, respectively, with $\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1$.
The results are then sorted and filtered by Similarity value to obtain the final one or more most similar apps.
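Steps S406 to S409 can be sketched as follows, under several labelled assumptions. The name similarity normalizes the edit distance by the longer name; the category similarity is a simple same-or-different case analysis; and the size similarity uses one plausible form consistent with the definitions of size_max and size_min, since the size formula itself is not reproduced in this text. The weights and app data are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def sim_name(a, b):
    # Assumed normalization: 1 - distance / length of the longer name.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest

def sim_category(a, b):
    # Assumed case analysis: 1 for the same category, 0 otherwise.
    return 1.0 if a == b else 0.0

def sim_size(size_a, size_x, size_min, size_max):
    # Assumed form: 1 - |size_a - size_x| / (size_max - size_min), floored at 0.
    if size_max == size_min:
        return 1.0
    return max(0.0, 1 - abs(size_a - size_x) / (size_max - size_min))

def rank(query, candidates, weights=(0.2, 0.2, 0.1, 0.5)):
    """candidates: dicts with name, category, size, sim_description."""
    sizes = [c["size"] for c in candidates]
    lo, hi = min(sizes), max(sizes)
    l1, l2, l3, l4 = weights
    scored = [
        (l1 * sim_name(query["name"], c["name"])
         + l2 * sim_category(query["category"], c["category"])
         + l3 * sim_size(query["size"], c["size"], lo, hi)
         + l4 * c["sim_description"], c["name"])
        for c in candidates
    ]
    return sorted(scored, reverse=True)

query = {"name": "PhotoFix", "category": "photography", "size": 20}
candidates = [
    {"name": "PhotoMix", "category": "photography", "size": 25, "sim_description": 0.8},
    {"name": "NotePad", "category": "productivity", "size": 5, "sim_description": 0.1},
]
for score, name in rank(query, candidates):
    print(f"{name}: {score:.3f}")
```

The description similarity is passed in precomputed because it comes from the keyword-based retrieval in S404; only the other three attributes are computed here.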
The above description of the embodiment is only intended to help in understanding the method of the present invention and its core ideas. It should be noted that those skilled in the art may make improvements and modifications to the present invention without departing from its principles, and such improvements and modifications also fall within the scope of protection of the claims of the present invention.
Claims (2)
1. A content-based method for calculating mobile application similarity, characterized by comprising the following steps:
S10. Crawl a large amount of app data and organize its features; save the organized features to a database, establishing a feature database for querying, specifically comprising:
S101. Crawl a large amount of app data and, after structuring it, store it in a database;
S102. Put the description information of each app in the database into its own file, then segment each file into words;
S103. Make two copies of the segmented data: merge one copy into a complete corpus and train word2vec on it; keep the other copy in its original file structure and calculate TF-IDF across the documents, obtaining the weight of every keyword in each document;
S104. Write the calculated keywords and their weights into HBase, where each row corresponds to an app package name, the columns correspond to the keywords, and the values are the keyword weights, thereby establishing the feature database for querying;
S105. Calculate the similarity of four features of the app (name, type, description, and size) and combine them using their respective weights to give the final similarity of the algorithm;
S20. According to the characteristic information of the app to be queried, perform queries and calculations in the feature database to find apps similar to the app to be queried; the characteristic information of the app to be queried is either provided by the user or obtained by querying the feature database.
2. The content-based method for calculating mobile application similarity according to claim 1, characterized in that step S20 comprises the following steps:
S201. Obtain the package name of the app to be queried as its unique name;
S202. In the feature database in HBase, query by row using the app's package name to find all keywords of the app;
S203. For each keyword, use word2vec to find its top K near-synonyms and expand the keyword with them;
S204. Integrate the weights of the expanded keywords, and select the top N keywords as the absolute keywords of the app;
S205. Using the absolute keywords, query the feature database in HBase by column to retrieve all apps corresponding to the absolute keywords, and integrate the weights of these apps;
S206. Calculate the similarity values of name, category, and size between these apps and the app to be queried; then combine the similarity values of description information, name, category, and size according to their respective weights to obtain the similarity between these apps and the app to be queried;
S207. Sort the integrated apps in descending order of weight to establish a similarity ranking of apps; the greater the weight, the more similar the app.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510626874 | 2015-09-28 | ||
| CN2015106268742 | 2015-09-28 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN105677695A CN105677695A (en) | 2016-06-15 |
| CN105677695B true CN105677695B (en) | 2019-03-08 |
Family
ID=56946915
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201510776878.9A Active CN105677695B (en) | 2015-09-28 | 2015-11-13 | A method of the calculating mobile application similitude based on content |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN105677695B (en) |
Families Citing this family (8)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108255522A (en) * | 2016-12-27 | 2018-07-06 | 北京金山云网络技术有限公司 | A kind of application program sorting technique and device |
| CN108319449B (en) * | 2017-01-16 | 2021-07-20 | 北京金山云网络技术有限公司 | Method and device for determining application architecture |
| CN109002441A (en) * | 2017-06-06 | 2018-12-14 | 阿里巴巴集团控股有限公司 | Determination method, the exception of Apply Names similarity apply detection method and system |
| CN108170664B (en) * | 2017-11-29 | 2021-04-09 | 有米科技股份有限公司 | Key word expansion method and device based on key words |
| CN108170665B (en) * | 2017-11-29 | 2021-06-04 | 有米科技股份有限公司 | Keyword expansion method and device based on comprehensive similarity |
| CN108182201B (en) * | 2017-11-29 | 2020-06-30 | 有米科技股份有限公司 | Application expansion method and device based on key keywords |
| CN108804492B (en) * | 2018-03-27 | 2022-04-29 | 阿里巴巴(中国)有限公司 | Method and device for recommending multimedia objects |
| CN113868533B (en) * | 2021-09-30 | 2025-04-22 | 北京达佳互联信息技术有限公司 | Application search method, device, electronic device and storage medium |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103530339A (en) * | 2013-10-08 | 2014-01-22 | 北京百度网讯科技有限公司 | Mobile application information push method and device |
| CN103955536A (en) * | 2014-05-15 | 2014-07-30 | 深圳市中兴移动通信有限公司 | Classification method and device of applications |
| CN104424307A (en) * | 2013-09-04 | 2015-03-18 | 腾讯科技(深圳)有限公司 | Intelligent terminal application classifying method, system and intelligent terminal, |
| CN104750798A (en) * | 2015-03-19 | 2015-07-01 | 腾讯科技(深圳)有限公司 | Application program recommendation method and device |
| CN104778178A (en) * | 2014-01-13 | 2015-07-15 | 腾讯科技(深圳)有限公司 | Application classification method, application classification device and service server |
| CN104866526A (en) * | 2015-04-21 | 2015-08-26 | 惠州Tcl移动通信有限公司 | Intelligent terminal and method for recommending applications thereof |
Family Cites Families (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8190634B2 (en) * | 2006-10-10 | 2012-05-29 | Canon Kabushiki Kaisha | Image display controlling apparatus, method of controlling image display, and storage medium |
Patent Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104424307A (en) * | 2013-09-04 | 2015-03-18 | 腾讯科技(深圳)有限公司 | Intelligent terminal application classifying method, system and intelligent terminal, |
| CN103530339A (en) * | 2013-10-08 | 2014-01-22 | 北京百度网讯科技有限公司 | Mobile application information push method and device |
| CN104778178A (en) * | 2014-01-13 | 2015-07-15 | 腾讯科技(深圳)有限公司 | Application classification method, application classification device and service server |
| CN103955536A (en) * | 2014-05-15 | 2014-07-30 | 深圳市中兴移动通信有限公司 | Classification method and device of applications |
| CN104750798A (en) * | 2015-03-19 | 2015-07-01 | 腾讯科技(深圳)有限公司 | Application program recommendation method and device |
| CN104866526A (en) * | 2015-04-21 | 2015-08-26 | 惠州Tcl移动通信有限公司 | Intelligent terminal and method for recommending applications thereof |
Also Published As
| Publication number | Publication date |
|---|---|
| CN105677695A (en) | 2016-06-15 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN105677695B (en) | A method of the calculating mobile application similitude based on content | |
| CN106227815B (en) | Multi-modal clue personalized application program function recommendation method and system | |
| CN107480158B (en) | Method and system for evaluating matching of content item and image based on similarity score | |
| CN105335519B (en) | Model generation method and device and recommendation method and device | |
| CN107145496B (en) | Method for matching image with content item based on keyword | |
| US9965793B1 (en) | Item selection based on dimensional criteria | |
| CN106156082B (en) | A body alignment method and device | |
| CN107145545B (en) | Top-k area user text data recommendation method in social network based on position | |
| CN103593474B (en) | Image retrieval sort method based on deep learning | |
| CN103455487B (en) | The extracting method and device of a kind of search term | |
| TW201241773A (en) | Method and apparatus of determining product category information | |
| CN107145497B (en) | Method for selecting image matched with content based on metadata of image and content | |
| EP2788896B1 (en) | Fuzzy full text search | |
| CN107463592B (en) | Method, device and data processing system for matching a content item with an image | |
| CN105468790B (en) | A kind of comment information search method and device | |
| Hauff et al. | Placing images on the world map: a microblog-based enrichment approach | |
| CN105426550B (en) | Collaborative filtering label recommendation method and system based on user quality model | |
| CN106156157B (en) | Electronic book navigation system and method | |
| CN102831224B (en) | Generation method and device are suggested in a kind of method for building up in data directory library, search | |
| CN103714092A (en) | Geographic position searching method and geographic position searching device | |
| CN104199875A (en) | Search recommending method and device | |
| CN104424257A (en) | Information indexing unit and information indexing method | |
| CN106294358A (en) | The search method of a kind of information and system | |
| EP2788897B1 (en) | Optimally ranked nearest neighbor fuzzy full text search | |
| CN106682190A (en) | Construction method and device of label knowledge base, application search method and server |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| C06 | Publication | ||
| PB01 | Publication | ||
| C10 | Entry into substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||