CN106447363A - Method and device for mining user category information - Google Patents
- Publication number
- CN106447363A (application CN201510475686.4A)
- Authority
- CN
- China
- Prior art keywords
- user
- application
- category information
- class
- specified
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method and a device for mining user category information. The method includes: for each application downloaded by users in a user sample set whose user category information is known, labeling the application with user category information according to the number of times users of different categories in the sample set downloaded it; for a specified user whose user category information is to be mined, classifying the applications downloaded by that user according to their labeled user category information; for each user category, calculating the specified user's score in that category from the relevant parameters of the applications belonging to it; and determining the specified user's user category to be the category with the highest score. The technical solution has the beneficial effect of reliably mining user category information from data samples already obtained.
Description
Technical field
The invention relates to the field of data mining, and in particular to a method and a device for mining user category information.
Background
Targeted advertising is the precision-delivery mode of ad placement: different advertisements are served to different target groups, and an advertisement is delivered to a user group when that group matches the specified targeting features. This can greatly improve delivery precision and increase the return on advertising investment. Determining user category information is the core of the problem. The traditional approach is to obtain user category information through user registration, questionnaires and similar means. However, because most users do not fill in the relevant information, and what they do fill in is often wrong, the information obtained this way is small in quantity and low in accuracy.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a method for mining user category information that overcomes, or at least partially solves, those problems.
According to one aspect of the invention, a method for mining user category information is provided, including:
for each application downloaded by users in a user sample set whose user category information is known, labeling the application with user category information according to the number of times users of different categories in the sample set downloaded it;
for a specified user whose user category information is to be mined, classifying the applications downloaded by the specified user according to their labeled user category information;
for each user category, calculating the specified user's score in that category from the relevant parameters of the applications belonging to the category;
determining the specified user's user category information to be the user category with the highest score.
Optionally, labeling each application downloaded by users in the user sample set with user category information, according to the number of times users of different categories downloaded it, includes:
for each such application, calculating the application's average download volume among the users of each user category;
labeling the application with the user category whose average download volume is largest.
Optionally, calculating, for each user category, the specified user's score in that category from the relevant parameters of the applications belonging to the category includes:
for each user category, calculating the specified user's score in that category according to the chi-square score and the chi-square score weight of each application belonging to the category.
Optionally, calculating the specified user's score according to the chi-square scores and chi-square score weights includes:
calculating the score $S(u)_L$ of a specified user $u$ in user category $L$ according to the following formula:
where $s_i$ is the chi-square score of the $i$-th application downloaded by the specified user, $w_i$ is the chi-square score weight of that application, and $L_i$ is the user category to which that application belongs.
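The formula referred to here is not reproduced in this text. One reading that is consistent with the definitions of $s_i$, $w_i$ and $L_i$, and with the statement that the score is computed from the chi-square score and weight of each application belonging to the category, is a weighted sum over the user's applications labeled with $L$. This is an assumption, not the patent's confirmed formula:

$$S(u)_L = \sum_{i \,:\, L_i = L} s_i \, w_i$$

Under this reading, an application contributes only to the score of the category it is labeled with, and the simple counting example in the description corresponds to setting every $s_i w_i$ to 1.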
Optionally, when there are only two user categories, the chi-square score $s$ of an application is calculated according to the following formula:
$$s = \frac{(A+B+C+D)\,(AD-BC)^2}{(A+C)(B+D)(A+B)(C+D)}$$
where A is the number of users in the sample set belonging to the first user category who downloaded the application, B is the number of users belonging to the second user category who downloaded it, C is the number of users belonging to the first user category who did not download it, and D is the number of users belonging to the second user category who did not download it.
Optionally, when there are only two user categories, the chi-square score weight $w$ of an application equals the ratio of the application's larger average download volume to its smaller average download volume across the two categories. Specifically, $w$ is calculated according to the following formula:
$$w = \mathrm{round}\!\left(\frac{\max_L b(L)}{\min_L b(L)}\right)$$
where $b(L)$ is the average download volume of the application among users of user category $L$.
Optionally, the method further includes:
after the specified user's user category information is determined, adding the specified user to the user sample set.
Optionally, the method further includes:
when the number of users added to the user sample set reaches a preset value, re-labeling the applications downloaded by users in the sample set with user category information.
Optionally, the method further includes:
pushing recommended content to the specified user according to the determined user category information.
According to another aspect of the invention, a device for mining user category information is provided, the device including:
a labeling unit, adapted to label each application downloaded by users in a user sample set whose user category information is known with user category information, according to the number of times users of different categories in the sample set downloaded it;
a mining unit, adapted to classify the applications downloaded by a specified user, whose user category information is to be mined, according to their labeled user category information; to calculate, for each user category, the specified user's score in that category from the relevant parameters of the applications belonging to the category; and to determine the specified user's user category information to be the user category with the highest score.
Optionally, the labeling unit is adapted to calculate, for each such application, the application's average download volume among the users of each user category, and to label the application with the user category whose average download volume is largest.
Optionally, the mining unit is adapted to calculate, for each user category, the specified user's score in that category according to the chi-square score and the chi-square score weight of each application belonging to the category.
Optionally, the mining unit is adapted to calculate the score $S(u)_L$ of a specified user $u$ in user category $L$ according to the following formula:
where $s_i$ is the chi-square score of the $i$-th application downloaded by the specified user, $w_i$ is the chi-square score weight of that application, and $L_i$ is the user category to which that application belongs.
Optionally, when there are only two user categories, the mining unit is adapted to calculate the chi-square score $s$ of an application according to the following formula:
$$s = \frac{(A+B+C+D)\,(AD-BC)^2}{(A+C)(B+D)(A+B)(C+D)}$$
where A is the number of users in the sample set belonging to the first user category who downloaded the application, B is the number of users belonging to the second user category who downloaded it, C is the number of users belonging to the first user category who did not download it, and D is the number of users belonging to the second user category who did not download it.
Optionally, when there are only two user categories, the mining unit is adapted to set the chi-square score weight $w$ of an application equal to the ratio of the application's larger average download volume to its smaller average download volume across the two categories, and more specifically to calculate $w$ according to the following formula:
$$w = \mathrm{round}\!\left(\frac{\max_L b(L)}{\min_L b(L)}\right)$$
where $b(L)$ is the average download volume of the application among users of user category $L$.
Optionally, the device further includes an adding unit, adapted to add the specified user to the user sample set after the specified user's user category information is determined.
Optionally, the labeling unit is adapted to re-label the applications downloaded by users in the sample set with user category information when the number of users added to the sample set reaches a preset value.
Optionally, the device further includes a recommending unit, adapted to push recommended content to the specified user according to the determined user category information.
According to the method of the invention, each application downloaded by users in a sample set of users with known user category information is labeled with user category information according to the number of times users of different categories downloaded it. For a specified user whose user category information is to be mined, the applications that user has downloaded are classified by their labels; for each user category, the user's score is calculated from the relevant parameters of the applications belonging to that category; and the category with the highest score is taken as the specified user's user category information. This addresses the problem in the prior art that the user category information obtained is not truthful, and achieves the beneficial effect of reliably mining user category information from data samples already obtained. User categories can thus be judged more accurately and advertisements delivered in a targeted way, greatly improving delivery precision and increasing the return on advertising investment.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered as limiting the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Fig. 1 shows a flowchart of a method for mining user category information provided by an embodiment of the invention;
Fig. 2 shows a structural diagram of a device for mining user category information provided by an embodiment of the invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.
Fig. 1 shows a flowchart of a method for mining user category information provided by an embodiment of the invention. As shown in Fig. 1, the method includes:
Step S110: for each application downloaded by users in a user sample set whose user category information is known, label the application with user category information according to the number of times users of different categories in the sample set downloaded it.
User categories can be defined according to the data available in the sample: for example, by occupation (worker, student, teacher, civil servant and so on), by gender (male, female), or by interest (travel enthusiast, shopping enthusiast, technology enthusiast and so on).
Step S120: for a specified user whose user category information is to be mined, classify the applications downloaded by the specified user according to their labeled user category information.
Step S130: for each user category, calculate the specified user's score in that category from the relevant parameters of the applications belonging to the category.
Step S140: determine the specified user's user category information to be the user category with the highest score.
For example, take the case where the user category information is gender. Suppose a data sample set has been obtained that contains 1000 user samples of known gender, and the applications downloaded by each of these users, such as mobile apps, are analyzed. Suppose the 1000 sample users have downloaded 1000 different apps in total. For each app, count how many users of each gender downloaded it. For a certain review app (used only as an example), 300 male users and 200 female users downloaded it (the remaining 500 did not), so the app may be labeled male. (There are several ways to determine the label; here the number of downloading users in each category is simply compared.) The remaining 999 apps are processed in the same way, so that every app is labeled male or female.
Then, for a user whose gender is to be mined, the apps the user has downloaded are grouped by their labels. For example, the user has downloaded 40 apps, all of which have been labeled: 26 as male and 14 as female. For each group, the user's score in that category is calculated from the relevant parameters of the apps belonging to it. Assuming every app's parameter is 1, the user scores 26 in the male category and 14 in the female category. The user's category is determined to be the one with the highest score, so the user is judged to be male.
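The counting example above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names `label_app_by_count` and `classify_user` and the synthetic app names `app0` to `app39` are hypothetical, and every app is given a parameter of 1 as in the example.

```python
from collections import Counter

def label_app_by_count(download_counts):
    """Label an app with the category that has the most downloading users."""
    return max(download_counts, key=download_counts.get)

def classify_user(user_apps, app_labels):
    """Score each category by counting the user's apps labeled with it.

    Every app contributes 1 (the simplifying assumption in the example).
    Returns (best_category, scores); best_category is None if no app is labeled.
    """
    scores = Counter(app_labels[app] for app in user_apps if app in app_labels)
    if not scores:
        return None, {}
    return scores.most_common(1)[0][0], dict(scores)

# Review app from the example: 300 male downloaders, 200 female downloaders.
review_label = label_app_by_count({"male": 300, "female": 200})

# A user with 40 labeled apps: 26 labeled male, 14 labeled female.
app_labels = {f"app{i}": ("male" if i < 26 else "female") for i in range(40)}
category, scores = classify_user(list(app_labels), app_labels)
print(review_label, category, scores)
```

Apps that have no label are skipped here; how unlabeled apps should be treated is not stated in the text.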
This method differs from the traditional ways of determining user category, such as questionnaires and user registration information. With those approaches, many users either withhold their real information, for example to protect their privacy, or cannot be bothered to fill anything in at all. As a result the user category information obtained is not accurate enough, and collecting a large number of samples for machine-learning classification is quite difficult. The method shown in Fig. 1 can effectively address this problem.
In a further embodiment of the invention, in the method shown in Fig. 1, labeling each application downloaded by users in the user sample set with user category information, according to the number of times users of different categories downloaded it, includes:
for each such application, calculating the application's average download volume among the users of each user category; and labeling the application with the user category whose average download volume is largest.
In this embodiment, the average download volume of each application among the users of each user category is calculated, and the application is labeled with the user category whose average download volume is largest. Continuing with the 1000 sample users of known gender: for the review app (used only as an example), 300 male users and 200 female users downloaded it, and of the remaining 500 users who did not, 200 are male and 300 are female. The male category therefore has 500 sample users, 300 of whom downloaded the app, giving an average download volume of 300/500 = 0.6. In the same way, the average download volume in the female category is 200/500 = 0.4. The category with the largest average download volume is male, so the app is labeled male. The average download volumes of the remaining apps are calculated and the apps labeled in the same way. This approach labels user category information more accurately from the application data in the user sample, because it accounts for categories of different sizes.
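The average-download labeling above might look as follows. This is a sketch under stated assumptions: `average_downloads` and `label_app` are hypothetical names, every category is assumed to contain at least one sample user, and `fractions.Fraction` is used only to keep the ratios exact.

```python
from fractions import Fraction

def average_downloads(downloaders, category_sizes):
    """Average download volume per category: downloaders / users in category.

    Assumes every category in category_sizes has at least one user.
    """
    return {
        category: Fraction(downloaders.get(category, 0), size)
        for category, size in category_sizes.items()
    }

def label_app(downloaders, category_sizes):
    """Label the app with the category whose average download volume is largest."""
    rates = average_downloads(downloaders, category_sizes)
    return max(rates, key=rates.get), rates

# Review app: 300 of 500 male users and 200 of 500 female users downloaded it.
label, rates = label_app(
    {"male": 300, "female": 200},
    {"male": 500, "female": 500},
)
print(label, {c: float(r) for c, r in rates.items()})
```

With equal category sizes the result matches raw counting; the two approaches differ only when the categories in the sample set have different sizes.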
In a further embodiment of the invention, in the method shown in Fig. 1, calculating, for each user category, the specified user's score in that category from the relevant parameters of the applications belonging to the category includes:
for each user category, calculating the specified user's score in that category according to the chi-square score and the chi-square score weight of each application belonging to the category.
The chi-square test is a very widely used hypothesis-testing method. Its applications in statistical inference on categorical data include chi-square tests comparing two rates or two proportions, chi-square tests comparing multiple rates or multiple proportions, and correlation analysis of categorical data. This embodiment uses a comparison of chi-square scores; calculating the specified user's score in each user category from chi-square scores gives a more accurate result.
In a further embodiment of the invention, in the above method, calculating the specified user's score in each user category according to the chi-square score and chi-square score weight of each application belonging to the category includes:
calculating the score $S(u)_L$ of a specified user $u$ in user category $L$ (when the user category is gender, $L$ = male/female) according to the following formula:
where $s_i$ is the chi-square score of the $i$-th application downloaded by the specified user, $w_i$ is the chi-square score weight of that application, and $L_i$ is the user category to which that application belongs.
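The scoring step can be sketched in Python as follows. Because the formula itself is not reproduced in this text, the sketch assumes the score of a category is the sum of `s_i * w_i` over the user's applications labeled with that category; that reading follows from the definitions of $s_i$, $w_i$ and $L_i$ but is an assumption. The function names and the per-app statistics in the usage example are hypothetical values chosen for illustration.

```python
from collections import defaultdict

def category_scores(user_apps, app_stats):
    """Sum s_i * w_i per category over the user's labeled apps.

    app_stats maps app name -> (label L_i, chi-square score s_i, weight w_i).
    The weighted-sum form is an assumed reading of the missing formula.
    """
    scores = defaultdict(float)
    for app in user_apps:
        if app not in app_stats:
            continue  # unlabeled apps are skipped (an assumption)
        label, s_i, w_i = app_stats[app]
        scores[label] += s_i * w_i
    return dict(scores)

def predict_category(user_apps, app_stats):
    """Return the category with the highest score, or None if nothing is labeled."""
    scores = category_scores(user_apps, app_stats)
    return max(scores, key=scores.get) if scores else None

# Hypothetical per-app statistics, for illustration only.
app_stats = {
    "review": ("male", 40.0, 2),
    "racing": ("male", 10.0, 1),
    "makeup": ("female", 55.0, 1),
}
user_apps = ["review", "racing", "makeup", "unknown_app"]
print(category_scores(user_apps, app_stats), predict_category(user_apps, app_stats))
```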
In a further embodiment of the invention, in the above method, when there are only two user categories, the chi-square score $s$ of an application is calculated according to the following formula:
$$s = \frac{(A+B+C+D)\,(AD-BC)^2}{(A+C)(B+D)(A+B)(C+D)}$$
where A is the number of users in the sample set belonging to the first user category who downloaded the application, B is the number of users belonging to the second user category who downloaded it, C is the number of users belonging to the first user category who did not download it, and D is the number of users belonging to the second user category who did not download it.
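A direct transcription of this two-category formula into Python is shown below as a sketch. The function name `chi_square_2x2` is hypothetical, and raising an error when a row or column total is zero is an added assumption, since the text does not say how that case is handled. The usage line applies it to the review app from the earlier example (A = 300, B = 200, C = 200, D = 300).

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square score of one app for two user categories.

    a: first-category users who downloaded the app
    b: second-category users who downloaded the app
    c: first-category users who did not download it
    d: second-category users who did not download it
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        # Not covered by the source text; treated as undefined here.
        raise ValueError("chi-square score is undefined when a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator

# Review app: 300 male and 200 female downloaders, 200 male and 300 female non-downloaders.
s = chi_square_2x2(300, 200, 200, 300)
print(s)  # 40.0
```

An app downloaded at the same rate in both categories scores 0, so it contributes nothing to either category.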
This embodiment suits the case where there are only two user categories, such as determining user gender, and makes the calculation more convenient and faster.
In a further embodiment of the invention, in the above method, when there are only two user categories, the chi-square score weight $w$ of an application equals the ratio of the application's larger average download volume to its smaller average download volume across the two categories. Specifically, $w$ is calculated according to the following formula:
$$w = \mathrm{round}\!\left(\frac{\max_L b(L)}{\min_L b(L)}\right)$$
where $b(L)$ is the average download volume of the application among users of user category $L$, and round is a function that rounds a value half up to a specified number of digits.
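The weight calculation can be sketched as follows. Several choices here are assumptions rather than statements from the text: rounding is done to the nearest integer because no digit count is given; half-up rounding is implemented with `math.floor(x + 1/2)` because Python's built-in `round` rounds halves to the nearest even number; exact `Fraction` inputs are used because the float expression `0.6 / 0.4` evaluates slightly below 1.5; and a zero average is rejected because the ratio is then undefined. The name `chi_square_weight` is hypothetical.

```python
import math
from fractions import Fraction

def chi_square_weight(avg_downloads):
    """w = round(max(b(L)) / min(b(L))), rounded half up to an integer.

    avg_downloads maps category -> average download volume b(L); pass
    Fraction or int values so that the ratio is computed exactly.
    """
    largest = max(avg_downloads.values())
    smallest = min(avg_downloads.values())
    if smallest == 0:
        # Not covered by the source text; the ratio has no value here.
        raise ValueError("weight is undefined when an average download volume is zero")
    ratio = Fraction(largest) / Fraction(smallest)
    return math.floor(ratio + Fraction(1, 2))

# Review app: b(male) = 300/500, b(female) = 200/500, ratio 1.5, rounded half up.
w = chi_square_weight({"male": Fraction(300, 500), "female": Fraction(200, 500)})
print(w)  # 2
```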
In a further embodiment of the invention, the above method further includes:
after the specified user's user category information is determined, adding the specified user to the user sample set.
Once a user's category information has been determined, the user can be added to the existing user sample set. This enlarges the sample and makes the data-derived category labels more accurate.
In a further embodiment of the invention, the above method further includes:
when the number of users added to the user sample set reaches a preset value, re-labeling the applications downloaded by users in the sample set with user category information.
In the above method, once the number of newly added user samples reaches the preset value, continuing to use the existing labels may reduce the accuracy of user category labeling. Re-labeling the applications downloaded by users in the sample set effectively addresses this.
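The sample-set growth and re-labeling trigger described above might be organized like this. It is only a sketch: the class `SampleSet`, its attribute names and the `relabel` callback are hypothetical, and resetting the counter after each re-labeling is an assumption the text does not state.

```python
class SampleSet:
    """User sample set that triggers re-labeling after enough users are added."""

    def __init__(self, relabel_threshold, relabel):
        self.users = {}                  # user id -> (category, set of downloaded apps)
        self.added_since_relabel = 0     # users added since the last re-labeling
        self.relabel_threshold = relabel_threshold  # the "preset value"
        self.relabel = relabel           # callback that re-labels apps from self.users

    def add_classified_user(self, user_id, category, apps):
        """Add a newly classified user; return True if re-labeling was triggered."""
        self.users[user_id] = (category, set(apps))
        self.added_since_relabel += 1
        if self.added_since_relabel >= self.relabel_threshold:
            self.relabel(self.users)
            self.added_since_relabel = 0  # assumed: start counting again
            return True
        return False

# Usage: record how many users were present each time re-labeling ran.
relabel_calls = []
samples = SampleSet(relabel_threshold=2, relabel=lambda users: relabel_calls.append(len(users)))
first = samples.add_classified_user("u1", "male", ["review"])
second = samples.add_classified_user("u2", "female", ["makeup"])
print(first, second, relabel_calls)
```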
In a further embodiment of the invention, the above method further includes:
pushing recommended content to the specified user according to the determined user category information.
This embodiment describes what can be done once the specified user's category information has been determined. In advertising, serving different advertisements to different target groups matters a great deal. For example, if the user category is business professional, advertisements for business supplies and suits can be served; if the user category is secondary-school student, advertisements for study aids can be served; and so on. Apps can also be recommended to groups with different interests, such as camera apps and photography magazine apps for photography enthusiasts, or fashion magazine apps for makeup enthusiasts.
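The mapping from a determined category to pushed content can be as simple as a lookup table. The sketch below uses the examples from the paragraph above; the names `RECOMMENDATIONS` and `recommend` are hypothetical, and returning an empty list for an unknown category is an assumption.

```python
# Category -> content to push, taken from the examples in the text.
RECOMMENDATIONS = {
    "business professional": ["business supplies", "suits"],
    "secondary-school student": ["study aids"],
    "photography enthusiast": ["camera app", "photography magazine app"],
    "makeup enthusiast": ["fashion magazine app"],
}

def recommend(category):
    """Return the content to push for a category, or an empty list if unknown."""
    return RECOMMENDATIONS.get(category, [])

print(recommend("photography enthusiast"))
```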
Fig. 2 shows a structural diagram of a device for mining user category information provided by an embodiment of the invention. As shown in Fig. 2, the device 200 for mining user category information includes:
a labeling unit 210, adapted to label each application downloaded by users in a user sample set whose user category information is known with user category information, according to the number of times users of different categories in the sample set downloaded it;
a mining unit 220, adapted to classify the applications downloaded by a specified user, whose user category information is to be mined, according to their labeled user category information; to calculate, for each user category, the specified user's score in that category from the relevant parameters of the applications belonging to the category; and to determine the specified user's user category information to be the user category with the highest score.
This device differs from the traditional ways of determining user category, such as questionnaires and user registration information. With those approaches, many users either withhold their real information, for example to protect their privacy, or cannot be bothered to fill anything in at all. As a result the user category information obtained is not accurate enough, and collecting a large number of samples for machine-learning classification is quite difficult. The device shown in Fig. 2 can effectively address this problem.
In yet another embodiment of the present invention, the labeling unit 210 in Fig. 2 is adapted to, for each such application, calculate the application's average download count among the users of each user category, and to label the application with the user category whose average download count is highest.
With this method, the labeling unit 210 can label applications with user category information more accurately from the download data in the user sample set.
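The labeling step can be sketched as follows. This is a minimal illustration, not the patented implementation: the input layout (one `(category, apps)` pair per sample user), the function name `label_apps`, and the example app names are all assumptions introduced here, and ties between categories are broken arbitrarily by `max`.

```python
from collections import defaultdict


def label_apps(samples):
    """Label each app with the category whose users download it most on average.

    samples: list of (category, set_of_downloaded_apps), one entry per sample user.
    (This layout is a hypothetical choice made for the sketch.)
    """
    users_per_category = defaultdict(int)
    downloads = defaultdict(lambda: defaultdict(int))  # app -> category -> count
    for category, apps in samples:
        users_per_category[category] += 1
        for app in apps:
            downloads[app][category] += 1

    labels = {}
    for app, per_category in downloads.items():
        # Average download count: downloads in the category / sample users in it.
        averages = {
            category: per_category.get(category, 0) / user_count
            for category, user_count in users_per_category.items()
        }
        labels[app] = max(averages, key=averages.get)
    return labels


samples = [
    ("male", {"racing_game", "news"}),
    ("male", {"racing_game"}),
    ("female", {"fashion_mag", "news"}),
]
labels = label_apps(samples)
```

Here `news` is downloaded by one of two male users (average 0.5) and by the only female user (average 1.0), so it is labeled `female` even though the raw download counts are equal. Normalizing by category size is what distinguishes the average from the raw count.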
In yet another embodiment of the present invention, the mining unit 220 is adapted to, for each user category, calculate the specified user's score in that category from the chi-square scores and chi-square score weights of the applications belonging to the category.
In yet another embodiment of the present invention, the mining unit 220 in the above device is adapted to calculate the score S(u)_L of the specified user u in user category L from the following quantities:
s_i is the chi-square score of the i-th application downloaded by the specified user, w_i is the chi-square score weight of the i-th application downloaded by the specified user, and L_i is the user category to which the i-th application downloaded by the specified user belongs.
Calculating the specified user's score in each user category from chi-square scores allows the mining unit 220 to obtain a more accurate result.
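The score formula itself is not legible in this text, so the sketch below rests on an assumption: that S(u)_L is the sum of w_i * s_i over the user's applications whose label L_i equals L, which is one reading consistent with the definitions of s_i, w_i and L_i. The function names and the example values are hypothetical.

```python
def category_scores(user_apps, app_label, app_chi2, app_weight):
    """Sum weight * chi-square score per category over the user's labeled apps.

    Assumed form: S(u)_L = sum of w_i * s_i over apps i with L_i == L.
    """
    scores = {}
    for app in user_apps:
        if app not in app_label:
            continue  # apps absent from the sample set carry no label
        category = app_label[app]
        scores[category] = scores.get(category, 0.0) + app_weight[app] * app_chi2[app]
    return scores


def predict_category(user_apps, app_label, app_chi2, app_weight):
    """Return the category with the highest score, or None if no app is labeled."""
    scores = category_scores(user_apps, app_label, app_chi2, app_weight)
    return max(scores, key=scores.get) if scores else None


app_label = {"racing_game": "male", "fashion_mag": "female", "news": "female"}
app_chi2 = {"racing_game": 4.0, "fashion_mag": 3.0, "news": 0.5}
app_weight = {"racing_game": 2, "fashion_mag": 3, "news": 1}

user_apps = ["racing_game", "news", "unlabeled_app"]
scores = category_scores(user_apps, app_label, app_chi2, app_weight)
predicted = predict_category(user_apps, app_label, app_chi2, app_weight)
```

In this example the user scores 8.0 for `male` and 0.5 for `female`, so the predicted category is `male`.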
In yet another embodiment of the present invention, when there are only two user categories, the mining unit 220 in the above device is adapted to calculate the chi-square score s of an application according to the following formula:
$$s = \frac{(A+B+C+D)\,(A \cdot D - B \cdot C)^2}{(A+C)\,(B+D)\,(A+B)\,(C+D)}$$
where A is the number of times users of the first user category in the user sample set downloaded the application, B is the number of times users of the second user category in the user sample set downloaded the application, C is the number of times users of the first user category in the user sample set did not download the application, and D is the number of times users of the second user category in the user sample set did not download the application.
This embodiment suits cases with only two user categories, such as gender, where the calculation is simpler and faster.
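The two-category chi-square score can be computed directly from the four counts. The sketch below follows the formula above; returning 0.0 when the denominator is zero is an assumption added here, since the text does not say how a degenerate table is handled.

```python
def chi_square_2x2(a, b, c, d):
    """Chi-square score of an app from its 2x2 download table.

    a, b: downloads by users of the first / second category
    c, d: non-downloads by users of the first / second category
    """
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # assumption: a degenerate table carries no signal
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / denominator


# 50 users per category; 30 of the first and 10 of the second downloaded the app.
s = chi_square_2x2(30, 10, 20, 40)
```

An application downloaded at the same rate in both categories gets a score of 0, so it contributes nothing to either category's total.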
In yet another embodiment of the present invention, when there are only two user categories, the mining unit 220 in the above device is adapted to set the chi-square score weight w of an application equal to the ratio of the application's larger average download count to its smaller average download count across the two user categories. More specifically, the weight w is calculated according to the following formula:
$$w = \mathrm{round}\left(\frac{\max(b(L))}{\min(b(L))}\right)$$
where b(L) is the average download count of the application among users of user category L.
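A minimal sketch of the weight calculation follows. The text does not specify the rounding rule; Python's built-in `round` rounds exact halves to the nearest even integer, which may differ from the rounding the authors intended. Raising an error when the smaller average is zero is also an assumption of this sketch.

```python
def chi_square_weight(avg_first, avg_second):
    """Weight = round(larger average download count / smaller one)."""
    larger = max(avg_first, avg_second)
    smaller = min(avg_first, avg_second)
    if smaller == 0:
        # The ratio is undefined when one category never downloads the app.
        raise ValueError("smaller average download count is zero")
    return round(larger / smaller)


w = chi_square_weight(0.30, 0.08)  # ratio is about 3.75
```

The weight grows with how lopsided an application's audience is, so strongly category-specific applications count for more in the final score.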
In yet another embodiment of the present invention, the above device further includes:
an adding unit, adapted to add the specified user to the user sample set after the specified user's category information has been determined.
Once a user's category information has been determined, the adding unit can add that user to the existing user sample set. This enlarges the sample and makes the data-driven labeling of user information more accurate.
In yet another embodiment of the present invention, the labeling unit in the above device is adapted to re-label the applications downloaded by users in the user sample set with user category information when the number of users added to the user sample set reaches a preset value.
When the adding unit has added so many new user samples that the preset value is reached, continuing with the existing labels may reduce labeling accuracy. Re-labeling the applications downloaded by users in the user sample set resolves this.
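One way to picture the adding unit and the re-labeling trigger is a small container that counts additions and re-runs labeling once a preset threshold is reached. The class name, the `relabel` callback, and the stand-in labeling function are hypothetical; the text only states that re-labeling happens when the number of added users reaches a preset value.

```python
class SampleSet:
    """User sample set that re-labels apps after a preset number of additions."""

    def __init__(self, samples, relabel_threshold, relabel):
        self.samples = list(samples)
        self.relabel_threshold = relabel_threshold
        self.relabel = relabel  # callable: samples -> {app: category}
        self.labels = relabel(self.samples)
        self.added_since_relabel = 0

    def add_user(self, category, apps):
        """Add a newly classified user; re-label apps once the threshold is hit."""
        self.samples.append((category, apps))
        self.added_since_relabel += 1
        if self.added_since_relabel >= self.relabel_threshold:
            self.labels = self.relabel(self.samples)
            self.added_since_relabel = 0


calls = []


def recording_relabel(samples):
    # Stand-in for the real labeling step; records the sample size at each call.
    calls.append(len(samples))
    return {}


sample_set = SampleSet([("male", {"news"})], relabel_threshold=2,
                       relabel=recording_relabel)
sample_set.add_user("female", {"fashion_mag"})
sample_set.add_user("male", {"racing_game"})
```

Labeling runs once at construction with one sample and again after the second addition with three samples, so `calls` ends up as `[1, 3]`.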
In yet another embodiment of the present invention, the above device further includes:
a recommending unit, adapted to push recommended content to the specified user according to the determined user category information.
This embodiment describes what can be done once the specified user's category information has been labeled. In advertising, serving different advertisements to different target groups matters a great deal. For example, a user labeled as a business professional can be shown advertisements for business supplies and suits, while a user labeled as a middle school student can be shown advertisements for study aids. Apps can likewise be recommended by interest: camera apps and photography magazine apps to photography enthusiasts, fashion magazine apps to makeup enthusiasts, and so on.
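At its simplest, the recommending unit amounts to a lookup from category to content. The sketch below uses the examples from the paragraph above; the dictionary keys, item names, and the empty-list fallback for unknown categories are illustrative assumptions.

```python
# Hypothetical mapping from user category to recommended content.
RECOMMENDATIONS = {
    "business professional": ["business supplies ad", "suit ad"],
    "middle school student": ["study aid ad"],
    "photography enthusiast": ["camera app", "photography magazine app"],
    "makeup enthusiast": ["fashion magazine app"],
}


def recommend(category):
    """Return the content to push for a category, or nothing if it is unknown."""
    return RECOMMENDATIONS.get(category, [])


pushed = recommend("photography enthusiast")
```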
In a specific embodiment of the present invention, a user sample set with known user category information was obtained, in which the user categories are male and female, with 55,376 male samples and 52,902 female samples. First, the average download count of each app in the male and female categories is calculated according to the formula
$$b(L) = \frac{app_L}{C_L}, \quad L \in \{\text{male}, \text{female}\}$$
where $app_L$ is the number of times the app was downloaded in category L and $C_L$ is the total number of samples in category L. The app's user category is then determined according to the formula
$$L^* = \arg\max_L \{b(L)\}, \quad L \in \{\text{male}, \text{female}\}$$
Next, the ratio of the app's larger average download count to its smaller average download count across the two categories is taken as the weight of the app's chi-square score, using the aforementioned formula
$$w = \mathrm{round}\left(\frac{\max(b(L))}{\min(b(L))}\right)$$
When the app's total download count across the two categories is less than 10, w is set to 1. If the app appears in only one category, w is calculated as
$$w = \mathrm{round}(count / 10)$$
where count is the number of times the app was downloaded in that category. The score S(u)_L of the specified user u in each user category L is then obtained with the aforementioned score formula, and the user is assigned the category with the highest score. The results are as follows:
AUC = 0.782, overall accuracy = 75.8%
That is, the overall accuracy on the user test set exceeds 75%.
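The labeling and weighting rules of this embodiment can be sketched with the stated sample sizes. Two points are assumptions: the order in which the special cases are checked (total below 10 first, then single-category apps), and the use of Python's built-in `round`, which rounds exact halves to the nearest even integer. The download counts in the examples are invented for illustration and are not data from the patent.

```python
MALE_USERS = 55376    # sample sizes stated in the embodiment
FEMALE_USERS = 52902


def app_label(male_downloads, female_downloads):
    """Category with the larger average download count, b(L) = app_L / C_L."""
    b_male = male_downloads / MALE_USERS
    b_female = female_downloads / FEMALE_USERS
    return "male" if b_male >= b_female else "female"


def app_weight(male_downloads, female_downloads):
    """Chi-square score weight with the embodiment's two special cases."""
    total = male_downloads + female_downloads
    if total < 10:
        return 1  # too few downloads to trust the ratio
    if male_downloads == 0 or female_downloads == 0:
        return round(total / 10)  # app appears in only one category
    b_male = male_downloads / MALE_USERS
    b_female = female_downloads / FEMALE_USERS
    return round(max(b_male, b_female) / min(b_male, b_female))


rare = app_weight(3, 4)            # total below 10
one_sided = app_weight(250, 0)     # downloaded by male users only
two_sided = app_weight(1200, 300)  # ratio of averages is about 3.82
```

The single-category rule keeps the weight finite when the ratio of averages would otherwise divide by zero.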
In addition, when a large number of labeled samples is available, the higher-quality ones can be selected as the user sample set: a sample is selected only when its label agrees with the user category computed by the method.
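The sample-selection rule can be sketched as a filter that keeps a labeled sample only when a predictor reproduces its label. The `predict` callback and the toy predictor below are stand-ins for the scoring method and are not part of the original text.

```python
def filter_consistent_samples(samples, predict):
    """Keep samples whose stated category matches the computed category."""
    return [(category, apps) for category, apps in samples
            if predict(apps) == category]


def toy_predict(apps):
    # Hypothetical predictor used only to make the example runnable.
    return "female" if "fashion_mag" in apps else "male"


samples = [
    ("male", {"racing_game"}),
    ("male", {"fashion_mag"}),  # label disagrees with the prediction: dropped
    ("female", {"fashion_mag", "news"}),
]
kept = filter_consistent_samples(samples, toy_predict)
```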
When the sample size is insufficient, the user's category can be mined directly with the technical solution of the present invention.
It should be noted that:
The algorithms and displays presented here are not inherently tied to any particular computer, virtual device, or other equipment. Various general-purpose devices may also be used with the teachings herein. The structure required to construct such devices is apparent from the description above. Moreover, the present invention is not directed at any particular programming language. It should be understood that the invention described here can be implemented in a variety of programming languages, and that the description of a specific language above is given to disclose the best mode of the invention.
The description provided here sets out numerous specific details. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules of the device in an embodiment can be adaptively changed and placed in one or more devices different from that embodiment. The modules, units, or components of an embodiment can be combined into a single module, unit, or component, and can also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described here include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the device for mining user category information according to embodiments of the invention. The invention can also be implemented as a device or apparatus program (for example, a computer program or a computer program product) for performing part or all of the method described here. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet site, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title | 
|---|---|---|---|
| CN201510475686.4A CN106447363A (en) | 2015-08-05 | 2015-08-05 | Method and device for mining user category information | 
Publications (1)
| Publication Number | Publication Date | 
|---|---|
| CN106447363A (en) | 2017-02-22 |
Family
ID=58092969
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date | 
|---|---|---|---|
| CN201510475686.4A Pending CN106447363A (en) | 2015-08-05 | 2015-08-05 | Method and device for mining user category information | 
Country Status (1)
| Country | Link | 
|---|---|
| CN (1) | CN106447363A (en) | 
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| CN108008973A (en) * | 2017-12-25 | 2018-05-08 | 百度在线网络技术(北京)有限公司 | The method, apparatus and server of a kind of affiliate application | 
| WO2018176454A1 (en) * | 2017-04-01 | 2018-10-04 | 深圳市智晟达科技有限公司 | Method for recommending videos according to app usage, and recommendation system | 
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title | 
|---|---|---|---|---|
| US20030084096A1 (en) * | 2001-10-31 | 2003-05-01 | Bryan Starbuck | Computer system with file association and application retrieval | 
| US20040078236A1 (en) * | 1999-10-30 | 2004-04-22 | Medtamic Holdings | Storage and access of aggregate patient data for analysis | 
| CN102063433A (en) * | 2009-11-16 | 2011-05-18 | 华为技术有限公司 | Method and device for recommending related items | 
| CN104423945A (en) * | 2013-08-30 | 2015-03-18 | 联想(北京)有限公司 | Information processing method and electronic device | 
| CN104598869A (en) * | 2014-07-25 | 2015-05-06 | 北京智膜科技有限公司 | Intelligent advertisement pushing method based on human face recognition device | 
Legal Events
| Date | Code | Title | Description | 
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20170222 |
| | RJ01 | Rejection of invention patent application after publication | |