
CN105488697A - Potential customer mining method based on customer behavior characteristics - Google Patents


Info

Publication number
CN105488697A
Authority
CN
China
Prior art keywords
page
data
feature
user
time
Prior art date
Legal status
Pending
Application number
CN201510903856.4A
Other languages
Chinese (zh)
Inventor
李娟
徐丽萍
Current Assignee
Focus Technology Co Ltd
Original Assignee
Focus Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Focus Technology Co Ltd
Priority to CN201510903856.4A
Publication of CN105488697A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a potential customer mining method based on customer behavior characteristics, comprising the following steps. Step 1, data preprocessing: 1) data cleaning: first delete unnecessary record rows; 2) build a URL rule list; 3) user identification: in the access log table, users are jointly identified by vinfo and customer ID; 4) feature extraction: taking the session as the unit, analyze for each user in a single session the access source, number of pages browsed, number of product detail pages browsed, number of products browsed, page browsing duration, product detail page browsing duration, number of times the filter list is viewed, whether a business topic is viewed, and the user's first browsing period of the day; use purchase intent as the class attribute, and form training samples from these features; 5) screen the training set. Step 2: reduce the feature attributes with rough set theory to improve classification accuracy. Step 3: build a random forest potential customer identification model based on the customer behavior characteristics.

Description

A Potential Customer Mining Method Based on Customer Behavior Characteristics

Technical Field

The invention relates to the field of mining potential customers by analyzing website user access logs, and in particular to a potential customer mining method based on customer behavior characteristics.

Background

In today's increasingly competitive e-commerce market, an enterprise that keeps acquiring new customers, effectively identifies the potential customers among its many visitors, and works to convert them into actual customers gains greater returns and a competitive advantage. The purpose of potential customer mining is to give the website an accurate basis for formulating its service strategies and making the corresponding decisions.

The raw data for potential customer mining comes from the website's access log, which records each customer's behavior when visiting the site; this information is easy to obtain.

The access log records the visitor's IP, login ID, access time, VINFO, browsed product ID, REFERER (the previously visited page), REQUEST (the requested page), SEARCH_WORD (search term), PROD_ID (browsed product), ORDER_ID, and other information.

Table 1 Access log information

| Field | Meaning |
|---|---|
| Iptonumber | Visitor IP |
| Vinfo | Cookie |
| Login_id | Login ID |
| Visit_time | Access time |
| Request | Requested page |
| Referrer | Source (referring page) |
| User_agent | Machine used |
| Prod_id | Product ID |
| Search_word | Search term |
| Order_id | Order ID |
| Policy_id | Policy ID |
| User_agent | Machine ID |

Among these fields, REFERER and REQUEST carry the most important information for analyzing where a visit came from and where it led, and for judging whether the visitor intends to purchase, has added items to the shopping cart, and so on. Mining these behaviors yields the customers' behavioral features, which reflect a customer's category fairly effectively: for example, which visit-behavior features characterize loyal customers, which characterize pure visitors, and which characterize potential customers.

Therefore, the problem that potential customer mining must solve is how to identify the behavioral features of potential customers from the website's access log.

Summary of the Invention

Purpose: the invention provides a potential customer mining method based on customer behavior characteristics that gives a website an accurate basis for formulating service strategies and making the corresponding decisions.

Technical solution: to achieve the above purpose, the invention adopts the following technical solution, a potential customer mining method based on customer behavior characteristics, characterized by the following steps:

Step 1: Data preprocessing;

Step1: Data cleaning

The raw log records have accumulated a large amount of customer browsing information, much of which is redundant and irrelevant to data mining, such as images, SMS verification, and logo images; the unnecessary record rows must be deleted first;

Step2: Build the URL rule list

Analyze the REQUEST field in the Xinyizhan web data (for example, a request containing 'confirm' indicates purchase intent) and build a URL rule list from the results. When features are computed later, the request fields need not be analyzed one by one: each request is matched against the URLs in the rule list to obtain its url_name;

Step3: User identification

In the access log table, users are jointly identified by vinfo, iptonumber, and the customer ID (login_id):

vinfo: equivalent to a cookie; it identifies one computer;

iptonumber: the IP address; the same computer logging in from different places has different IPs;

login_id: the member login ID; login_id = -1 when a non-member visits.
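The joint user key described above can be sketched in Python (the patent's own code is in R). This is a minimal illustration under assumptions: the record layout, the `UserKey` type, and `ip_to_number` (which assumes iptonumber is the usual 32-bit integer encoding of an IPv4 address) are hypothetical, not taken from the patent.

```python
from typing import NamedTuple


class UserKey(NamedTuple):
    """Hypothetical composite identifier built from the three log fields."""
    vinfo: str       # cookie-like identifier of one computer
    iptonumber: int  # IP address stored as a number
    login_id: int    # member login ID, -1 for non-members


def user_key(record: dict) -> UserKey:
    """Build the joint (vinfo, iptonumber, login_id) key for one log record."""
    return UserKey(
        vinfo=str(record["vinfo"]),
        iptonumber=int(record["iptonumber"]),
        login_id=int(record.get("login_id", -1)),  # missing login means non-member
    )


def is_member(key: UserKey) -> bool:
    return key.login_id != -1


def ip_to_number(ip: str) -> int:
    """Assumed encoding: dotted IPv4 address packed into a 32-bit integer."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d
```

Using a tuple as the key lets records be grouped per user with an ordinary dictionary, while `login_id` still distinguishes members who share a machine.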

Step4: Feature extraction

Taking the session as the unit, analyze the following feature attributes for each user in a single session: access source, number of pages browsed, number of product detail pages browsed, number of products browsed, page browsing duration, product detail page browsing duration, number of times the filter list is viewed, whether a business topic is viewed, the user's first browsing period of the day, and whether the user views the shopping cart. Purchase intent is used as the class attribute, and these features form the training samples;

Step5: Screen the training set

The behavior data in the web log is generated by all users of the Xinyizhan website within a given time period. It includes data from people who buy many times (loyal customers), people who buy only a few times (existing customers), potential customers, and people who browse the home page without viewing any product on the site (pure browsers);

By analyzing the number of purchases within a period, the data of repeat purchasers is excluded, and customers who buy a product for the first time or who browse without buying are selected as the mining objects;
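A minimal Python sketch of the training-set screening, assuming purchase counts per user are available for the analysis window. The function name and fields are hypothetical, and treating a session with no product detail page as a "pure browser" is an assumption drawn from the description of that group above.

```python
from collections import Counter
from typing import Dict, List


def screen_training_set(sessions: List[Dict], purchases: List[str]) -> List[Dict]:
    """
    sessions:  dicts with 'user' and 'prod_pages' (product detail pages viewed)
    purchases: one user ID per completed purchase in the analysis window
    """
    counts = Counter(purchases)
    kept = []
    for session in sessions:
        n = counts.get(session["user"], 0)
        if n > 1:
            continue  # repeat purchaser (loyal or existing customer): excluded
        if n == 0 and session["prod_pages"] == 0:
            continue  # assumed pure browser: never opened a product page
        kept.append(session)  # first-time buyer, or browsed products without buying
    return kept
```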

Step 2: Feature attribute reduction based on rough sets;

Some of the 11 feature attributes extracted in Step 1 may be redundant with respect to the class attribute. Following rough set theory, the redundant attributes can be removed without affecting classification performance, which reduces computation and improves classification accuracy.

Method steps:

First, compute the core (Core) using the relative positive region:

Step1: Initialize $Core = \varnothing$ and $C = \{a_1, a_2, \ldots, a_{11}\}$, where each $a_j$ ($j = 1, 2, \ldots, 11$) is a feature attribute, and $D = \{a_{12}\}$ is the class attribute. Compute the relative positive region $POS_C(D)$.

Step2: For each attribute $a_j \in C$, let $B = C - \{a_j\}$, compute the relative positive region $POS_B(D)$, and compare it with $POS_C(D)$. If $POS_C(D) \neq POS_B(D)$, then $a_j$ is a core attribute and $Core = Core \cup \{a_j\}$. Repeat until every attribute has been checked.

Step3: Return Core and stop.
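The core computation can be illustrated with a small Python sketch on a toy decision table (rows are tuples, attributes are column indices). Here $POS_B(D)$ is computed in the standard rough-set way, as the objects whose $B$-equivalence class carries a single decision value; the function names and data layout are illustrative assumptions.

```python
from collections import defaultdict
from typing import FrozenSet, List, Sequence, Tuple

Row = Tuple[int, ...]


def positive_region(rows: Sequence[Row], attrs: Sequence[int], decision: int) -> FrozenSet[int]:
    """POS_attrs(D): indices of rows whose equivalence class under attrs has one decision value."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    pos = set()
    for members in classes.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return frozenset(pos)


def core(rows: Sequence[Row], cond: Sequence[int], decision: int) -> List[int]:
    """Attributes whose removal shrinks the relative positive region."""
    full = positive_region(rows, cond, decision)  # POS_C(D)
    result = []
    for a in cond:
        reduced = [b for b in cond if b != a]  # B = C - {a_j}
        if positive_region(rows, reduced, decision) != full:
            result.append(a)  # Core = Core ∪ {a_j}
    return result
```

In a table where column 2 duplicates column 0, neither of those columns is in the core, because removing either one leaves the other to separate the objects.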

Next, compute the reduct (Reduce) using attribute dependency:

Step1: Initialize the remaining attributes $RestAtt = C - Core$ and $Reduce = Core$.

Step2: Compare $POS_{Core}(D)$ with $POS_C(D)$. If they are equal, Core is itself the reduct; otherwise go to Step3.

Step3: For each remaining attribute $a_j$ in RestAtt, compute its attribute dependency value K; select the attribute $a_k$ with the largest K, and set $Reduce = Reduce \cup \{a_k\}$ and $RestAtt = RestAtt - \{a_k\}$. Then compare $POS_{Reduce}(D)$ with $POS_C(D)$: if they are equal, go to Step4; otherwise continue the loop.

Step4: Return Reduce and stop.

The resulting Reduce is the set of feature attributes finally fed to the classifier; this is the rough-set-based attribute screening step added to the method.
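The greedy reduct step can be sketched the same way. The formula for K is not preserved in this text, so the sketch assumes K is the usual attribute dependency degree, $\gamma = |POS_{Reduce \cup \{a_j\}}(D)| / |U|$; ties go to the first attribute in order. The helper is repeated so the block stands alone, and all names are hypothetical.

```python
from collections import defaultdict
from typing import FrozenSet, List, Sequence, Tuple

Row = Tuple[int, ...]


def positive_region(rows: Sequence[Row], attrs: Sequence[int], decision: int) -> FrozenSet[int]:
    """POS_attrs(D): rows whose equivalence class under attrs has a single decision value."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    pos = set()
    for members in classes.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return frozenset(pos)


def dependency(rows: Sequence[Row], attrs: Sequence[int], decision: int) -> float:
    """Assumed K: |POS_attrs(D)| / |U|."""
    return len(positive_region(rows, attrs, decision)) / len(rows)


def reduct(rows: Sequence[Row], cond: Sequence[int], decision: int,
           core_attrs: Sequence[int]) -> List[int]:
    full = positive_region(rows, cond, decision)       # POS_C(D)
    reduce_ = list(core_attrs)                          # Reduce = Core
    rest = [a for a in cond if a not in reduce_]        # RestAtt = C - Core
    while rest and positive_region(rows, reduce_, decision) != full:
        best = max(rest, key=lambda a: dependency(rows, reduce_ + [a], decision))
        reduce_.append(best)                            # Reduce = Reduce ∪ {a_k}
        rest.remove(best)                               # RestAtt = RestAtt - {a_k}
    return reduce_
```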

Step 3: Random forest potential customer identification model based on customer behavior characteristics

The random forest algorithm is implemented with the randomForest 4.6-6 package for R 3.0.2. The program connects to the Oracle database through an ODBC data source, retrieves the required data with the function get_data(), and computes the data features with the function cal_feature(). After the training set is screened, the random forest classification model model_rf is applied to the feature data to predict the IP and cookie information of potential customers; finally, the potential customers' user information is looked up in existing data tables by IP and cookie and written to the database.

In Step 3:

Step1: Connect to the database. The function get_data() retrieves the required data from the database; its parameter chan is the database connection, and cal_number is the date of the data to retrieve.

```r
data = sqlQuery(chan, sql, stringsAsFactors = FALSE)
```

The connection between R and the Oracle database is established with odbcConnect() from the RODBC package:

```r
library(RODBC)
chan = odbcConnect("dm_xyz", uid = '######', pwd = '******')
```

Here dm_xyz is the system DSN name of the ODBC data source, uid is the user name, and pwd is the login password;

Once the database connection is established, the required data is retrieved by executing a SQL statement, and the data cleaning rules from Step 1 are added to that SQL statement;
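Pushing the cleaning rule into the SQL statement might look like the following Python sketch, which only builds the query string. The table name, the VISIT_TIME and REQUEST column names, and the noise patterns are assumptions (the patent's actual list of REQUEST values to delete is not reproduced here); the inputs are validated because they are interpolated into SQL.

```python
from datetime import date

# Hypothetical noise patterns (images, logos, SMS verification).
NOISE_PATTERNS = (".jpg", ".png", ".gif", "logo", "sms")


def build_query(table: str, day: str) -> str:
    """Build an Oracle-style query for one day's log rows, excluding noise requests."""
    if not table.replace(".", "_").isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    day = date.fromisoformat(day).isoformat()  # rejects anything that is not YYYY-MM-DD
    conditions = [f"TRUNC(VISIT_TIME) = DATE '{day}'"]
    conditions += [f"REQUEST NOT LIKE '%{p}%'" for p in NOISE_PATTERNS]
    return f"SELECT * FROM {table} WHERE " + " AND ".join(conditions)
```

Filtering in the database keeps the noise rows from ever being transferred to R.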

Step2: URL matching and updating page information

Read the URL rule text file (the URL rule list built during data preprocessing) from local disk, match the REQUEST field of each record against the keywords of the URL rules, and update the VISIT_PAGE field; records with no match are set to "-1".
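The URL-rule matching can be sketched in Python as a first-match keyword lookup. The tab-separated rule-file format and the url_name labels are hypothetical; only the keywords (such as 'confirm' and 'baoxian') and the "-1" default come from the text.

```python
from typing import Iterable, List, Tuple

Rule = Tuple[str, str]  # (keyword found in REQUEST, url_name)


def load_rules(lines: Iterable[str]) -> List[Rule]:
    """Parse hypothetical 'keyword<TAB>url_name' lines; blank lines and '#' comments are skipped."""
    rules: List[Rule] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        keyword, url_name = line.split("\t", 1)
        rules.append((keyword, url_name))
    return rules


def match_page(request: str, rules: List[Rule]) -> str:
    """Return the url_name of the first rule whose keyword occurs in the request, else '-1'."""
    for keyword, url_name in rules:
        if keyword in request:
            return url_name
    return "-1"
```

Because the first match wins, more specific keywords should be listed before more general ones in the rule file.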

Step3: Feature calculation, cal_feature(data)

The function cal_feature splits the single day of data returned by get_data into separate browsing sessions and computes the features of each session, producing the feature data set.
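The patent does not say how a day's log is split into sessions. A common convention, assumed here, is to group records by user and start a new session after 30 minutes of inactivity; the threshold, field names, and function are hypothetical Python illustrations, not the behavior of cal_feature itself.

```python
from datetime import datetime, timedelta
from itertools import groupby
from typing import Dict, List

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout


def split_sessions(records: List[Dict], gap: timedelta = SESSION_GAP) -> List[List[Dict]]:
    """Group one day's records by user, then cut a new session after `gap` of inactivity."""
    ordered = sorted(records, key=lambda r: (r["user"], r["visit_time"]))
    sessions: List[List[Dict]] = []
    for _, group in groupby(ordered, key=lambda r: r["user"]):
        current: List[Dict] = []
        last = None
        for record in group:
            if last is not None and record["visit_time"] - last > gap:
                sessions.append(current)  # inactivity gap: close the current session
                current = []
            current.append(record)
            last = record["visit_time"]
        sessions.append(current)  # every group has at least one record
    return sessions
```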

Step 4: Performance verification of the potential customer identification model

An Oracle stored procedure computes the proportion of mined potential customers who actually purchase within one month after the date they were identified; this proportion serves as the model's performance metric. If the follow-up purchase rate of the customers predicted each day is high and stable, the model performs well. Model performance indicators are given.
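A Python sketch of the verification metric: the follow-up purchase rate per day, plus its mean and population standard deviation as a simple proxy for "high and stable". The input layout and the choice of standard deviation as the stability measure are assumptions; the patent computes the rate in an Oracle stored procedure.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def daily_purchase_rates(outcomes: Dict[str, List[bool]]) -> Dict[str, float]:
    """outcomes maps a mining date to one flag per predicted customer: did they buy within a month?"""
    return {day: sum(flags) / len(flags) for day, flags in outcomes.items() if flags}


def summarize(rates: Dict[str, float]) -> Tuple[float, float]:
    """Mean rate (how high) and population standard deviation (how stable)."""
    values = list(rates.values())
    return mean(values), pstdev(values)
```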

Beneficial effects: the invention provides a potential customer mining method based on customer behavior characteristics that builds its classification model with the stable random forest algorithm. The method is efficient and accurate, and it gives the website an accurate basis for formulating service strategies and making the corresponding decisions.

Description of Drawings

Figure 1 shows the structure of the whole program, where model_rf.Rdata is the trained classification model.

Detailed Description

Step 1: Data preprocessing;

Step1: Data cleaning

The raw log records have accumulated a large amount of customer browsing information, much of which is redundant and irrelevant to data mining, such as images, SMS verification, and logo images. The unnecessary record rows must be deleted first.

The REQUEST values of the rows to be deleted are as follows:

Step2: Build the applicant's Xinyizhan URL rule list

Analyze the REQUEST field in the applicant's Xinyizhan web data: for example, a request containing 'baoxian' (pinyin for "insurance") means the user is viewing an insurance topic, and one containing 'confirm' indicates purchase intent. The result is the Xinyizhan URL rule list. When features are computed later, the request fields need not be analyzed one by one; each request is matched against the URLs in the rule list to obtain its url_name, as shown in Table 2.

Table 2 Xinyizhan URL rules

Step3: User identification

In the Xinyizhan access log table, users are jointly identified by vinfo, iptonumber, and the customer ID (login_id).

vinfo: equivalent to a cookie; it identifies one computer;

iptonumber: the IP address; the same computer logging in from different places has different IPs;

login_id: the Xinyizhan member login ID; login_id = -1 when a non-member visits.

Step4: Feature extraction

Taking the session as the unit, analyze the following feature attributes for each user in a single session: access source, number of pages browsed, number of product detail pages browsed, number of products browsed, page browsing duration, product detail page browsing duration, number of times the filter list is viewed, whether an insurance topic is viewed, whether the insurance dictionary is viewed, the user's first browsing period of the day, and whether the user views the shopping cart. Purchase intent is used as the class attribute, and these features form the training samples.

a) Feature 1: access source. Based on the first referer of each session, the customer's access source is classified as advertising, search engine, email, direct access, or other.

b) Feature 2: number of pages browsed, i.e., the number of pages the user browses in one session.

c) Feature 3: number of product detail pages browsed, i.e., the number of product detail pages the user browses in one session.

d) Feature 4: page browsing duration, the average browsing time per page for the user in one session.

e) Feature 5: product detail page browsing duration, the average browsing time per product detail page for the user in one session.

f) Feature 6: number of times the filter list is viewed, i.e., the number of requests in the session that contain 'viewsearchlist'.

g) Feature 7: whether an insurance topic is viewed, i.e., whether any request contains 'baoxian'.

h) Feature 8: whether the insurance dictionary is viewed, i.e., whether any request contains 'toptag'.

i) Feature 9: browsing period, determined by visit_time, the time of the session's first visit.

j) Feature 10: whether the shopping cart is viewed, '1' for yes and '0' for no, i.e., whether any request contains shopping_car.

k) Class attribute: purchase intent, '1' for yes and '0' for no. A request containing 'confirm' indicates purchase intent.
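Several of the features above reduce to keyword tests on a session's requests. The Python sketch below computes features 1 and 6 to 10 and the class label; the referer keyword lists used to classify the access source, and the use of the hour of the first visit as the browsing period, are hypothetical choices, since the text does not define them.

```python
from datetime import datetime
from typing import Dict, List

# Hypothetical referer keywords; the patent does not list them.
AD_MARKERS = ("utm_source=", "adid=")
SEARCH_ENGINES = ("baidu.", "google.", "sogou.", "so.com")
MAIL_MARKERS = ("mail.",)


def access_source(first_referer: str) -> str:
    """Feature 1: classify the session's first referer."""
    if not first_referer or first_referer == "-":
        return "direct"
    ref = first_referer.lower()
    if any(m in ref for m in AD_MARKERS):
        return "advertising"
    if any(m in ref for m in SEARCH_ENGINES):
        return "search_engine"
    if any(m in ref for m in MAIL_MARKERS):
        return "email"
    return "other"


def session_features(requests: List[str], first_referer: str, first_visit: datetime) -> Dict:
    """Keyword-based features for one session (features 1, 6 to 10, and the class label)."""
    def has(keyword: str) -> int:
        return int(any(keyword in r for r in requests))

    return {
        "source": access_source(first_referer),
        "filter_list_views": sum("viewsearchlist" in r for r in requests),
        "insurance_topic": has("baoxian"),
        "insurance_dictionary": has("toptag"),
        "shopping_cart": has("shopping_car"),
        "first_visit_hour": first_visit.hour,  # assumed stand-in for the browsing period
        "buy_flag": has("confirm"),
    }
```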

Step5: Screen the training set

The behavior data in the web log is generated by all users of the Xinyizhan website within a given time period. It includes data from people who buy many times (loyal customers), people who buy only a few times (existing customers), potential customers, and people who browse the home page without viewing any product on the site (pure browsers).

Training the classification model directly on all of the data would introduce a large error. To avoid this, the interference of the other customer types with the model's accuracy should be excluded.

By analyzing the number of purchases within a period and excluding the data of repeat purchasers, the invention selects customers who buy a product for the first time or who browse without buying as the mining objects.

Step6: Screen the feature attributes

After rough set theory eliminates the redundant attributes, the remaining feature attributes are: access source, number of pages browsed, number of product detail pages browsed, number of products browsed, page browsing duration, product detail page browsing duration, number of searches, whether an insurance topic is viewed, whether the insurance dictionary is viewed, and whether the user views the shopping cart.

Step 3: Random forest potential customer identification model based on customer behavior characteristics

The random forest algorithm is implemented with the randomForest 4.6-6 package for R 3.0.2. The program connects to the Oracle database through an ODBC data source, retrieves the required data with the function get_data(), and computes the data features with the function cal_feature(). After the training set is screened, the random forest classification model model_rf is applied to the feature data to predict the IP and cookie information of potential customers; finally, the potential customers' user information is looked up in existing data tables by IP and cookie and written to the database.

Step1: Connect to the database

The function get_data() retrieves the required data from the database; its parameter chan is the database connection, and cal_number is the date of the data to retrieve.

```r
data = sqlQuery(chan, sql, stringsAsFactors = FALSE)
```

The connection between R and the Oracle database is established with odbcConnect() from the RODBC package:

```r
library(RODBC)
chan = odbcConnect("dm_xyz", uid = '######', pwd = '******')
```

Here dm_xyz is the system DSN name of the ODBC data source, uid is the user name, and pwd is the login password.

Once the database connection is established, the required data is retrieved by executing a SQL statement, and the data cleaning rules from Step 1 are added to that SQL statement.

Step2: URL matching and updating page information

Read the URL rule text file (the URL rule list built during data preprocessing) from local disk, match the REQUEST field of each record against the keywords of the URL rules, and update the VISIT_PAGE field; records with no match are set to "-1".

Step3: Feature calculation, cal_feature(data)

The function cal_feature splits the single day of data returned by get_data into separate browsing sessions and computes the features of each session, producing the feature data set.

(1) Cookie information

(2) Whether the user is logged in

(3) Number of pages browsed and average page browsing time

```r
# Number of pages browsed (records matched to a URL rule)
page_tag = which(tmp$VISIT_PAGE != '-1')
page_n = length(page_tag)

# Number of product detail pages ('产品详情页' = product detail page)
prod_tag = which(tmp$VISIT_PAGE == '产品详情页')
page_prod = length(prod_tag)

# Browsing time of pages and product detail pages (page_n - 1 gaps)
# Request times of the actual pages; column 2 of tmp holds visit_time
time = as.POSIXct(strptime(tmp[page_tag, 2], "%Y-%m-%d %H:%M:%S"))
# Gap between consecutive page requests, in seconds, taken as the stay time on each page
stay_time = as.numeric(diff(time), units = "secs")
# Average page browsing time
page_time_avg = sum(stay_time, na.rm = TRUE) / page_n
# Stay times of the product detail pages (the last page has no successor, hence NA)
prod_time = stay_time[which(page_tag %in% prod_tag)]
# Average product detail page browsing time (0 if no detail page was viewed)
prod_time_avg = if (page_prod > 0) sum(prod_time, na.rm = TRUE) / page_prod else 0
```

(4) Number of times the filter list was viewed

```r
# Number of times the filter list was viewed (0 means no search); '搜索列表' = search list
sear_n = length(which(tmp$VISIT_PAGE == '搜索列表'))
```

(5) Whether the shopping cart was viewed

(6) Whether an insurance topic was viewed

(7) Whether the insurance dictionary was viewed

(8) Type of the source website, determined from keywords in the REFERER and REFERER_SOURCE_WORD fields.

(9) Purchase intent (the target variable)

Finally, one obs (observation) record is assembled and returned for each session; users who made a first purchase and users who browsed without buying are selected to form the training samples.

Step4: Update the potential customer information table in the database, update_info(channel, cal_number)

This is the main function of the whole program: it predicts potential customers, retrieves their information, and writes it to the database. It calls the get_data() and cal_feature() functions.

(1) Retrieve the data, compute the features, and predict potential customers

```r
data = get_data(channel, cal_number)               # data for the date given by cal_number
feature = cal_feature(data)                        # feature data for that day
feature[, 14] = as.factor(feature[, 14])           # column 14 is buy_flag; make it a factor
feature0 = feature[which(feature$buy_flag == 0), ] # sessions without purchase intent are the prediction targets
pre_rf = predict(model_rf, feature0[, 3:14])       # predict with the random forest model
potential_ip = feature0[which(pre_rf == 1), 1:2]   # IP and cookie of sessions predicted as 1 are potential customers
```

(2) Update users' basic information by joining the user_info table on login_id

Basic information includes the user's age, gender, email address, birthday, and so on, so that the website can run offline promotions.

Step5: Launch the program to identify potential customers automatically

To run the whole program, only update_info(channel, cal_number) needs to be called. Before calling it, read the system date and pass the previous day's date as the cal_number argument.
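Computing the cal_number argument could look like the following Python sketch. The patent does not give the date format, so "%Y%m%d" is an assumption, and previous_day is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Optional


def previous_day(today: Optional[date] = None, fmt: str = "%Y%m%d") -> str:
    """Return the day before `today` (default: the system date), formatted for cal_number."""
    today = today or date.today()
    return (today - timedelta(days=1)).strftime(fmt)
```

Subtracting a timedelta rather than decrementing the day field handles month, year, and leap-day boundaries automatically.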

Step6: Model performance verification

The final potential customer information table is joined with the Xinyizhan log table on vinfo to obtain each potential customer's visits in the 30 days after the date they were identified. If, within those 30 days, the customer has a payment-success flag (a request containing 'paysuccess') and the order ID extracted from the Xinyizhan log by a regular expression has the status "paid" in the order information table, the potential customer is considered to have actually purchased, which means the model's prediction succeeded. The follow-up purchase rate is therefore used as the model's performance metric.
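The 30-day follow-up check can be sketched in Python. The regular expression assumes the order ID appears as an order_id (or orderid) query parameter, and paid_orders stands in for the paid status in the order information table; both are hypothetical, as are the function names.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Set, Tuple

# Hypothetical URL layout: the order ID is assumed to be an order_id/orderid query parameter.
ORDER_ID_RE = re.compile(r"[?&]order_?id=(\d+)", re.IGNORECASE)

Visit = Tuple[datetime, str]  # (visit_time, request) for one vinfo


def bought_within_window(mined_at: datetime, visits: Iterable[Visit],
                         paid_orders: Set[str], days: int = 30) -> bool:
    """True if a 'paysuccess' request for a paid order falls within `days` after mining."""
    end = mined_at + timedelta(days=days)
    for visit_time, request in visits:
        if not (mined_at < visit_time <= end) or "paysuccess" not in request:
            continue
        match = ORDER_ID_RE.search(request)
        if match and match.group(1) in paid_orders:
            return True
    return False


def follow_up_purchase_rate(customers: List[Tuple[datetime, List[Visit]]],
                            paid_orders: Set[str]) -> float:
    """Share of mined customers who actually paid within the window."""
    if not customers:
        return 0.0
    hits = sum(bought_within_window(t, v, paid_orders) for t, v in customers)
    return hits / len(customers)
```

Requiring both the 'paysuccess' request and a paid status in the order table guards against payment pages that were reached but not completed.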

It should be noted that appropriate modifications or substitutions may be made without departing from the principle of the present invention, and such modifications or substitutions should also be regarded as falling within the protection scope of the present invention.

Claims (3)

1. A potential customer mining method based on customer behavior characteristics, characterized in that:
Step 1: data preprocessing;
1) Data cleaning
The raw log records accumulate a large amount of customer browsing information, much of which is redundant and irrelevant to data mining, such as images, SMS verification and logo images; these unneeded record rows are deleted first;
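A minimal Python sketch of this cleaning step follows. The text names images, SMS verification and logo images as noise but gives no concrete URL patterns, so the regular expressions below are hypothetical placeholders.

```python
import re
from typing import Iterable, List

# Hypothetical noise rules; the real patterns are not given in the text.
NOISE_PATTERNS = [
    re.compile(r"\.(?:png|jpe?g|gif|ico|bmp)(?:\?|$)", re.IGNORECASE),  # image files
    re.compile(r"sms|verifycode", re.IGNORECASE),                        # SMS verification
    re.compile(r"logo", re.IGNORECASE),                                  # logo images
]

def clean_requests(requests: Iterable[str]) -> List[str]:
    """Keep only log requests that match none of the noise patterns."""
    return [r for r in requests if not any(p.search(r) for p in NOISE_PATTERNS)]
```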
2) Build a URL rule list
Analyze the REQUEST field in the Xinyizhan web log data (for example, a request containing 'confirm' indicates an intended purchase) and finally form a URL rule list. When features are computed later, the request fields do not need to be analyzed one by one: url_name is obtained by matching each request field against the URLs in the URL rule list;
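Matching each REQUEST against the rule list can be sketched in Python as below. The keywords 'confirm', 'viewsearchlist' and 'shopping_car' and the '-1' default for unmatched records appear elsewhere in the text; the url_name labels other than 'search listing', and the first-match-wins ordering, are assumptions.

```python
from typing import List, Tuple

# Hypothetical rule list: (keyword found in REQUEST, url_name).
URL_RULES: List[Tuple[str, str]] = [
    ("confirm", "purchase confirmation"),
    ("viewsearchlist", "search listing"),
    ("shopping_car", "shopping cart"),
]

def match_url_name(request: str, rules: List[Tuple[str, str]] = URL_RULES) -> str:
    """Return the url_name of the first rule whose keyword occurs in the request, else '-1'."""
    for keyword, url_name in rules:
        if keyword in request:
            return url_name
    return "-1"
```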
3) User identification
Users in the access log table are identified by combining vinfo, iptonumber and the customer ID;
vinfo: equivalent to a cookie; identifies one computer;
iptonumber: the IP address; the same computer logging in from different places has different IPs;
login_id: the member login ID; login_id = -1 when a non-member visits.
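The three identifiers can be combined into one composite key per record. The Python sketch below assumes, based only on the column name, that iptonumber stores an IPv4 address as a 32-bit integer; the UserKey type and helper names are hypothetical.

```python
from typing import NamedTuple

def ip_to_number(ip: str) -> int:
    """Convert a dotted IPv4 address to a 32-bit integer (assumed meaning of iptonumber)."""
    a, b, c, d = (int(part) for part in ip.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

class UserKey(NamedTuple):
    vinfo: str       # cookie-like identifier of one computer
    iptonumber: int  # numeric IP; differs when the same computer logs in elsewhere
    login_id: int    # member ID, -1 for non-members

def user_key(vinfo: str, ip: str, login_id: int) -> UserKey:
    """Build the composite identifier used to label one user record."""
    return UserKey(vinfo, ip_to_number(ip), login_id)

def is_member(key: UserKey) -> bool:
    return key.login_id != -1
```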
4) Feature extraction
Taking a session as the unit, analyze for each user within a single session characteristic attributes such as the access source, number of pages browsed, number of product detail pages browsed, number of products browsed, page browsing duration, product detail page browsing duration, number of times the filtered list was viewed, whether the topic pages were viewed, the time period of the user's first visit of the day and whether the user viewed the shopping cart; whether the user has purchase intent is used as the class attribute, and these features finally form the training sample;
5) Training set selection
The behavioral data in the web log are produced by all users of the Xinyizhan website within a certain period. They include repeat shoppers and loyal customers, infrequent shoppers and existing customers, potential customers, people who visited the home page without browsing any product, and pure browsers;
By analyzing the number of purchases within a period, customers who bought repeatedly are excluded, and customers making a first purchase of a product, or who browsed without buying, are chosen as the mining targets;
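The selection rule can be sketched in Python as follows. The text does not state the repeat-purchase threshold, so treating more than one purchase in the analysis window as repeat buying is an assumption, and the sketch counts purchases per user rather than per product for simplicity.

```python
from collections import Counter
from typing import Iterable, List

def select_mining_targets(users: Iterable[str], purchase_user_ids: Iterable[str]) -> List[str]:
    """Keep users with zero purchases (browsed only) or exactly one (first purchase).

    `purchase_user_ids` holds one entry per completed purchase in the analysis window.
    """
    counts = Counter(purchase_user_ids)  # missing users count as 0 purchases
    return [u for u in users if counts[u] <= 1]
```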
Step 2: characteristic attribute reduction based on rough sets;
Among the 11 characteristic attributes extracted in Step 1, some are redundant with respect to the class attribute. According to rough set theory, the redundant characteristic attributes are removed without affecting classification performance, which reduces computation and improves classification accuracy;
Method steps:
First, the core Core is computed using the relative positive region:
1) Initialize $Core = \emptyset$ and $C = \{a_1, a_2, \ldots, a_{11}\}$, where each $a_j$ ($j = 1, 2, \ldots, 11$) is a characteristic attribute, and let $D = \{a_{12}\}$ be the class attribute; compute the relative positive region $POS_C(D)$;
2) For each $a_j \in C$, let $B = C - \{a_j\}$ and compute $POS_B(D)$. If $POS_C(D) \neq POS_B(D)$, then $a_j$ is a core attribute and $Core = Core \cup \{a_j\}$; loop until every attribute has been tested;
3) Return Core and stop;
Next, the reduct Reduce is computed using attribute dependency:
1) Initialize the remaining attributes $RestAtt = C - Core$ and set $Reduce = Core$;
2) Compare $POS_{Core}(D)$ with $POS_C(D)$; if they are equal, Core is the reduct; otherwise go to 3);
3) For each remaining attribute $a_j \in RestAtt$, select the attribute $a_k$ that maximizes the K value (the attribute dependency), let $Reduce = Reduce \cup \{a_k\}$ and $RestAtt = RestAtt - \{a_k\}$, then compare $POS_{Reduce}(D)$ with $POS_C(D)$; if they are equal go to 4), otherwise continue the loop;
4) Return Reduce and stop;
Reduce is then the set of characteristic attributes finally input to the classifier;
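The core and reduct procedures can be sketched in Python as follows. This is a minimal, unoptimized illustration of the standard positive-region approach, assuming each record is a dict of discrete attribute values. The text does not define K, so the dependency $K = |POS_B(D)| / |U|$ is an assumption; since $|U|$ is fixed, maximizing the positive-region size is equivalent. Ties are broken by attribute order.

```python
from collections import defaultdict
from typing import Dict, List, Set

Row = Dict[str, int]

def positive_region(rows: List[Row], attrs: List[str], decision: str) -> Set[int]:
    """Indices of rows whose equivalence class under `attrs` has a single decision value."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    pos: Set[int] = set()
    for members in blocks.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos

def core(rows: List[Row], cond: List[str], decision: str) -> List[str]:
    """Attributes whose removal shrinks the positive region POS_C(D)."""
    full = positive_region(rows, cond, decision)
    return [a for a in cond
            if positive_region(rows, [b for b in cond if b != a], decision) != full]

def reduct(rows: List[Row], cond: List[str], decision: str) -> List[str]:
    """Greedy reduct: start from the core, add the attribute with the largest dependency."""
    full = positive_region(rows, cond, decision)
    red = core(rows, cond, decision)
    rest = [a for a in cond if a not in red]
    while positive_region(rows, red, decision) != full and rest:
        best = max(rest, key=lambda a: len(positive_region(rows, red + [a], decision)))
        red.append(best)
        rest.remove(best)
    return red
```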
Step 3: random forest potential customer identification model based on customer behavior characteristics;
The random forest algorithm is implemented with the randomForest 4.6-6 package for R 3.0.2. The program connects to the Oracle database through an ODBC data source, uses the function get_data() to obtain the required data and the function cal_feature() to compute the data features. After the training set is selected, the random forest classification model model_rf is called to predict on the feature data, yielding the ip and cookie information of potential customers; finally, the potential customers' user information is looked up in the existing data tables by ip and cookie and written to the database;
In step 3:
1) Connect to the database. The function get_data() obtains the required data from the database; the parameter chan is the database connection, and cal_number is the date for which data are required:
data=sqlQuery(chan,sql,stringsAsFactors=FALSE)
The odbcConnect() function of the RODBC package establishes the connection between R and the Oracle database:
chan=odbcConnect("dm_xyz",uid='######',pwd='******')
where dm_xyz is the system DSN name of the ODBC data source, uid is the user name and pwd is the login password;
After the connection is established, the required data are obtained by executing an SQL statement, into which the data cleaning rules of Step 1 are added;
2) URL matching: update the browsed-page information;
Read the URL rule txt file built in 2) of Step 1 from local storage, match the REQUEST field of each record against the keywords of the URL rules, and update the VISIT_PAGE field; records with no match are set to '-1';
3) Feature calculation: cal_feature(data)
The function cal_feature splits the single-day data obtained by get_data into separate browsing sessions and computes the features of each session, finally yielding the feature data set;
Step 4: performance verification of the potential customer identification model;
An Oracle stored procedure is written to compute the proportion of mined potential customers who actually make a purchase within one month after the mining date; this proportion is used as the model performance metric.
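The text does not say how a single day's records are split into sessions. A common convention, assumed here, is to start a new session whenever the gap between consecutive requests exceeds 30 minutes; the Python sketch below applies that rule to one user's visits, and both the threshold and the function name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Visit = Tuple[datetime, str]  # (request time, request)

# Assumed inactivity threshold; not specified in the text.
SESSION_GAP = timedelta(minutes=30)

def split_sessions(visits: List[Visit], gap: timedelta = SESSION_GAP) -> List[List[Visit]]:
    """Split one user's single-day visits into sessions separated by gaps longer than `gap`."""
    sessions: List[List[Visit]] = []
    for visit in sorted(visits):
        if sessions and visit[0] - sessions[-1][-1][0] <= gap:
            sessions[-1].append(visit)
        else:
            sessions.append([visit])
    return sessions
```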
2. The potential customer mining method according to claim 1, characterized in that the following features are analyzed per session to form the training sample:
Feature 1: access source: the customer's access source is distinguished by the first referer of each session as advertising, search engine, e-mail, direct access or other;
Feature 2: number of pages browsed: the number of pages the user browses in a session;
Feature 3: number of product detail pages browsed: the number of product detail pages the user browses in a session;
Feature 4: page browsing duration: the average browsing duration over the user's pages in a session;
Feature 5: product detail page browsing duration: the average browsing duration over the user's product detail pages in a session;
Feature 6: number of times the filtered list was viewed: the number of times request contains 'viewsearchlist' in a session;
Feature 7: whether the insurance topic was viewed: whether request contains 'baoxian';
Feature 8: whether the insurance dictionary was viewed: whether request contains 'toptag';
Feature 9: browsing period: the visit_time of the first access in a session;
Feature 10: whether the shopping cart was viewed (yes = '1', no = '0'): whether request contains 'shopping_car';
Class attribute: purchase intent (yes = '1', no = '0'): whether request contains 'confirm'; a request containing 'confirm' indicates purchase-intent behavior.
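The keyword tests in features 6, 7, 8 and 10 and in the class attribute reduce to substring checks on a session's request strings. A minimal Python sketch, assuming a session is given as a list of REQUEST strings; the function and output key names are hypothetical.

```python
from typing import Dict, List

def keyword_features(requests: List[str]) -> Dict[str, int]:
    """Keyword-based session features from claim 2 for one session's REQUEST strings."""
    def has(keyword: str) -> int:
        return int(any(keyword in r for r in requests))

    return {
        "searchlist_n": sum("viewsearchlist" in r for r in requests),  # feature 6
        "insurance_topic": has("baoxian"),                             # feature 7
        "insurance_dictionary": has("toptag"),                         # feature 8
        "shopping_cart": has("shopping_car"),                          # feature 10
        "purchase_intent": has("confirm"),                             # class attribute
    }
```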
3. The potential customer mining method according to claim 1, characterized in that the features extracted in data processing are:
(1) cookie information
# cookie information: take the first VINFO value that is not '-1'
(2) whether the user logged in
(3) number of pages browsed and average page browsing duration
# number of pages browsed
page_tag=which(tmp$VISIT_PAGE!='-1')
page_n=length(page_tag)
# number of product detail pages
prod_tag=which(tmp$VISIT_PAGE=='product details page')
page_prod=length(prod_tag)
# browsing duration of pages and product detail pages (page_n-1 intervals)
time=strptime(tmp[page_tag,2],"%Y-%m-%d %H:%M:%S")  # request times of the actual pages
stay_time=as.numeric(diff(time),units="secs")  # gap between consecutive page requests, used as the dwell time of each page
page_time_avg=sum(stay_time,na.rm=TRUE)/page_n  # average page browsing duration
prod_time=stay_time[which(page_tag%in%prod_tag)]  # dwell times of product detail pages
prod_time_avg=sum(prod_time,na.rm=TRUE)/page_prod  # average product detail page browsing duration
(4) number of times the filtered list was viewed
# number of filtered-list views (0 means no search)
sear_n=length(which(tmp$VISIT_PAGE=="search listing"))
(5) whether the shopping cart was viewed
(6) whether the insurance topic was viewed
(7) whether the insurance dictionary was viewed
(8) source website type: determined from the keywords in the REFERER and REFERER_SOURCE_WORD fields.
(9) browsing time period: the user's browsing periods are distinguished by the VISIT_TIME field.
(10) purchase intent, the predicted variable
Finally, the combined obs observation records are returned, and users making a first purchase or browsing without buying are selected to form the training sample.
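Items (8) and (9) name the source categories and the time-period feature but not the concrete rules. The Python sketch below uses hypothetical keyword lists and hour buckets purely for illustration; it classifies on the referer string only and ignores REFERER_SOURCE_WORD.

```python
from datetime import datetime
from typing import Optional

# Hypothetical keyword lists; checked in order, so advertising wins over search engine.
SOURCE_RULES = [
    ("advertising", ("utm_source", "adid=")),
    ("search engine", ("baidu.", "google.", "sogou.", "so.com")),
    ("e-mail", ("mail.",)),
]

# Hypothetical period buckets: (exclusive upper hour, label).
PERIODS = [(6, "late night"), (12, "morning"), (18, "afternoon"), (24, "evening")]

def source_type(referer: Optional[str]) -> str:
    """Map a session's first referer to one of the five source classes of feature 1."""
    if not referer or referer == "-":
        return "direct access"
    ref = referer.lower()
    for label, keywords in SOURCE_RULES:
        if any(k in ref for k in keywords):
            return label
    return "other"

def browsing_period(first_visit: datetime) -> str:
    """Bucket the first visit time of a session into a period of the day."""
    return next(label for upper, label in PERIODS if first_visit.hour < upper)
```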

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510903856.4A CN105488697A (en) 2015-12-09 2015-12-09 Potential customer mining method based on customer behavior characteristics

Publications (1)

Publication Number Publication Date
CN105488697A 2016-04-13

Family

ID=55675664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510903856.4A Pending CN105488697A (en) 2015-12-09 2015-12-09 Potential customer mining method based on customer behavior characteristics

Country Status (1)

Country Link
CN (1) CN105488697A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20160413)