CN111831706B - A method, device and storage medium for mining association rules between applications - Google Patents

A method, device and storage medium for mining association rules between applications

Info

Publication number
CN111831706B
Authority
CN
China
Prior art keywords
application
applications
community
calculating
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010612788.7A
Other languages
Chinese (zh)
Other versions
CN111831706A (en
Inventor
陈光勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Big Data Technologies Co Ltd
Original Assignee
New H3C Big Data Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Big Data Technologies Co Ltd
Priority to CN202010612788.7A
Publication of CN111831706A
Application granted
Publication of CN111831706B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a method, device and storage medium for mining association rules between applications. The average usage rate of each application, computed over time segments, is used as that application's minimum support. All applications whose support exceeds their minimum support are selected to form a frequent item set L1. The applications in L1 are combined in pairs, and the credibility (confidence) between the two applications of each pair is calculated to obtain a credibility set L2. From L2, the items are selected whose credibility from the previous application to the next application is greater than the product of the next application's minimum support and a preset constant; these items form a set R2 that expresses the association rules between applications. A graph is then constructed from R2 and divided into communities using a community division algorithm. Based on the mined association rules and communities, applications can be actively pushed to users who frequently use related applications, and the core applications identified in each application community can be given priority operation and maintenance. This improves the intelligence and flexibility of association rule mining and the efficiency of the application maintenance system.

Description

Mining method, device and storage medium for association rules between applications
Technical Field
The disclosure relates to the technical field of internet and communication, and in particular relates to a mining method, device and storage medium for association rules between applications.
Background
With the rapid development of information technology, the internet industry has accumulated a huge amount of network traffic data. Potential information is difficult to discover by simply querying the database and computing statistics, so more intelligent methods are urgently needed to mine more valuable information. Data mining is a multidisciplinary field that draws on the latest research results in database technology, artificial intelligence, machine learning, statistics, knowledge engineering, information retrieval and related areas. Big data mining for network traffic is one of its most important areas.
With the rapid development of internet applications, the network helps people do more and more, and people depend on it more strongly. More and more tasks can be handled by web applications, and application access traffic is growing explosively. Users often follow certain patterns when they use applications. By observing how users use applications, user characteristics can be captured, operations such as classification, clustering and association can be performed on users, and the results can be fed into prediction systems, recommendation systems and the like.
Association analysis is the task of finding relationships in large-scale data sets. These relationships take two forms: frequent item sets and association rules. A frequent item set is a collection of items that often appear together, and an association rule suggests that there may be a strong relationship between two items. The Apriori algorithm is a classical data mining algorithm for mining frequent item sets and association rules; it compresses the search space by using the a priori property of frequent item sets.
Existing methods for mining association rules between applications have various defects. Association rules between applications cannot be analyzed well, association rules for applications with low usage rates are hard to find, and among interdependent applications the core applications cannot be identified, so overall maintenance efficiency is low.
Disclosure of Invention
In view of the above, the present disclosure provides a method and an apparatus for mining association rules between applications, which are used for solving the technical problems that the existing application association rules mining is inaccurate and the core application of the application community cannot be identified.
Based on the embodiment of the disclosure, the disclosure provides a mining method for association rules between applications, the method comprising:
in a plurality of time periods of an analysis duration, performing grouping statistics on application use information according to the corresponding user identification codes, to obtain the application set used by each user in each time period;
calculating the usage rate of each application in the different time periods, and taking the average value of the usage rate of each application over the different time periods as the minimum support of that application;
for each time period, calculating the support of each application in the time period, and selecting the applications whose support in the time period is greater than their own minimum support to form an item set L1;
combining the items in L1 in pairs to form a candidate item set C2 expressing the order of use of two applications, and calculating the credibility of each item in C2 to obtain a credibility set L2 expressing the order of use of two applications;
selecting from L2 the items whose credibility from the previous application to the next application is greater than the product of the minimum support of the next application and a preset constant, to obtain a set R2 expressing the association rules between applications.
Further, according to an embodiment of the present disclosure, the method further includes:
taking the previous application and the next application of each item in the association rule set R2 as nodes in a graph, and taking the credibility from the previous application to the next application as the weight of the edge between the two nodes, to construct an application association graph;
performing community division on the application association graph using a community division algorithm to obtain application communities.
Further, according to an embodiment of the present disclosure, the method further includes:
calculating, in each application community, the sum of the weights of the edges of the node corresponding to each application, and determining the application with the largest edge weight sum in each application community as the core application of that community.
Further, before the minimum support of each application is calculated, the method further includes filtering out application data whose usage rate is greater than a preset threshold.
Further, according to an embodiment of the present disclosure, the community division algorithm is the Louvain algorithm, and the preset constant is a constant greater than 1.2.
Based on the embodiment of the disclosure, the disclosure further provides a mining device for association rules between applications, where the device includes:
a grouping statistics module, used for performing grouping statistics on application use information according to the corresponding user identification codes in a plurality of time periods of an analysis duration, to obtain the application set used by each user in each time period;
a minimum support calculation module, used for calculating the usage rate of each application in the different time periods, and taking the average value of the usage rate of each application over the different time periods as the minimum support of that application;
an application screening module, used for calculating, for each time period, the support of each application in the time period, and selecting the applications whose support in the time period is greater than their own minimum support to form an item set L1;
a credibility calculation module, used for combining the items in L1 in pairs to form a candidate item set C2 expressing the order of use of two applications, and calculating the credibility of each item in C2 to obtain a credibility set L2 expressing the order of use of two applications;
an association rule determining module, used for selecting from L2 the items whose credibility from the previous application to the next application is greater than the product of the minimum support of the next application and a preset constant, to obtain a set R2 expressing the association rules between applications.
Further, according to an embodiment of the present disclosure, the apparatus further includes:
a community dividing module, used for taking the previous application and the next application of each item in the association rule set R2 as nodes in a graph, and taking the credibility from the previous application to the next application as the weight of the edge between the two nodes, to construct an application association graph;
a core application identification module, used for calculating, in each application community, the sum of the weights of the edges of the node corresponding to each application, and determining the application with the largest edge weight sum in each application community as the core application of that community.
Further, according to an embodiment of the present disclosure, the apparatus further includes:
the minimum support calculation module is further configured to filter out application data whose usage rate is greater than a preset threshold before the minimum support of each application is calculated.
The present disclosure also provides a storage medium in which a computer program is stored. When the computer program in the storage medium is read and executed by a processor, the steps of any of the mining methods for association rules between applications provided in the present disclosure are carried out.
In the present disclosure, the average usage rate of each application, computed over time segments, is used as the minimum support. All applications whose support is greater than the minimum support are selected to form a frequent item set L1. The applications in L1 are combined in pairs and the credibility between the applications is calculated to obtain a credibility set L2. From L2, the items whose credibility from the previous application to the next application is greater than the product of the minimum support of the next application and a preset constant are selected to form a set R2 expressing the association rules between applications. A graph is further constructed from R2 and divided into communities using a community division algorithm. Based on the mined association rules and communities, applications can be actively pushed to users who frequently use related applications, and the core applications identified in each application community can be given priority operation and maintenance. This improves the intelligence and flexibility of association rule mining and the efficiency of the application maintenance system.
Drawings
To illustrate the embodiments of the present disclosure or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present disclosure; a person skilled in the art may derive other drawings from them.
Fig. 1 is a flowchart of a mining method for association rules between applications according to an embodiment of the present disclosure;
Fig. 2 is a block diagram of a mining device for association rules between applications according to an embodiment of the present disclosure.
Detailed Description
The terminology used in the embodiments of the disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the embodiments of the disclosure. As used in the presently disclosed embodiments and in the claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "and/or" as used in this disclosure refers to any or all possible combinations including one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present disclosure to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of embodiments of the present disclosure. Furthermore, depending on the context, the word "if" may be interpreted as "when", "upon" or "in response to a determination".
One of the primary objects of embodiments of the present disclosure is to provide a mining method for association rules between applications that improves on the Apriori-based mining method. The Apriori algorithm is a classical data mining algorithm for mining frequent item sets and association rules. It relies on the a priori property of frequent item sets: every non-empty subset of a frequent item set must also be frequent. The Apriori algorithm uses an iterative, layer-by-layer search in which k-item sets are used to explore (k+1)-item sets.
In the present disclosure, some basic terms used are explained as follows in the meaning represented in the present disclosure:
Item and item set: let appset = {app_1, app_2, ..., app_m} be the set of all applications, where app_i is called an item and the subscript i denotes the i-th application. A set of items is called an item set, and an item set containing k items is called an application k-item set, denoted k-appset.
Support: the proportion of records in the data set that contain an item set, relative to the total number of records. It measures how often the item set appears in the original data. For example, the support of the i-th application app_i is the proportion of users who use the application among all users:

$$\mathrm{support}(\mathrm{app}_i) = \frac{\text{number of users who use } \mathrm{app}_i}{\text{total number of users}}$$

Confidence (also rendered as credibility in this description): measures the probability that a rule holds. For example, the confidence that a user uses app_n after using app_i equals the support of app_i and app_n used in succession divided by the support of app_i alone:

$$\mathrm{confidence}(\mathrm{app}_i \rightarrow \mathrm{app}_n) = \frac{\mathrm{support}(\mathrm{app}_i, \mathrm{app}_n)}{\mathrm{support}(\mathrm{app}_i)}$$
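The two definitions can be illustrated with a minimal Python sketch. The sample data, the function names `support` and `confidence`, and the choice to count two applications as used together whenever both appear in the same user's application set are assumptions made for illustration, not part of the disclosure.

```python
from typing import Dict, Set

# Hypothetical sample: the set of applications each user used (names are illustrative).
user_apps: Dict[str, Set[str]] = {
    "u1": {"mail", "calendar"},
    "u2": {"mail", "calendar", "maps"},
    "u3": {"mail"},
    "u4": {"maps"},
}


def support(items: Set[str], records: Dict[str, Set[str]]) -> float:
    """Fraction of users whose application set contains every item in `items`."""
    if not records:
        return 0.0
    hits = sum(1 for apps in records.values() if items <= apps)
    return hits / len(records)


def confidence(first: str, second: str, records: Dict[str, Set[str]]) -> float:
    """support({first, second}) / support({first}); 0.0 if `first` is never used."""
    base = support({first}, records)
    return support({first, second}, records) / base if base else 0.0


mail_support = support({"mail"}, user_apps)                    # 3 of 4 users
mail_to_calendar = confidence("mail", "calendar", user_apps)   # (2/4) / (3/4)
```

Note that confidence is not symmetric: dividing by the support of the other application generally gives a different value.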
The embodiment of the disclosure first preprocesses the network traffic using a traffic collector. Preprocessing includes collecting the network traffic, matching and filtering the data, and merging network application data per user. On this basis, the embodiment provides a method that determines the minimum support used for calculating association rules between applications by grouping a preset analysis duration into time periods, and designs an association analysis framework for users' application usage that comprises data processing, data analysis, data mining and application community division.
Fig. 1 is a flowchart of the steps of a mining method for association rules between applications provided in an embodiment of the present disclosure. The method includes:
Step 101, in a plurality of time periods of the analysis duration, performing grouping statistics on application use information according to the corresponding user identification codes, to obtain the application set used by each user in each time period.
One purpose of the present disclosure is to analyze the association between two applications that a user uses within the same time range. To determine the association rules between applications more accurately, the present disclosure divides the analysis duration into segments and computes statistics per segment. The analysis duration determines the time range over which association rules are analyzed and may be, for example, 1 day or 1 week. Time periods may be divided according to user living habits or application usage habits in different geographical environments, or in fixed time units. The division must balance the precision and efficiency of association rule analysis: if a time period is too short, the association observed between certain applications may be weak; if it is too long, the application sets may become too large and analysis efficiency drops.
Step 102, calculating the use rate of each application in different time periods, and taking the average value of the use rate of each application in different time periods as the minimum support degree of each application.
In this step, the usage rate of each application in the different time periods is calculated from the application sets used by each user in each time period within the analysis duration, and the average of these usage rates is taken as the minimum support of each application. The minimum support is therefore a dynamic average value. It is more flexible and intelligent than a fixed threshold and prevents applications with low usage rates from being overlooked.
Step 103, for each time period, calculating the support of each application in the time period, and selecting the applications whose support in the time period is greater than their own minimum support to form an item set L1.
Step 104, combining the items in L1 in pairs to form a candidate item set C2 expressing the order of use of two applications, and calculating the credibility of each item in C2 to obtain a credibility set L2 expressing the order of use of two applications.
Step 105, selecting from L2 the items whose credibility from the previous application to the next application is greater than the product of the minimum support of the next application and a preset constant, to obtain a set R2 expressing the association rules between applications.
In the embodiment of the present disclosure, to ensure the reliability of the mined association rules, deciding whether two applications form a strong association rule involves more than calculating the credibility of using one application after the other. The credibility is additionally compared with the minimum support of the latter application: the pair is kept as an association rule only when the credibility of the two applications being used in succession is greater than the product of the minimum support of the latter application and a preset constant.
In an embodiment of the present disclosure, to identify and partition application communities, mining community relationships between applications, and identifying core applications in an application community, the method further includes:
Step 106, taking the previous application and the next application of each item in the association rule set R2 as nodes in a graph, and taking the credibility from the previous application to the next application as the weight of the edge between the two nodes, to construct an application association graph.
Step 107, performing community division on the application association graph using a community division algorithm to obtain application communities.
The application association graph is processed by the community division algorithm to obtain one or more application communities. An application community represents the community relationship among a group of applications. An application store can recommend applications in the same community to users based on the application communities, which improves the intelligence and user experience of such stores.
Step 108, calculating, in each application community, the sum of the weights of the edges of the node corresponding to each application, and determining the application with the largest edge weight sum in each application community as the core application of that community.
Once the core application of each application community is determined, it can be given priority maintenance, which improves the availability and stability of the operation and maintenance system as well as maintenance efficiency and user experience.
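Core application identification can be sketched as follows in Python. The sketch assumes that the communities have already been produced by a community division algorithm such as Louvain and are simply given as sets of application names. The sample rule set, the function name `core_applications`, the decision to count an edge toward both of its endpoints, and the alphabetical tie-break are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical rule set R2: (previous app, next app) -> credibility.
r2: Dict[Tuple[str, str], float] = {
    ("mail", "calendar"): 0.8,
    ("calendar", "mail"): 0.6,
    ("mail", "docs"): 0.5,
    ("video", "music"): 0.7,
}

# Assumed output of a community division algorithm such as Louvain.
communities: List[Set[str]] = [{"mail", "calendar", "docs"}, {"video", "music"}]


def core_applications(
    rules: Dict[Tuple[str, str], float], parts: List[Set[str]]
) -> List[str]:
    """For each community, return the app whose incident edge weights sum highest."""
    cores = []
    for members in parts:
        weight_sum: Dict[str, float] = defaultdict(float)
        for (prev_app, next_app), credibility in rules.items():
            # Only edges with both endpoints inside the community are counted.
            if prev_app in members and next_app in members:
                weight_sum[prev_app] += credibility
                weight_sum[next_app] += credibility
        # Iterating in sorted order makes ties resolve alphabetically.
        cores.append(max(sorted(members), key=lambda app: weight_sum[app]))
    return cores


cores = core_applications(r2, communities)
```

In the first sample community, `mail` touches three edges and therefore has the largest weight sum; in the second, the two applications tie and the alphabetical tie-break decides.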
Based on the above embodiments, the present disclosure improves the existing method for mining application association rules. By calculating application usage rates in different time periods, it automatically sets a support threshold for each application, which improves the intelligence and adaptability of the mining method and addresses the inability of the conventional Apriori algorithm, with its single fixed threshold, to suit all applications. In addition, when judging whether a rule is a strong association rule, the credibility is compared with the minimum support of the latter application, which further ensures the reliability of the rule. The embodiment of the disclosure further uses the mined association rules between applications as the nodes and edges of a graph, obtains an application community graph with a community division algorithm, and places closely related applications in the same community. The application association rules can be used to predict which applications a user will access and to recommend an application the user may need, and the application community graph can be used to actively push applications in a community to users who use other applications of that community.
The following describes the steps of the mining method for association rules between applications provided in the present disclosure in detail in connection with specific embodiments.
Step 201, acquiring user network traffic data using a traffic collection technology, and obtaining application use information from the user network traffic data, where the application use information includes at least the user identification code and the time of use.
This step first obtains network data using a traffic collector and stores the data in a database. The data is then processed using Python, including filtering, grouping and integration.
The network traffic data is acquired using the deep packet inspection (DPI) capability of the traffic collector. Deep packet inspection can obtain data such as the source IP, destination IP, source port, destination port and protocol of a packet, and can also perform content inspection and deep decoding of application-layer data. However, the data obtained by deep packet inspection is highly redundant, so only the required application use information is extracted from it, such as the user identification code, application name, time of use, traffic and number of packets. These data are stored in a data warehouse to form the data table used for application association analysis.
Association analysis data table
Field        Meaning
user_id      User identification code
apply_name   Application name
log_time     Time of use
flow         Traffic volume
pack         Number of packets
app_class    Application class
Step 202, in a plurality of time periods of the analysis duration, performing grouping statistics on the acquired application use information by user identification code, to obtain the application set used by each user in each time period.
An embodiment of the present disclosure sets the analysis duration to one day and divides the 24 hours of the day into 7 time periods: hours 0 to 5 are the late night period, hours 6 to 8 the early morning period, hours 9 to 11 the morning period, hours 12 and 13 the midday period, hours 14 to 17 the afternoon period, hours 18 and 19 the evening period, and hours 20 to 23 the night period.
Time period division
Period          Hours of the day
Late night      0 to 5
Early morning   6 to 8
Morning         9 to 11
Midday          12 to 13
Afternoon       14 to 17
Evening         18 to 19
Night           20 to 23
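A minimal Python sketch of this division is shown here. It maps the hour of a record's use time to one of the seven periods. The label strings and the function name `period_of` are hypothetical, and hours 9 to 11 are assumed to be the morning period.

```python
# Period boundaries follow the embodiment; the label strings are illustrative.
PERIODS = [
    (0, 5, "late_night"),
    (6, 8, "early_morning"),
    (9, 11, "morning"),      # assumption: hours 9 to 11 form the morning period
    (12, 13, "midday"),
    (14, 17, "afternoon"),
    (18, 19, "evening"),
    (20, 23, "night"),
]


def period_of(hour: int) -> str:
    """Return the period label for an hour of the day (0 to 23)."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    for start, end, label in PERIODS:
        if start <= hour <= end:
            return label
    raise AssertionError("PERIODS does not cover every hour")  # unreachable
```

For example, `period_of(10)` returns `"morning"` and `period_of(23)` returns `"night"`.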
Some popular applications, such as WeChat, are relatively independent and are often used on their own. Their traffic data needs to be filtered out of the data collected by the collector; otherwise the mined frequent item sets consist almost entirely of these popular applications, which hinders the discovery of relations among less-used applications and makes the mined rules of little value. Therefore, in an embodiment of the present disclosure, the usage rate of each application is calculated and application data whose usage rate is greater than a preset threshold is filtered out. For example, the threshold may be based on 1.5 times the interquartile range, where the interquartile range is the difference between the third quartile and the first quartile.
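One possible reading of this filter is sketched here in Python. The text does not state exactly how the 1.5 times interquartile range is applied, so the sketch assumes the common upper-fence rule: an application is dropped when its usage rate exceeds Q3 + 1.5 × (Q3 − Q1). The function name `filter_popular` and the sample rates are hypothetical.

```python
import statistics
from typing import Dict


def filter_popular(usage_rate: Dict[str, float], k: float = 1.5) -> Dict[str, float]:
    """Drop applications whose usage rate lies above Q3 + k * IQR (assumed rule)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(usage_rate.values(), n=4)
    upper_fence = q3 + k * (q3 - q1)
    return {app: rate for app, rate in usage_rate.items() if rate <= upper_fence}


# Hypothetical usage rates: one very popular application among several ordinary ones.
rates = {
    "chat": 0.95,
    "mail": 0.10,
    "maps": 0.12,
    "docs": 0.08,
    "music": 0.11,
    "video": 0.09,
}
kept = filter_popular(rates)
```

With these sample rates, only the very popular `chat` application falls above the fence and is removed; the other five remain available for rule mining.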
Grouping statistics are then performed on the processed data by user identification code within each time period, and the applications used by each user are merged. This yields the application set used by each user in each time period, and these sets are the input for association rule mining.
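The grouping step might look like the following Python sketch. The record layout (user identifier, application name, period label) and the function name `group_by_period_and_user` are assumptions for illustration; a real pipeline would derive the period label from the log_time field.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical preprocessed records: (user_id, apply_name, period label).
records = [
    ("u1", "mail", "morning"),
    ("u1", "calendar", "morning"),
    ("u1", "mail", "morning"),   # repeated use collapses into the set
    ("u1", "video", "night"),
    ("u2", "mail", "morning"),
]


def group_by_period_and_user(
    rows: Iterable[Tuple[str, str, str]]
) -> Dict[str, Dict[str, Set[str]]]:
    """Return period -> user -> set of applications used in that period."""
    grouped: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for user_id, app, period in rows:
        grouped[period][user_id].add(app)
    # Convert nested defaultdicts to plain dicts so missing keys raise KeyError.
    return {period: dict(users) for period, users in grouped.items()}


app_sets = group_by_period_and_user(records)
```

Using sets means that each user contributes at most once to an application's count within a period, which matches the user-based definition of support.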
Step 203, calculating the use rate of each application in different time periods, and taking the average value of the use rate of each application in different time periods as the minimum support degree of each application.
The Apriori algorithm uses an iterative, layer-by-layer search in which k-item sets are used to explore (k+1)-item sets. The embodiment of the present disclosure searches only up to frequent 2-item sets. First, the collected data set is scanned, the count of each item is accumulated, and the items that meet the minimum support are collected to form the set of frequent 1-item sets, denoted L1. Then L1 is used to find the set L2 of frequent 2-item sets, and finally the application association rules R2 are obtained.
In this step, the usage rate of each application in the different time periods is calculated from the application sets used by each user in each time period within the total analysis duration, and the average of the usage rates of each application over the different time periods is taken as the minimum support of that application:

$$\mathrm{min\_support}(\mathrm{app}_m) = \frac{1}{T}\sum_{t=1}^{T}\frac{\mathrm{user\_app}_m\mathrm{\_n}(t)}{\mathrm{user\_num}(t)}$$

where T is the number of time periods, user_num(t) is the total number of users in period t, and user_app_m_n(t) is the number of users who use app_m in period t.
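A minimal Python sketch of this calculation is given here, assuming the per-period application sets are held in a nested dictionary (period, then user, then set of applications). The data and the function names `usage_rate` and `minimum_support` are illustrative assumptions.

```python
from typing import Dict, Set

# Hypothetical input: period -> user -> set of applications used in that period.
app_sets: Dict[str, Dict[str, Set[str]]] = {
    "morning": {"u1": {"mail", "calendar"}, "u2": {"mail"}},
    "night": {
        "u1": {"video"},
        "u2": {"mail", "video"},
        "u3": {"video"},
        "u4": {"mail"},
    },
}


def usage_rate(app: str, users: Dict[str, Set[str]]) -> float:
    """Users of `app` in one period divided by all users in that period."""
    if not users:
        return 0.0
    return sum(1 for apps in users.values() if app in apps) / len(users)


def minimum_support(sets: Dict[str, Dict[str, Set[str]]]) -> Dict[str, float]:
    """Average each application's usage rate over all periods."""
    all_apps = {app for users in sets.values() for apps in users.values() for app in apps}
    return {
        app: sum(usage_rate(app, users) for users in sets.values()) / len(sets)
        for app in all_apps
    }


min_sup = minimum_support(app_sets)
```

An application that is absent from a period contributes a usage rate of zero for that period, so rarely used applications receive a correspondingly low threshold.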
Step 204, for each time period, obtaining the support of each application in the time period, and selecting the applications whose support in the time period is greater than their own minimum support to form a frequent 1-item set L1.
In this step, a candidate set C1 is constructed for each time period. C1 is a one-dimensional candidate set obtained by counting the uses of each application in the time period. Each element of C1 contains an application identifier and the support of that application, and has the form:

$$C_1 = \{(\mathrm{app}_1, \mathrm{support}(\mathrm{app}_1)),\ (\mathrm{app}_2, \mathrm{support}(\mathrm{app}_2)),\ \ldots\}$$

Then C1 of each time period is scanned to judge whether the support of each application in the time period is greater than its minimum support, and the applications that exceed their minimum support in each time period form the frequent 1-item set L1.
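The construction of C1 and L1 can be sketched as follows in Python. Keeping one L1 per time period, the sample data and the function name `frequent_items` are assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical input: period -> user -> set of applications used in that period.
app_sets: Dict[str, Dict[str, Set[str]]] = {
    "morning": {"u1": {"mail", "calendar"}, "u2": {"mail"}},
    "night": {
        "u1": {"video"},
        "u2": {"mail", "video"},
        "u3": {"video"},
        "u4": {"mail"},
    },
}
# Hypothetical per-application minimum support (average usage rate over periods).
min_sup: Dict[str, float] = {"mail": 0.75, "video": 0.375, "calendar": 0.25}


def frequent_items(
    sets: Dict[str, Dict[str, Set[str]]], thresholds: Dict[str, float]
) -> Dict[str, Dict[str, float]]:
    """Return period -> {app: support} for apps above their own minimum support."""
    l1: Dict[str, Dict[str, float]] = {}
    for period, users in sets.items():
        counts: Dict[str, int] = {}  # C1 counts: app -> number of users in this period
        for apps in users.values():
            for app in apps:
                counts[app] = counts.get(app, 0) + 1
        total = len(users)
        l1[period] = {
            app: n / total
            for app, n in counts.items()
            if n / total > thresholds.get(app, 0.0)
        }
    return l1


l1 = frequent_items(app_sets, min_sup)
```

In this sample, `mail` is frequent in the morning (support 1.0) but not at night (support 0.5), because its own threshold is 0.75.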
Step 205: combine the elements of L1 in pairs to form a candidate item set C2 expressing the order in which two applications are used, and calculate the credibility of each item in C2 to obtain a credibility set L2 expressing the order-of-use relationship between two applications.
In this step, the elements of L1 are combined two by two to form the candidate set C2. Each item of C2 contains the application identifiers of the two applications and the support of the combination of the two applications.
The items (app_a, app_b) of C2 do not consider the order of the applications; the order relationship between the applications is computed when the set L2 is calculated. For each item in C2, the following two credibilities are calculated:

$$\mathrm{conf}(app_a \rightarrow app_b) = \frac{\mathrm{support}(app_a \rightarrow app_b)}{\mathrm{support}(app_a)}$$

and

$$\mathrm{conf}(app_b \rightarrow app_a) = \frac{\mathrm{support}(app_b \rightarrow app_a)}{\mathrm{support}(app_b)}$$

where support(app_a → app_b) is the support of app_a and app_b being used one after the other with app_a first, and support(app_a) is the support of app_a used alone.
Calculating these credibilities for every item in C2 yields the set L2.
Each item in L2 contains the association rule between two applications and its credibility.
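A small sketch of how the two directional credibilities might be computed for one time period is given below. It rests on assumptions not stated in the text: each user's applications are available as an ordered list, and "used one after the other" is read as "the first application appears somewhere before the second" rather than strictly adjacent. All names are hypothetical.

```python
from itertools import combinations

def used_before(seq, first, second):
    """True if `first` occurs in seq and `second` occurs at a later position."""
    if first not in seq:
        return False
    return second in seq[seq.index(first) + 1:]

def credibility_set(user_sequences, l1_apps):
    """Return {(first, second): credibility} for every ordered pair from l1_apps.

    user_sequences: {user: [apps in order of use]} for a single time period.
    """
    user_num = len(user_sequences)
    if user_num == 0:
        return {}
    l2 = {}
    for a, b in combinations(sorted(l1_apps), 2):      # unordered items of C2
        for first, second in ((a, b), (b, a)):         # both directions
            n_first = sum(1 for s in user_sequences.values() if first in s)
            n_pair = sum(1 for s in user_sequences.values()
                         if used_before(s, first, second))
            support_first = n_first / user_num
            support_pair = n_pair / user_num
            l2[(first, second)] = (support_pair / support_first
                                   if support_first else 0.0)
    return l2

sequences = {
    "u1": ["mail", "chat"],
    "u2": ["mail", "chat"],
    "u3": ["chat", "mail"],
    "u4": ["mail"],
}
l2 = credibility_set(sequences, {"mail", "chat"})
```

Here two of the four "mail" users go on to use "chat", so the credibility of mail → chat is 0.5, while only one of the three "chat" users goes on to "mail".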
Step 206: select from L2 the items whose credibility from the preceding application to the following application is greater than the product of the following application's minimum support and a preset constant, forming a set R2 expressing the association rules between two applications.
L2 is scanned to determine, for each item, whether its credibility is greater than the product of the minimum support of the following application and a preset constant. The items that satisfy this condition form the set R2 expressing the association rules between two applications:

$$\mathrm{conf}(app_a \rightarrow app_b) > K \cdot \mathrm{min\_support}(app_b)$$

where conf(app_a → app_b) is the credibility from the preceding application app_a to the following application app_b, min_support(app_b) is the minimum support of app_b, and K is a preset constant. In an embodiment of the present disclosure, K takes a value greater than 1.2, for example 1.4.
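The selection rule can be illustrated with a short sketch. The function name and the dictionary layout are hypothetical; only the comparison against K times the following application's minimum support comes from the text, with K defaulting to the example value 1.4.

```python
def association_rules(l2, min_support, k=1.4):
    """Keep rules whose credibility exceeds k * minimum support of the later app.

    l2:          {(first_app, second_app): credibility}
    min_support: {app: minimum support of that app}
    """
    return {
        (first, second): cred
        for (first, second), cred in l2.items()
        if cred > k * min_support[second]
    }

l2 = {("mail", "chat"): 0.5, ("chat", "mail"): 0.2}
min_support = {"mail": 0.3, "chat": 0.3}
r2 = association_rules(l2, min_support)
```

The condition resembles a lift threshold, since lift is conventionally the confidence of a rule divided by the support of its consequent; here the per-application minimum support stands in for that support.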
Through the above steps, the association rules governing users' application access in different time periods can be obtained. Based on these rules, users' application access behavior can be predicted, and applications that have an association relationship can be recommended to users.
Step 207: take the preceding and following applications of each item in the association rule set R2 as nodes of a graph, and take the credibility from the preceding application to the following application as the weight of the edge between the two nodes, to construct an application association graph.
In this step, app_a and app_b of each item in the association rule set R2 are taken as nodes, and the credibility from app_a to app_b is taken as the weight of the edge from node app_a to node app_b, so as to construct an application association graph G for application community division.
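The graph construction might look like the sketch below, which stores G as a nested dictionary. The second helper is an added assumption, not something the text specifies: because modularity-based community division is usually defined on undirected graphs, it merges the two directions of a rule pair by summing their weights.

```python
def build_association_graph(r2):
    """Directed graph {first_app: {second_app: credibility}} built from R2."""
    graph = {}
    for (first, second), cred in r2.items():
        graph.setdefault(first, {})[second] = cred
        graph.setdefault(second, {})       # make sure the later app is a node too
    return graph

def symmetrize(graph):
    """Assumed preprocessing: undirected weights as the sum of both directions."""
    sym = {node: {} for node in graph}
    for a, neighbors in graph.items():
        for b, w in neighbors.items():
            sym[a][b] = sym[a].get(b, 0.0) + w
            sym[b][a] = sym[b].get(a, 0.0) + w
    return sym

r2 = {("mail", "chat"): 0.5, ("chat", "mail"): 0.25, ("chat", "docs"): 1.0}
g = build_association_graph(r2)
undirected = symmetrize(g)
```

Other ways of handling direction (keeping the larger weight, or using a directed modularity) would be equally compatible with the description.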
Step 208: take the application association graph G as the input of a community division algorithm to obtain different application communities divided according to the application association relationships, where applications within the same community are closely related and applications in different communities are sparsely related.
Modularity measures the quality of a community division of a network. It can be simply understood as the sum of the edge weights inside the communities minus the sum of the edge weights connected to the communities. It is defined as follows:

$$Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$$

where A_{ij} is the weight of the edge between node i and node j; k_i = \sum_j A_{ij} is the sum of the weights of all edges connected to node i (its degree); c_i is the community to which node i belongs; \delta(c_i, c_j) is 1 when nodes i and j belong to the same community and 0 otherwise; and m = \frac{1}{2}\sum_{i,j} A_{ij} is the sum of the weights of all edges.
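To make the modularity definition concrete, the following sketch evaluates Q directly from a symmetric weighted adjacency dictionary. It uses the standard form of modularity used with the Louvain algorithm; the function name and input layout are hypothetical, and the double loop is written for clarity rather than speed.

```python
def modularity(adj, community):
    """Modularity Q of a partition.

    adj:       symmetric weights {i: {j: A_ij}} with adj[i][j] == adj[j][i]
    community: {node: community label}
    """
    # Summing every stored entry counts each undirected edge twice, giving 2m.
    two_m = sum(w for neighbors in adj.values() for w in neighbors.values())
    if two_m == 0:
        return 0.0
    degree = {i: sum(neighbors.values()) for i, neighbors in adj.items()}
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:
                q += adj[i].get(j, 0.0) - degree[i] * degree[j] / two_m
    return q / two_m

# Two disconnected pairs of applications, each joined by an edge of weight 1.
adj = {"a": {"b": 1.0}, "b": {"a": 1.0}, "c": {"d": 1.0}, "d": {"c": 1.0}}
q_split = modularity(adj, {"a": 0, "b": 0, "c": 1, "d": 1})
q_merged = modularity(adj, {"a": 0, "b": 0, "c": 0, "d": 0})
```

Placing each connected pair in its own community gives a higher Q than merging everything into one community, which is the behavior the community division relies on.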
The Louvain algorithm is a modularity-based community discovery algorithm that performs well in both efficiency and quality and can discover hierarchical community structures. Its optimization goal is to maximize the modularity of the whole community network. An embodiment of the present disclosure uses the Louvain algorithm to perform community division between applications.
The flow of community division between applications using the Louvain algorithm is as follows:
1) Each node in the graph is regarded as an independent community, so the number of communities initially equals the number of nodes;
2) For each node i, tentatively assign node i in turn to the community of each of its neighbor nodes, calculate the modularity change ΔQ before and after the assignment, and record the neighbor node with the largest ΔQ; if max ΔQ > 0, assign node i to the community of that neighbor node, otherwise leave node i unchanged;
3) Repeat step 2) until the communities to which all nodes belong no longer change;
4) Compress the graph: merge all nodes in the same community into one new node. The weights of the edges between nodes inside a community become the weight of the new node's self-loop, and the weights of the edges between communities become the weights of the edges between the new nodes. The self-loop reflects that, after compression, the nodes of one community are treated as a single new node whose self-loop weight is the sum of the edge weights within that community;
5) Repeat steps 1) to 4) until the modularity of the whole graph no longer changes;
6) In each community, the node i with the largest k_i (degree) is designated the community core, where k_i = ∑_j A_ij is the sum of the weights of all edges connected to node i (its degree) and A_ij is the weight of the edge between node i and node j. The application community division result is then returned as follows:
{ community 1 [ (community core, application a), (community member, application b), ];
community 2 [ (community core, application m), (community member, application n) ];
......}
Once the application community division result is obtained, key operation and maintenance assurance can be focused on the core applications of the application communities, and the applications in a community can be recommended to users who use other applications in the same community.
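The node-moving phase (steps 1 to 3) and the graph compression (step 4) of the flow above can be sketched as follows. This is a simplified illustration under stated assumptions, not a complete Louvain implementation: the graph is a symmetric weighted dictionary, node i is first taken out of its community and then placed where a score proportional to the modularity gain is largest (staying put on ties), and the outer repetition of step 5 is omitted. Function names are hypothetical.

```python
from collections import defaultdict

def local_moving(adj):
    """Steps 1) to 3): greedy node moves. adj: {i: {j: A_ij}}, symmetric."""
    degree = {i: sum(neighbors.values()) for i, neighbors in adj.items()}
    two_m = sum(degree.values())
    community = {i: i for i in adj}        # step 1: one community per node
    total = dict(degree)                   # summed degree of each community
    if two_m == 0:
        return community
    changed = True
    while changed:                         # step 3: repeat until nothing moves
        changed = False
        for i in adj:                      # step 2: try the neighbors' communities
            current = community[i]
            weight_to = defaultdict(float)
            for j, w in adj[i].items():
                if j != i:
                    weight_to[community[j]] += w
            total[current] -= degree[i]    # take i out of its community
            best = current
            best_score = weight_to[current] - total[current] * degree[i] / two_m
            for c, k_in in weight_to.items():
                score = k_in - total[c] * degree[i] / two_m
                if score > best_score:
                    best, best_score = c, score
            total[best] += degree[i]
            if best != current:
                community[i] = best
                changed = True
    return community

def compress_graph(adj, community):
    """Step 4): one node per community; internal weights become a self-loop.

    With symmetric input every internal edge is stored in both directions,
    so the self-loop weight counts each internal edge twice.
    """
    merged = defaultdict(lambda: defaultdict(float))
    for i, neighbors in adj.items():
        for j, w in neighbors.items():
            merged[community[i]][community[j]] += w
    return {c: dict(neighbors) for c, neighbors in merged.items()}

# Two triangles of applications joined by a single edge between nodes 2 and 3.
adj = {
    0: {1: 1.0, 2: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0}, 4: {3: 1.0, 5: 1.0}, 5: {3: 1.0, 4: 1.0},
}
part = local_moving(adj)
small = compress_graph(adj, part)
```

On this toy graph the node-moving phase groups each triangle into one community, and compression leaves two nodes joined by the single bridging edge.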
Fig. 2 is a schematic structural diagram of an apparatus for mining association rules between applications according to an embodiment of the present disclosure. Each functional module of the apparatus may be implemented as a software module or as a hardware unit. The functions of the modules correspond to the steps of the method for mining association rules between applications provided by the embodiments of the present disclosure. The modules of the apparatus may all run on one hardware device, or one or more steps or module functions of the method provided by the present disclosure may be performed by different hardware devices. The apparatus 200 comprises a grouping statistics module 201, a minimum support calculation module 202, an application screening module 203, a credibility calculation module 204 and an association rule determination module 205.
The grouping statistics module 201 is configured to perform, over a plurality of time periods within the analysis duration, grouping statistics on application usage information by user identification code, so as to obtain the set of applications used by each user in each time period;
The minimum support calculation module 202 is configured to calculate the usage rate of each application in different time periods, and take the average of each application's usage rates across the time periods as the minimum support of that application;
The application screening module 203 is configured to calculate, for each time period, the support of each application in the time period, and select the applications whose support in the time period is greater than their own minimum support, so as to form an item set L1;
The credibility calculation module 204 is configured to combine the items in L1 two by two to form a candidate item set C2 expressing the order-of-use relationship between two applications, and calculate the credibility of each item in C2 to obtain a credibility set L2 expressing the order-of-use relationship between two applications;
The association rule determination module 205 is configured to select from L2 the items whose credibility from the preceding application to the following application is greater than the product of the minimum support of the following application and a preset constant, to obtain a set R2 expressing association rules between applications.
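The grouping performed by the grouping statistics module can be illustrated with the sketch below. The record layout `(user_id, timestamp, app)` and the `period_of` callback that maps a timestamp to a time period are hypothetical; the text does not specify how usage information is stored or how periods are delimited.

```python
from collections import defaultdict

def group_by_period(records, period_of):
    """Return {period: {user_id: [apps in order of use]}}.

    records:   iterable of (user_id, timestamp, app) tuples
    period_of: function mapping a timestamp to a time-period key
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for user_id, timestamp, app in sorted(records, key=lambda r: r[1]):
        grouped[period_of(timestamp)][user_id].append(app)
    return {period: dict(users) for period, users in grouped.items()}

records = [
    ("u1", 3600, "chat"),
    ("u1", 60, "mail"),
    ("u2", 4000, "mail"),
]
# Assumed for the example: timestamps in seconds, one period per hour.
by_period = group_by_period(records, lambda ts: ts // 3600)
```

Sorting by timestamp keeps each user's applications in order of use, which the later order-of-use credibility calculation depends on.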
To implement application community partitioning, in an embodiment of the present disclosure, the apparatus further includes:
The community division module 206 is configured to take the preceding and following applications of each item in the association rule set R2 as nodes of a graph, and take the credibility from the preceding application to the following application as the weight of the edge between the two nodes, to construct an application association graph.
To implement core application identification, in an embodiment of the present disclosure, the apparatus further includes a core application identification module 207, configured to calculate, in each application community, the sum of the weights of the edges of the node corresponding to each application, and to determine the application with the largest sum of edge weights in each application community as the core application of that community.
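Core application identification as described for module 207 can be sketched in a few lines. The function name and input layout are hypothetical, and the sketch assumes the weighted degree is taken over all edges of a node in the association graph, not only edges inside its community.

```python
def community_cores(adj, community):
    """Return {community label: node with the largest sum of edge weights}.

    adj:       {node: {neighbor: weight}}
    community: {node: community label}
    """
    degree = {i: sum(neighbors.values()) for i, neighbors in adj.items()}
    cores = {}
    for node, label in community.items():
        best = cores.get(label)
        if best is None or degree.get(node, 0.0) > degree.get(best, 0.0):
            cores[label] = node
    return cores

adj = {
    "mail": {"chat": 1.0, "docs": 0.5},
    "chat": {"mail": 1.0},
    "docs": {"mail": 0.5, "wiki": 2.0},
    "wiki": {"docs": 2.0},
}
membership = {"mail": 1, "chat": 1, "docs": 2, "wiki": 2}
cores = community_cores(adj, membership)
```

When two applications tie on weighted degree, this sketch keeps whichever appears first in `community`; a real system would need an explicit tie-breaking rule.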
In order to filter out applications that are frequently used independently, in an embodiment of the present disclosure, the minimum support calculation module 202 is further configured to filter out application data with a usage rate greater than a preset threshold before calculating the minimum support of each application.
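A hedged sketch of this filtering is shown below. It assumes the filter is applied per time period to the usage rates before the minimum support is computed; the threshold value 0.9 and all names are purely illustrative, since the text only calls the threshold a preset value.

```python
def filter_high_usage(rates, threshold):
    """Drop, per period, the apps whose usage rate exceeds the threshold.

    rates: {period: {app: usage rate in that period}}
    """
    return {
        period: {app: r for app, r in per_app.items() if r <= threshold}
        for period, per_app in rates.items()
    }

rates = {"t1": {"portal": 0.95, "chat": 0.5}, "t2": {"portal": 0.97, "mail": 0.4}}
filtered = filter_high_usage(rates, threshold=0.9)
```

Applications that nearly everyone uses in every period would otherwise appear in almost every rule without indicating a meaningful association.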
Another embodiment of the present disclosure further provides a storage medium located in a device having a processor and a bus structure. The storage medium may be a volatile or a non-volatile storage medium and stores a computer program which, when read and executed by the processor, performs the steps of the method for mining association rules between applications provided by the embodiments of the present disclosure.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the present disclosure. Various modifications and variations of this disclosure will be apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present disclosure, are intended to be included within the scope of the claims of the present disclosure.

Claims (10)

1. A method for mining association rules between applications, the method comprising:
In a plurality of time periods of analysis duration, grouping statistics is carried out according to application use information corresponding to the user identification code, and an application set used by each user in each time period is obtained;
Calculating the use rate of each application in different time periods based on the obtained application set used by each user in each time period, and taking the average value of the use rate of each application in different time periods as the minimum support degree of each application;
For each time period, calculating the support degree of each application in the time period, and selecting the applications whose support degree in the time period is greater than their own minimum support degree, to form an item set L1;
Combining the items in L1 in pairs to form a candidate item set C2 expressing the order-of-use relationship between two applications, and calculating the credibility of each item in C2 to obtain a credibility set L2 expressing the order-of-use relationship between two applications;
Selecting from L2 the items whose credibility from the preceding application to the following application is greater than the product of the minimum support degree of the following application and a preset constant, to obtain a set R2 expressing association rules between the applications;
Wherein calculating the support degree of each application in the time period comprises: for each application, calculating the proportion of the number of users using the application in the time period to the total number of users as the support degree of the application;
Wherein calculating the credibility of each item in C2 comprises: dividing the support degree of the first application and the second application being used one after the other by the support degree of the first application being used alone, the first application being the application used first in the item and the second application being the application used later in the item.
2. The method according to claim 1, wherein the method further comprises:
Respectively taking the front application and the rear application of each item in the association rule set R2 as points in the graph, taking the credibility of the front application and the rear application as the weight of the edge between the two points, and constructing an application association graph;
And carrying out community division on the application association graph by adopting a community division algorithm to obtain an application community.
3. The method according to claim 2, wherein the method further comprises:
Calculating, in each application community, the sum of the weights of the edges of the node corresponding to each application, and determining the application with the largest sum of edge weights in each application community as the core application of that application community.
4. The method of claim 1, wherein prior to said calculating the minimum support for each application, the method further comprises filtering out application data having a usage rate greater than a predetermined threshold.
5. The method according to claim 2, wherein the community division algorithm is the Louvain algorithm, and the preset constant is 1.4.
6. An apparatus for mining association rules between applications, the apparatus comprising:
the grouping statistics module is used for grouping statistics with application use information corresponding to the user identification codes in a plurality of time periods of analysis duration to obtain application sets used by each user in each time period;
The minimum support calculation module is used for calculating the use rate of each application in different time periods based on the obtained application set used by each user in each time period, and taking the average value of the use rates of each application in different time periods as the minimum support of each application;
The application screening module is used for calculating, for each time period, the support degree of each application in the time period, and selecting the applications whose support degree in the time period is greater than their own minimum support degree, to form an item set L1;
The credibility calculation module is used for combining the items in L1 in pairs to form a candidate item set C2 expressing the order-of-use relationship between two applications, and calculating the credibility of each item in C2 to obtain a credibility set L2 expressing the order-of-use relationship between two applications;
The association rule determining module is used for selecting from L2 the items whose credibility from the preceding application to the following application is greater than the product of the minimum support degree of the following application and a preset constant, to obtain a set R2 expressing association rules between the applications;
Wherein calculating the support degree of each application in the time period comprises: for each application, calculating the proportion of the number of users using the application in the time period to the total number of users as the support degree of the application;
Wherein calculating the credibility of each item in C2 comprises: dividing the support degree of the first application and the second application being used one after the other by the support degree of the first application being used alone, the first application being the application used first in the item and the second application being the application used later in the item.
7. The apparatus of claim 6, wherein the apparatus further comprises:
The community dividing module is used for respectively taking the front application and the rear application of each item in the association rule set R2 as points in the graph, taking the credibility of the front application and the rear application as the weight of the edge between the two points, and constructing an application association graph;
The core application identification module is used for calculating, in each application community, the sum of the weights of the edges of the node corresponding to each application, and determining the application with the largest sum of edge weights in each application community as the core application of that application community.
8. The apparatus according to claim 6, wherein the minimum support calculation module is further configured to filter out application data with a usage rate greater than a preset threshold before the minimum support degree of each application is calculated.
9. The apparatus according to claim 7, wherein the community division algorithm is the Louvain algorithm, and the preset constant is 1.4.
10. A storage medium having a computer program stored therein, characterized in that the computer program in the storage medium, when being read and executed by a processor, is adapted to carry out the step functions of the method according to any one of claims 1 to 5.
CN202010612788.7A 2020-06-30 2020-06-30 A method, device and storage medium for mining association rules between applications Active CN111831706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010612788.7A CN111831706B (en) 2020-06-30 2020-06-30 A method, device and storage medium for mining association rules between applications

Publications (2)

Publication Number Publication Date
CN111831706A CN111831706A (en) 2020-10-27
CN111831706B true CN111831706B (en) 2025-08-05

Family

ID=72900671

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010612788.7A Active CN111831706B (en) 2020-06-30 2020-06-30 A method, device and storage medium for mining association rules between applications

Country Status (1)

Country Link
CN (1) CN111831706B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant