Detailed Description
The general concept of the technical solution provided by the present application is as follows:
The embodiments of the present application provide a network information decision management method based on user traffic statistical analysis. Edge processing units are dynamically activated by monitoring and analyzing the user traffic density of a target area in the network, so as to optimize the allocation of network resources. When the traffic density exceeds a predetermined threshold, the system reads the user access features and evaluates the dispersion of the content in order to determine and cache the hot content.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
In one embodiment, as shown in Fig. 1, a network information decision management method based on user traffic statistical analysis is provided, the method including:
Step 100: in the backbone network, periodically reading the user traffic density of the target area, and if the user traffic density is greater than a predetermined density threshold, activating the edge processing unit of the target area, wherein the edge processing unit is communicatively connected to the backbone network.
Specifically, the backbone network refers to the core network that carries the primary data transmission in a communication system and is responsible for transferring large volumes of data between different areas. It connects regional subnetworks with edge devices, ensuring that data can be transferred efficiently and quickly from one region to another. The target area refers to a specific geographical area, or an area of the network system where users are concentrated, in which traffic monitoring and management is required. For example, the target area may be a hot spot such as a shopping mall, a transportation hub, or a busy commercial district, where sudden increases in user traffic tend to occur. The user traffic density is a measure of the number of users, or the frequency of data access, in the target area per unit time, and reflects the user activity of that area. An increase in traffic density typically indicates an increase in network demand in the area; for example, the number of users in a mall rises significantly during peak shopping hours, which raises the traffic density of the area. The predetermined density threshold refers to a critical value of the user traffic density preset by the system. The threshold serves as a trigger condition: when the traffic density in the target area exceeds it, corresponding control measures are taken, such as activating an edge processing unit to enhance data processing capacity. An edge processing unit refers to a device or node located at the edge of the network that processes data close to the user. It can share the load of the backbone network and improve data processing speed. Common edge devices include intelligent routers, small data centers, and the like.
In the backbone network, the user traffic density of the target area is periodically monitored and read so that sudden increases in user traffic can be handled in time. Traffic is typically analyzed in depth using a traffic monitoring system or traffic detection device, for example with Deep Packet Inspection (DPI) tools, or traffic data is monitored in real time via the NetFlow protocol. The acquired traffic density data is then compared with the predetermined density threshold. If the traffic density exceeds the threshold, the system activates the edge processing unit to provide more direct data processing service for the target area, so that not all data has to be transmitted back to the center of the backbone network for processing. The edge processing unit is connected to the backbone network through a high-speed communication link, so that data can be exchanged quickly between the two.
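The threshold check of step 100 can be sketched as follows. This is only a minimal illustration: the `EdgeUnit` class, the `check_and_activate` function, and the sample densities are hypothetical names and values chosen for the example, and the sketch does not cover how the density is measured or when a unit is deactivated, since the text does not specify either.

```python
from dataclasses import dataclass


@dataclass
class EdgeUnit:
    """Hypothetical stand-in for the edge processing unit of one target area."""
    area: str
    active: bool = False


def check_and_activate(unit: EdgeUnit, density: float, threshold: float) -> bool:
    """Activate the unit when the sampled density exceeds the threshold.

    Returns True if the unit is active after the check. Deactivation is
    deliberately left out because the description does not define it.
    """
    if density > threshold:  # strictly greater, as stated in step 100
        unit.active = True
    return unit.active


# One sample per monitoring period; the numbers are invented for illustration.
unit = EdgeUnit(area="mall")
samples = [320.0, 480.0, 510.0]
states = [check_and_activate(unit, d, threshold=500.0) for d in samples]
print(states)  # [False, False, True]
```

In a real deployment the samples would come from the traffic monitoring system rather than a fixed list.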
With this method, the system can share the processing pressure of the backbone network during traffic peaks by dynamically activating the edge processing unit, so that data is processed locally, the bandwidth demand on the backbone network is reduced, the access delay experienced by users is shortened, and both the smoothness of the user experience and the efficiency of network resource use are improved. The method is particularly suitable for areas where user traffic is concentrated and varies widely; it can respond flexibly to peak demand and enhances the reliability and stability of the network service.
Step 200: reading a user access feature set of the target area, performing content discrete evaluation according to the user access feature set, and if the content dispersion is less than or equal to a predetermined scalar, performing concentration analysis on the user access feature set, determining hot content, and caching the hot content to the edge processing unit.
Specifically, the user access feature set refers to a data set formed by counting and analyzing the access behavior of users in a specific area over a period of time, and includes access content, access frequency, access time, and the like. These features help identify user preferences and behavior patterns. Content discrete evaluation is an evaluation method that measures how the access content in the user access feature set is distributed among different types or categories, and is generally realized by calculating the diversity and degree of dispersion of the access content. A lower dispersion means that user access to content is more concentrated, while a higher dispersion means that user interest in content is more dispersed. Concentration analysis refers to in-depth analysis of the user access feature set to identify content with a higher access frequency and thereby determine the hot content. Common methods include frequency statistics, cluster analysis, and the like. Hot content refers to content that is frequently accessed by users within a particular time, typically reflecting the users' current interests or needs. Such content is of great importance for content distribution and resource management. The edge processing unit refers to a device located at the edge of the network that processes and caches data, so as to reduce the load of the backbone network and improve response speed.
First, the user access feature set of the target area is read. Data collection and collation are generally performed by means of a data collection tool, such as a Web analytics tool (e.g., Google Analytics, Adobe Analytics, etc.). The content, frequency, and time of user access can be recorded to form a detailed access feature set.
Then, content discrete evaluation is performed on the acquired user access feature set, and the dispersion of the access content is calculated. The dispersion may be calculated by measuring the diversity of the content accessed by users, for example by computing the standard deviation or variance of content access. If the content dispersion is less than or equal to the predetermined scalar, user interest in the content is relatively concentrated, and concentration analysis is then performed. Specifically, the system clusters or classifies the content with a higher access frequency, for example through the K-Means clustering algorithm, and determines the hot content.
Once the hot content is identified, the system caches the content to the edge processing unit in order to quickly respond to the user's access request. For example, in a news website, assuming that a news article is accessed by a large number of users in a short time, the system determines that the article is hot content through discrete evaluation and concentration analysis, and caches the hot content to an edge processing unit, so that other users can quickly acquire the content without requesting from a backbone network when accessing the content.
In this step, the user access feature set is read and discrete evaluation and concentration analysis are carried out, so that the system can quickly identify and cache the hot content, improving the response speed and experience of user access. Especially in scenarios where user demand changes rapidly, data transmission time can be effectively reduced and the fluency of content access improved. In addition, the method reduces the load of the backbone network, optimizes resource allocation, and helps the whole network system keep running stably under high load.
Step 300: if the content dispersion is greater than the predetermined scalar, reading an associated access feature set of an associated region, performing concentration analysis on the associated access feature set, determining associated hot content, and caching the associated hot content to the edge processing unit.
Specifically, the predetermined scalar is a threshold value of content dispersion, and is used to determine the concentration or dispersion of the user access content. If the content dispersion is greater than the predetermined scalar, the accessed content of the area is indicated to be more dispersed, and the system needs to introduce the accessed data of the associated area for further analysis. An associated region refers to a region that has a certain correlation with a target region in terms of geographic location or content demand characteristics. The set of access characteristics of the associated region may provide a reference to the target region to help identify the hot content. The associated hot content is content which has higher access frequency and more concentrated user demands in the associated area, and can provide reference for content recommendation of the target area.
First, the content dispersion of the target area is checked. If the content dispersion is greater than the predetermined scalar, user access to content in the target area is relatively dispersed, and the system needs to introduce access data of the associated region to supplement the analysis. At this point, the system reads the associated access feature set of the associated region, that is, the user access data of that region, including the content frequently accessed by users and its access frequency. The associated region may be determined based on geographic proximity or similarity of user demand.
Next, the system performs a concentration analysis on the set of associated access features. In this analysis, the system will aggregate the content accessed by the users of the associated region and calculate their access frequency, defining the high frequency content as the hot content of the associated region. For example, in the access data of the associated area, if "local news" and "hot movie shows" are contents that are accessed more frequently, then these contents are identified as associated hot content.
The associated hot content is then cached to the edge processing unit of the target area in order to more quickly respond to the user's access needs. For example, a Wi-Fi network of a shopping mall may, upon detecting a dispersion in its user access needs, cache with reference to hot content (e.g., promotional information or popular activities for a class of merchandise) of a neighboring shopping mall. In this way, the system can enhance the user experience by providing hot content of the associated region even if the user demand is not concentrated in the target region.
By referring to the hot content of the associated area under the condition of high content dispersion, the system realizes dynamic expansion of the demand data, and can provide content recommendation meeting the user interest even in the area with scattered demand. The method not only improves the utilization rate of system resources, but also improves the access experience of users, so that the edge processing unit can refer to the demand trend of the adjacent area to distribute the content when the demand of the users is ambiguous, and the content response efficiency is optimized.
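The branch between step 200 and step 300 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `top_k` parameter, and the sample access lists are hypothetical, the dispersion value is taken as already computed, and concentration analysis is reduced to plain frequency counting, which is only one of the methods the text mentions.

```python
from collections import Counter


def hot_content(accesses, top_k=2):
    """Concentration analysis reduced to frequency statistics (an assumption)."""
    return [item for item, _ in Counter(accesses).most_common(top_k)]


def select_cache_content(dispersion, scalar, target_accesses,
                         associated_accesses, top_k=2):
    """Pick the content to cache on the edge unit of the target area.

    dispersion <= scalar: use the target area's own accesses (step 200).
    dispersion >  scalar: fall back to the associated region (step 300).
    """
    if dispersion <= scalar:
        return hot_content(target_accesses, top_k)
    return hot_content(associated_accesses, top_k)


# Invented access logs: one entry per access.
target = ["a", "a", "a", "b", "b", "c"]
associated = ["local news"] * 3 + ["hot movie"] * 2 + ["weather"]

concentrated = select_cache_content(0.2, 0.5, target, associated)
dispersed = select_cache_content(0.9, 0.5, target, associated)
print(concentrated)  # ['a', 'b']
print(dispersed)     # ['local news', 'hot movie']
```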
Step 400: performing user access compensation for the target area using the edge processing unit.
Specifically, the user access compensation refers to providing content through an edge processing unit under the condition that the user demand is high or the network resources of the target area are insufficient, so as to compensate the deficiency of the network and ensure the smoothness of the user access experience. For example, in case of network congestion, part of the data request is undertaken by the edge device.
First, user traffic density data of the target area is acquired through the backbone network or a traffic monitoring tool, and it is determined whether the area needs to enable an edge processing unit for access compensation. When user requests are frequent, the system can dynamically update the hot content in the edge node based on the real-time user traffic density. Specifically, the system sets the update frequency based on the user traffic density. If the user traffic density is high, meaning that users request content frequently, the system increases the update frequency. For example, the hot content is refreshed every 10 minutes, ensuring that the edge cache holds the content users are currently most interested in. This may be accomplished by an automated script or an algorithm-based update mechanism, for example a cache replacement policy (such as LFU or LRU) that determines which content is preferentially updated and replaced.
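The update mechanism described above can be sketched as follows. This is a minimal illustration, not the method of the application: `refresh_interval_minutes` and `LFUCache` are hypothetical names, the 30-minute baseline interval is an assumed value (only the 10-minute figure comes from the text), and the LFU policy shown is a bare-bones variant that evicts the entry with the fewest recorded accesses.

```python
def refresh_interval_minutes(density, threshold, base=30, fast=10):
    """Refresh more often when the density is above the threshold.

    The 10-minute value follows the example in the text; the 30-minute
    baseline is an assumption made for this sketch.
    """
    return fast if density > threshold else base


class LFUCache:
    """Minimal least-frequently-used cache keyed by content identifier."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.hits = {}  # content id -> number of accesses seen so far

    def access(self, key):
        """Record an access, evicting the least-used entry if the cache is full."""
        if key not in self.hits and len(self.hits) >= self.capacity:
            victim = min(self.hits, key=self.hits.get)
            del self.hits[victim]
        self.hits[key] = self.hits.get(key, 0) + 1


cache = LFUCache(capacity=2)
for content_id in ["a", "a", "b", "c"]:
    cache.access(content_id)

print(sorted(cache.hits))                    # ['a', 'c']
print(refresh_interval_minutes(620, 500))    # 10
```

Here "b" is evicted when "c" arrives because it has been accessed only once while "a" has been accessed twice.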
By executing user access compensation at the edge processing unit, the system significantly improves the efficiency of content distribution and the response speed of user access, and particularly has more obvious effect in the peak period of user traffic. The dynamic updating of the hot content ensures that the user obtains the current high-demand content, effectively improves the user experience, and reduces the bandwidth pressure of the backbone network.
Further, obtaining the predetermined density threshold includes: obtaining a real-time load proportion of the backbone network, correcting an initial density threshold according to the real-time load proportion, and outputting the predetermined density threshold, wherein the predetermined density threshold is negatively correlated with the real-time load proportion.
In particular, real-time load proportion refers to the current usage or load situation of the network backbone, typically expressed as a percentage of the usage of the network bandwidth. For example, if the bandwidth of the backbone network is 100 Mbps, and 60 Mbps is currently used, the real-time load ratio is 60%. This index is used to evaluate the operating state of the network and the available resources. The initial density threshold refers to a reference value for the user traffic density that is set at the beginning of network operation or when there is no real-time monitoring data. This value is typically estimated based on historical data or expected flow conditions as a starting reference for determining flow conditions.
First, the real-time load proportion of the backbone network is obtained. Network bandwidth usage is tracked in real time, and real-time load data is provided by network monitoring tools such as network performance monitoring software (e.g., SolarWinds, Nagios, etc.). The initial density threshold is then corrected based on the obtained real-time load proportion. Specifically, when the real-time load proportion is higher, the system dynamically lowers the predetermined density threshold, so that the edge processing unit is activated at a lower traffic density and network congestion under high load is avoided; conversely, when the load proportion is lower, the predetermined density threshold can be raised. For example, if the initial density threshold is set at 500 users per square kilometer and the real-time load proportion reaches 80%, the predetermined density threshold is adjusted to 400 users per square kilometer, thereby reducing the load pressure. Conversely, when the network load is low (e.g., 40%), the predetermined density threshold is raised to 600 users per square kilometer.
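One way to obtain a correction that is negatively correlated with load and reproduces the figures in the example (500 becomes 400 at 80% load and 600 at 40% load) is a linear rule around a reference load. The rule below is an assumption made for illustration: the text does not give a formula, and the names `corrected_threshold`, `reference_load`, and `sensitivity`, as well as the 60% reference value, are hypothetical.

```python
def corrected_threshold(initial, load, reference_load=0.6, sensitivity=1.0):
    """Linearly correct the initial density threshold by the real-time load.

    load is the real-time load proportion in [0, 1]. The result decreases
    as load increases (negative correlation). The linear form and the
    default parameters are assumptions, chosen to match the worked example.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be a proportion between 0 and 1")
    return initial * (1.0 + sensitivity * (reference_load - load))


high_load = corrected_threshold(500, 0.8)
low_load = corrected_threshold(500, 0.4)
print(round(high_load), round(low_load))  # 400 600
```

Any other monotonically decreasing function of the load would satisfy the stated negative correlation equally well.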
The network management system can more accurately adapt to different network load conditions by monitoring and dynamically adjusting the preset density threshold in real time, and the management of user traffic is optimized. When the network load is higher, the system can effectively prevent the network from being overloaded and ensure the stability of the access of the user, and when the load is lower, the system can maximize the resource utilization rate and promote the user experience. The method not only can improve the overall network performance, but also can obviously reduce the potential network bottleneck risk and improve the service quality and the user satisfaction.
Further, performing content discrete evaluation according to the user access feature set includes: the user access feature set comprising the access content and access frequency in the target area within a predetermined time zone; performing frequency identification on the access content according to the access frequency to determine a user access content set; and performing content discrete evaluation on the user access content set and outputting the content dispersion.
Specifically, the predetermined time zone refers to a specific time period set by the system, during which user access behavior in the target area is collected and analyzed. The predetermined time zone may be a period such as the morning peak, lunch time, or the evening peak, and is used to capture user behavior characteristics for a particular period. The user access features describe the access behavior of users toward content within a specific time, including access content (the specific information that users browse or click) and access frequency (the number of times, or how often, users access the content). These features reflect the interests or needs of users during a particular period. The ladder weight is a hierarchical weighting scheme in which user access content is divided into different weight levels according to access frequency. For example, content with a higher access frequency is given a higher weight and content with a lower access frequency a lower weight, so that the popularity of content is distinguished more clearly. Frequency identification refers to the process of classifying and marking access content according to access frequency, which helps the system determine how hot each item in the user access content set is. Through this identification, the access content can be ranked by popularity. Content dispersion is an indicator of how scattered user access is across categories, topics, or genres, and is typically derived by calculating the variability between content. A low dispersion indicates that user access is concentrated on a small set of content; conversely, a high dispersion indicates that user interest is dispersed.
First, the user access feature set of the target area within the predetermined time zone is read, and user access features such as access content and access frequency are extracted. This may be accomplished by a data collection tool or a log analysis tool (e.g., Google Analytics or Splunk) that can collect and collate user access data for a specific period within the target area.
Next, the content is processed hierarchically according to the frequency of user access, and ladder weights are set. For example, if an item of content is accessed more frequently in the time zone, the system gives it a higher weight to highlight its popularity. Conversely, less frequently accessed content is given a lower weight. With this ranking, the main hot and cold content in the user access content set can be further determined.
Then, the system performs frequency identification on the content, marks the content with higher access frequency as high-frequency content and marks the content with lower access frequency as low-frequency content, so that a clear user access content set is formed. In this way, the system can more intuitively understand the needs of the user for content at different frequencies.
And finally, performing content discrete evaluation on the user access content set, and calculating the content dispersion. Such evaluation is typically achieved by statistical dispersion indicators (e.g., variances). If the dispersion is low, indicating that the user has more concentrated access to the content, the system can identify these high frequency content as hot content.
For example, during a predetermined time zone "late peak" of an e-commerce web site, the system reads the user access feature set, finds that the access frequency of "winter jacket" is significantly higher than other merchandise, and then gives the content a higher weight. This high frequency content is then marked out and the content dispersion calculated. If the dispersion is low, the user is indicated to have concentrated demands for 'winter jackets' during late peak periods, and the system can identify the content as hot content, so that the subsequent caching and distribution optimization is facilitated.
By the evaluation mode based on the frequency and the dispersion, the required content of the user in a specific period can be accurately identified, and whether the content has consistency or not can be determined. The method not only can effectively analyze the behavior characteristics of the user, but also can rapidly locate the hot content when the dispersion is low, thereby more accurately distributing the content. Finally, the method improves the accuracy of content recommendation and the use efficiency of system resources, and optimizes the user access experience.
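The ladder weighting and frequency identification described above can be sketched as follows. The step boundaries (100 and 10 accesses), the weight values, and the function names are hypothetical choices made for the example; the text does not fix them.

```python
def ladder_weight(freq, steps=((100, 3), (10, 2))):
    """Map an access frequency to a stepped weight.

    steps lists (minimum frequency, weight) pairs from the highest step
    down; anything below the lowest step gets weight 1. The boundaries
    are assumptions for illustration.
    """
    for min_freq, weight in steps:
        if freq >= min_freq:
            return weight
    return 1


def identify_frequency(freqs, high_weight=2):
    """Mark each item as high- or low-frequency content by its ladder weight."""
    return {
        content: "high" if ladder_weight(freq) >= high_weight else "low"
        for content, freq in freqs.items()
    }


# Invented evening-peak access counts for an e-commerce site.
evening_peak = {"winter jacket": 240, "scarf": 35, "sandals": 4}
weights = {name: ladder_weight(freq) for name, freq in evening_peak.items()}
labels = identify_frequency(evening_peak)
print(weights)  # {'winter jacket': 3, 'scarf': 2, 'sandals': 1}
print(labels)   # {'winter jacket': 'high', 'scarf': 'high', 'sandals': 'low'}
```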
Further, performing content discrete evaluation on the user access content set and outputting the content dispersion includes: distributing the user access content set in a predetermined three-dimensional space according to preset features, wherein the preset features comprise access frequency, content category, and timeliness; in the predetermined three-dimensional space, counting the amount of data in each unit grid region and setting it as a unit data density, thereby determining a plurality of unit data densities; and performing variance calculation on the plurality of unit data densities and setting the variance calculation result as the content dispersion.
Specifically, the preset features are the analysis dimensions selected by the system when evaluating the content accessed by users, including access frequency (the number of times the content is accessed), content category (the subject or category to which the content belongs, such as science and technology or news), and timeliness (the release time or duration of relevance of the content). These features are used to fully characterize the access behavior of users. The predetermined three-dimensional space is a multi-dimensional data analysis model in which the user access content set is distributed in a virtual three-dimensional coordinate system according to the three preset features (access frequency, content category, and timeliness), to facilitate subsequent data distribution and density analysis. The unit data density represents the degree of data concentration within each unit grid region of the predetermined three-dimensional space, that is, the number or frequency of user accesses to content within the region, and reflects the concentration of user interest in that region.
First, according to the preset features, the user access content set is distributed in the predetermined three-dimensional space. This space uses the three dimensions of access frequency, content category, and timeliness as its axes. For example, the user access content of a news website may be mapped to a three-dimensional space in which the "access frequency" axis represents the number of accesses to an item of content, the "content category" axis represents the subject matter of the content (e.g., sports, entertainment, science, etc.), and the "timeliness" axis represents the release time or duration of relevance of the content.
Next, a plurality of unit grid regions are divided in this three-dimensional space, and the amount of data in each unit grid is counted and set as the unit data density. For example, one grid region that mainly contains content with a high access frequency, in the technology category, and released recently has a larger data density, while another grid region has a smaller data density because it contains content with a low access frequency, in the entertainment category, and released long ago. Such partitioning may be accomplished by a spatial partitioning algorithm, such as a K-D tree or another three-dimensional data segmentation algorithm, which divides the three-dimensional space and calculates the density of each region.
Subsequently, variance calculation is performed on the unit data densities of all the unit mesh areas to quantify the degree of dispersion of the content. The larger the variance, the larger the access frequency, category and timeliness difference among different contents, namely the content distribution is more scattered, and the smaller the variance, the more concentrated the user access content on the characteristics.
For example, in the access analysis of the e-commerce platform, the access content of the user in a certain predetermined time zone is distributed to the three-dimensional space according to three characteristics of "access frequency, commodity category and time to market". Assuming that the high-frequency access of the "smart phone" category is concentrated on several models which are newly marketed, and the "home product" category is scattered in multiple sub-categories and the access frequency is lower, the unit data density of the smart phone grid is higher and the density of the home product is lower. Finally, the system recognizes that the attention points of the user in the period are concentrated on the smart phone through the dispersion obtained through variance calculation.
By evaluating the dispersion of the user's access to the content set, the system can determine whether the user's access needs are concentrated. The low dispersion indicates that the focus of the user is concentrated, the system can buffer or recommend the high-demand content preferentially, the user access experience and the network resource utilization rate are improved, and the high dispersion prompts the content demand dispersion, so that the optimization of accurate content recommendation is facilitated. In the network information management, the method improves the pertinence and the efficiency of content distribution and simultaneously reduces the burden of a backbone network.
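The grid-density variance can be sketched as follows. This is a minimal illustration under several assumptions: each item is taken to be already discretized into integer cell indices along the three axes (access frequency, content category, timeliness), the grid is a regular grid rather than a K-D tree, empty cells are counted with density 0, and the population variance is used. The function name and the sample data are hypothetical.

```python
from collections import Counter
from itertools import product
from statistics import pvariance


def content_dispersion(cells, shape):
    """Variance of per-cell data counts over a regular 3-D grid.

    cells: one (freq_bin, category_bin, timeliness_bin) index tuple per item.
    shape: number of bins along each of the three axes.
    Empty cells contribute a density of 0 (an assumption of this sketch).
    """
    for cell in cells:
        if any(not 0 <= i < n for i, n in zip(cell, shape)):
            raise ValueError(f"cell {cell} lies outside grid {shape}")
    counts = Counter(cells)
    densities = [counts.get(cell, 0)
                 for cell in product(*(range(n) for n in shape))]
    return pvariance(densities)


# Invented data on a 2 x 2 x 2 grid: three items in one cell, one in another.
items = [(0, 0, 0)] * 3 + [(1, 0, 1)]
dispersion = content_dispersion(items, shape=(2, 2, 2))
print(dispersion)
```

For this data the eight cell densities are 3, 1, and six zeros, so the mean is 0.5 and the population variance is 1. How the resulting value is compared with the predetermined scalar is left to the steps described in the text.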
Further, performing concentration analysis on the user access feature set to determine the hot content includes: performing category clustering on the user access feature set to determine a plurality of user access category sets; counting the total access frequency of each of the plurality of user access category sets and arranging them from largest to smallest to obtain an access category sequence; selecting the first N categories of the access category sequence as hotspot categories, thereby determining N hotspot categories, where N is an integer greater than 1; and, in the N hotspot categories, selecting the first M items of access content with the largest share of access frequency as access hotspots, thereby obtaining the hot content.
Specifically, category clustering is a data grouping technique that groups user access content by similar attributes (e.g., content subject matter or type). For example, on a news website, content may be divided into categories such as "sports", "entertainment", and "science and technology". Clustering helps the system extract the main access categories from the user access feature set, which facilitates subsequent hot content analysis. The access category sequence is the result of arranging the user access category sets from high to low by access frequency. It helps identify the categories users are most interested in and determine their ranking. The hotspot categories are the top N categories with the highest access frequency and represent the content categories on which user access volume is concentrated. These categories reflect the main direction of user demand. The access hotspots are the first M specific items of content with the highest share of access frequency within a hotspot category. They further help the system identify the specific content users care about most, providing support for content recommendation.
First, category clustering is performed on the user access content set, and similar content is clustered into one category. Clustering can be implemented by, for example, K-means, DBSCAN, or a hierarchical clustering method. Then, the total access frequency of each user access category set is counted, and an access category sequence arranged from largest to smallest access frequency is generated. Through this ordering, the system can identify the categories users are most interested in. The top N categories (where N is greater than 1) are selected from the sequence as hotspot categories, which together reflect the users' primary focus.
Thereafter, in each hotspot category, the system further analyzes the share of access frequency and selects the first M items of content (where M is greater than 1) with the highest share as access hotspots. For example, if in the "science and technology" category users access "smart phone release" and "AI technology progress" most frequently, these two items become the access hotspots of the category, reflecting the users' demand for that specific content.
This step can help the system accurately identify the user's hotspot needs. By determining the hot spot category and accessing the hot spot, the system can optimize content distribution and caching strategies, and the response speed and recommendation accuracy of the user request are improved. Finally, the method realizes the optimization of resource utilization and improves the user experience.
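The selection of hotspot categories and access hotspots can be sketched as follows. This is a minimal illustration: the clustering step is assumed to have already assigned each item a category label, the function name and the records are hypothetical, and ranking by raw frequency within a category is used because it yields the same order as ranking by share of that category's total.

```python
from collections import defaultdict


def access_hotspots(records, n, m):
    """Return the top-m contents of each of the top-n categories.

    records: iterable of (category, content, access_frequency) tuples,
    with the category assumed to come from a prior clustering step.
    """
    by_category = defaultdict(dict)
    for category, content, freq in records:
        items = by_category[category]
        items[content] = items.get(content, 0) + freq

    # Access category sequence: categories ordered by total frequency.
    totals = {cat: sum(items.values()) for cat, items in by_category.items()}
    hot_categories = sorted(totals, key=totals.get, reverse=True)[:n]

    return {
        cat: sorted(by_category[cat], key=by_category[cat].get, reverse=True)[:m]
        for cat in hot_categories
    }


# Invented access counts.
records = [
    ("technology", "smart phone release", 90),
    ("technology", "AI technology progress", 70),
    ("technology", "printer review", 5),
    ("sports", "cup final", 60),
    ("sports", "transfer news", 40),
    ("sports", "chess", 3),
    ("entertainment", "award show", 20),
]
hotspots = access_hotspots(records, n=2, m=2)
print(hotspots)
```

With these counts the category totals are 165, 103, and 20, so "technology" and "sports" are the two hotspot categories, and each contributes its two most accessed items.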
Further, as shown in Fig. 2, determining the associated region includes: obtaining a plurality of sub-regions in the target area, and obtaining a position coordinate set, a region type set, and a plurality of historical user access feature sets of the plurality of sub-regions; randomly selecting a first sub-region among the plurality of sub-regions, and obtaining a first position coordinate, a first region type, and a first historical user access feature set of the first sub-region; performing type association degree analysis between the first region type and each of the other region types in the region type set to determine a plurality of type association degrees; performing access feature association degree analysis between the first historical user access feature set and each of the plurality of historical user access feature sets to determine a plurality of access association degrees; screening by the plurality of type association degrees and the plurality of access association degrees to determine a plurality of initial associated regions, wherein sub-regions whose type association degree satisfies a type association threshold and whose access association degree satisfies an access association threshold are set as initial associated regions; screening the plurality of initial associated regions by distance according to the position coordinate set to determine a preset number of first associated regions; and setting a region mapping table according to the first sub-region and the first associated regions, and determining the associated region of the target area according to the region mapping table.
Specifically, a sub-region refers to a smaller region divided within a target region. Each sub-region has different characteristics and user behavior. For example, a city may be divided into multiple sub-areas, such as business, residential, and industrial areas. The location coordinate set refers to geographical coordinate information associated with each sub-area. It is usually expressed in terms of latitude and longitude and is used to describe the specific location of a sub-region. The region type set refers to classification information for each sub-region, such as a business region, a residential region, a leisure region, and the like. The region type can reflect the main functions and characteristics of the sub-region, and influence the access behavior of the user. The historical user access characteristic set refers to an access record of a user to specific content in a certain time period, and the access record comprises information such as access frequency, access time, access content and the like. These features may help the system learn of the user's access habits. Type association analysis refers to the process of determining similarity or correlation by evaluating the relationship between different region types. This analysis helps identify other regions associated with the target region. The access association analysis is to analyze the access characteristics of the historical users to determine the similarity or the difference of the access behaviors of the users among different areas, so as to identify the areas with stronger correlation. The region mapping table is a structured data table for recording the relationship between the target region and the associated region, and can help the system to perform region matching and quick query.
First, a plurality of sub-regions in the target area are acquired, and the position coordinate set, the region type set and the historical user access characteristic sets of the sub-regions are collected, providing a basis for the subsequent analysis. For example, the sub-regions of city A include business regions, residential regions, etc., each having corresponding coordinates and access characteristics.
Next, the system randomly selects a first sub-region from the plurality of sub-regions and obtains a first location coordinate of the region, a first region type, and a first historical user access characteristic set. Assuming that the selected first sub-region is a business region, the system will extract its latitude and longitude information, type (e.g., business region), and user access data.
The system then performs the type association degree analysis, i.e., it compares the first region type with each of the other region types to determine their similarity. For example, if the first sub-region is a "business region" surrounded by several "residential regions" and "leisure regions", the system evaluates how strongly each of these regions is associated with the business region, and thereby determines a plurality of type association degrees.
Meanwhile, the system performs the access association degree analysis between the first historical user access characteristic set and each of the other historical user access characteristic sets, evaluating the similarity of user access behavior between them. For example, if users of the first sub-region frequently access certain electronic products and users of another sub-region show similar access behavior, the two sub-regions may be considered to have a higher access association degree.
Next, the system will screen out a number of initial association areas, which are sub-areas where both the type association and the access association meet a set threshold. For example, if the type association exceeds 0.8 and the access association exceeds 0.7, then these regions will be considered as acceptable initial association regions.
Finally, the system screens the plurality of initial association regions according to the position coordinate set, giving higher priority to regions closer to the first sub-region, so as to select a preset number of first association regions. The system then constructs the region mapping table, inputs the target area into the table, and performs matching to determine the final association region.
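The screening described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `SubRegion` record, the function names, the planar distance measure, and the sample values are all hypothetical, and the association degrees are assumed to have been computed already. The thresholds 0.8 and 0.7 are taken from the example above.

```python
import math
from dataclasses import dataclass


@dataclass
class SubRegion:
    name: str
    coord: tuple          # (latitude, longitude), hypothetical sample values
    type_assoc: float     # type association degree with the first sub-region
    access_assoc: float   # access association degree with the first sub-region


def planar_distance(a, b):
    """Euclidean distance on raw coordinates (assumption: small areas only)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def select_association_regions(first_coord, candidates, type_threshold=0.8,
                               access_threshold=0.7, preset_count=2):
    # Initial association regions: both association degrees exceed their thresholds.
    initial = [r for r in candidates
               if r.type_assoc > type_threshold and r.access_assoc > access_threshold]
    # Regions closer to the first sub-region get higher priority.
    initial.sort(key=lambda r: planar_distance(first_coord, r.coord))
    return initial[:preset_count]


def build_region_mapping_table(first_name, associated):
    # Map the first sub-region to the names of its first association regions.
    return {first_name: [r.name for r in associated]}


candidates = [
    SubRegion("residential_a", (0.0, 0.3), 0.90, 0.80),
    SubRegion("leisure_b", (0.1, 0.0), 0.85, 0.75),
    SubRegion("industrial_c", (0.05, 0.0), 0.50, 0.90),  # fails the type threshold
    SubRegion("residential_d", (0.0, 0.2), 0.95, 0.90),
]
selected = select_association_regions((0.0, 0.0), candidates)
table = build_region_mapping_table("business_1", selected)
print(table)
```

Here the lookup `table["business_1"]` plays the role of matching the target area against the region mapping table. A real deployment would likely replace the planar distance with a geodesic one.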
This approach enables the system to refer to the user's behavior patterns in similar regions during content recommendation and data processing by effectively determining the associated regions. By aggregating the access characteristics of different areas, the system can more accurately identify the user demands and rapidly provide the content conforming to the user interests under the condition of insufficient content supply of the target area, thereby improving the user experience and optimizing the resource utilization efficiency.
Further, determining the plurality of access association degrees comprises: randomly selecting a second sub-region, and obtaining a second historical user access characteristic set of the second sub-region; determining a plurality of historical time periods according to a preset time step; clustering the first historical user access characteristic set and the second historical user access characteristic set according to the plurality of historical time periods; performing content coincidence analysis and flow density coincidence analysis on the plurality of clustering results, and determining a plurality of content coincidence degrees and a plurality of density coincidence degrees; counting the number of clustering results whose content coincidence degree meets a preset coincidence scalar and whose density coincidence degree meets a preset density coincidence scalar, and setting this number as a second access association degree; and adding the second access association degree to the plurality of access association degrees.
Specifically, the second sub-region is the sub-region that is compared with the first sub-region, so that the similarity of access characteristics between different regions can be calculated. The historical user access feature set refers to the access records of users in a specific time period, including information such as the type, frequency and time of the accessed content, and is used to describe the access preferences of users. The first historical user access feature set and the second historical user access feature set correspond to the first sub-region and the second sub-region, respectively. The predetermined time step is the time interval used to divide the historical access data into a plurality of time periods, which facilitates fine-grained analysis per period. For example, the data of each day may be sampled hourly so as to observe the access pattern of each period. The content coincidence degree indicates the degree to which the types of content accessed by users of the first sub-region and the second sub-region overlap in the cluster analysis. The flow density coincidence degree refers to the degree to which the concentration of access volume of users in the different regions coincides within a specific time period, and measures the similarity of user access behavior. The coincidence scalar is a set value for judging whether the content coincidence degree or the flow density coincidence degree reaches the standard. Only when the coincidence degree is higher than this scalar are the regions considered to have a higher correlation.
First, a second subarea is randomly selected from a plurality of subareas, and a second historical user access characteristic set of the area is obtained. Assuming that the first sub-area is a business area and the second sub-area is a neighboring residential area, the access feature set of these areas will contain information such as specific content accessed and access frequency.
Next, the system will divide the set of access features into a plurality of time periods at predetermined time steps (e.g., hourly or daily) for finer granularity of the comparative analysis. During each time period, the system performs clustering operation on the first historical user access characteristic set and the second historical user access characteristic set respectively. For example, k-means clustering algorithms may be used to divide access to similar content into a set and divide data for different time periods into multiple clustering results.
After the clustering results are obtained, the system performs content coincidence analysis and flow density coincidence analysis on each clustering result. In the content coincidence analysis, the system compares the content accessed by users of the first and second sub-regions and counts the overlapping access types. For example, if users in both regions frequently access video streaming media content, the content coincidence degree is high. In the flow density coincidence analysis, the system compares the access volume intensity of the different regions in the same time period. If the access densities of the two regions are similar during peak hours, the flow density coincidence degree is high.
The system counts the clustering results that meet the conditions, namely the number of clustering results whose content coincidence degree is higher than the preset coincidence scalar and whose flow density coincidence degree is higher than the preset density coincidence scalar, and sets this number as the second access association degree. This value is added to the plurality of access association degrees and is used to measure the similarity between the two sub-regions.
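The counting step can be illustrated with the following minimal sketch. Several simplifications are assumed and are not taken from the embodiment: grouping records by time period stands in for the clustering operation (no k-means is run), the content coincidence degree is computed as a Jaccard overlap of content types, and the density coincidence degree as the ratio of the smaller access count to the larger. All function names, scalar values, and sample records are hypothetical.

```python
from collections import defaultdict


def bucket_by_period(records, step_hours):
    """Group (hour, content_type) access records into periods of step_hours."""
    buckets = defaultdict(list)
    for hour, content_type in records:
        buckets[hour // step_hours].append(content_type)
    return buckets


def content_coincidence(a, b):
    """Jaccard overlap of the content types in two groups (assumed metric)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def density_coincidence(a, b):
    """Ratio of the smaller access count to the larger (assumed metric)."""
    larger = max(len(a), len(b))
    return min(len(a), len(b)) / larger if larger else 0.0


def access_association_degree(first, second, step_hours=6,
                              content_scalar=0.5, density_scalar=0.5):
    first_b = bucket_by_period(first, step_hours)
    second_b = bucket_by_period(second, step_hours)
    count = 0
    # Only periods present in both sub-regions can coincide.
    for period in first_b.keys() & second_b.keys():
        a, b = first_b[period], second_b[period]
        if (content_coincidence(a, b) > content_scalar
                and density_coincidence(a, b) > density_scalar):
            count += 1
    return count


# Hypothetical records: (hour of day, accessed content type).
first_records = [(9, "video"), (9, "news"), (10, "video"), (20, "shopping"), (21, "video")]
second_records = [(8, "video"), (11, "news"), (20, "games"), (22, "games"), (23, "video")]
degree = access_association_degree(first_records, second_records)
print(degree)
```

With these sample records only the 06:00-12:00 period satisfies both scalars, so the resulting second access association degree is 1.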
This step enables identification of user access behavior similarity between adjacent regions by analyzing the overlap ratio of access characteristics and traffic density between different regions. The method ensures that the system can make decisions according to the similarity of the associated areas when content distribution or network resource allocation is carried out. For example, when the user access content requirement of the commercial district increases, the residential district content characteristics with high relevance can be referred to cache hot content in advance and optimize resource allocation, so that the accuracy of content recommendation is improved, the user experience is improved, and the overall resource utilization of the system is improved.
Further, after the user access compensation of the target area is performed, a content update frequency is set according to the user traffic density, wherein the content update frequency is positively correlated with the user traffic density, and the hot content of the edge processing unit is updated based on the content update frequency.
Specifically, the content update frequency refers to how often the cached content in the system is updated. The higher the update frequency, the faster the content is refreshed, ensuring that users get up-to-date hot content. In this scheme, the content update frequency is positively correlated with the user traffic density, i.e., the denser the user traffic, the higher the content update frequency. Hot content update of an edge processing unit means that the system dynamically updates the content in high demand in an edge node (such as an edge server or edge cache device), so as to ensure that real-time hot data service is provided in a high-traffic area. For example, the currently most popular short video or news content is refreshed periodically so that users can obtain the latest content without accessing the central server.
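As a minimal sketch of such a positive correlation, the update frequency could scale linearly with the traffic density and be clamped between a base value and a ceiling. The linear form, the function name, and every parameter value below are assumptions for illustration only; the embodiment requires only that the frequency does not decrease as the density rises.

```python
def content_update_frequency(density, base_per_hour=6.0,
                             reference_density=100.0, max_per_hour=120.0):
    """Return hot-content updates per hour for a given user traffic density.

    Assumed rule: linear scaling relative to reference_density, never below
    base_per_hour and never above max_per_hour.
    """
    scaled = base_per_hour * density / reference_density
    return min(max_per_hour, max(base_per_hour, scaled))


for density in (50.0, 200.0, 10000.0):
    print(density, content_update_frequency(density))
```

The clamp keeps a quiet area on the base refresh schedule and prevents an extreme traffic spike from driving the edge node into continuous refreshing.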
The content update frequency is set dynamically according to the user traffic density of the target area, so as to optimize the update strategy for hot content in the edge processing unit. When the system detects that the user traffic density increases, it raises the content update frequency, so that users can still obtain the latest hot content under a high access frequency. This is typically implemented with automated scripts and content update algorithms, with the system setting different update frequencies for different traffic density thresholds. For example, an LFU (least frequently used) or LFU-K cache replacement algorithm may be used to evaluate and replace content in the edge cache, preferentially retaining content with a high access frequency. Such algorithms can effectively identify the content most accessed by users and update it preferentially.
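A least-frequently-used replacement policy of the kind mentioned above can be sketched as follows. This is a deliberately simple illustration with an O(n) eviction scan, not the cache of the embodiment; the class name, the tie-breaking rule, and the sample keys are assumptions.

```python
from collections import Counter


class LFUCache:
    """Minimal least-frequently-used cache for an edge node (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}        # key -> cached content
        self.hits = Counter()  # key -> access count

    def get(self, key):
        if key in self.items:
            self.hits[key] += 1
            return self.items[key]
        return None

    def put(self, key, value):
        if self.capacity <= 0:
            return
        if key not in self.items and len(self.items) >= self.capacity:
            # Evict the entry with the lowest access count; among ties,
            # the earliest inserted entry is evicted.
            victim = min(self.items, key=lambda k: self.hits[k])
            del self.items[victim]
            del self.hits[victim]
        self.items[key] = value
        self.hits[key] += 1


cache = LFUCache(capacity=2)
cache.put("video_a", "payload a")
cache.put("news_b", "payload b")
cache.get("video_a")
cache.get("video_a")
cache.put("video_c", "payload c")  # evicts news_b, the least frequently used
print(sorted(cache.items))
```

Frequently requested content such as `video_a` stays in the edge cache, while rarely requested content is replaced first.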
For example, in a traffic-dense subway station, users frequently access the hot spot videos of a certain short video platform. When the system detects that the traffic density of the subway station is abnormally high, it automatically raises the update frequency of hot spot videos in the edge node, so that the latest and most popular video content is cached in the area. If the user traffic continues to increase, the edge node further raises the content update frequency, so that users can obtain the latest hot spot videos at any time.
By dynamically adjusting the content update frequency according to the traffic density of the user, the content service efficiency of the edge node can be remarkably improved, and particularly when the user accesses densely, the user can acquire the latest hot content in time. The dynamic updating mechanism effectively reduces the load pressure of the backbone network and improves the user experience.
In summary, the network information decision management method based on user traffic statistical analysis provided by the embodiment of the application has the following technical effects:
1. The user traffic density of the target area is read periodically, and the edge processing unit is activated when the density exceeds a predetermined threshold, ensuring that high-traffic areas are preferentially supported with more efficient content processing. This dynamic activation mechanism optimizes resource allocation and allows hot content to be cached at the edge nodes without increasing the burden on the backbone network, thereby improving the access speed and experience of users, reducing the pressure on the central server, and realizing distributed management of network information.
2. The initial density threshold value is adjusted through the real-time load proportion, and the preset density threshold value is dynamically generated, so that the network resource allocation can flexibly cope with load change. When the load of the backbone network is increased, the threshold value is automatically reduced, and the edge node is rapidly activated to provide service, so that the user response speed of a high-density area is improved, and the load self-adaption capability and efficiency of the system are enhanced under the condition that the overall delay is not increased.
3. By performing discrete evaluation of content and calculating the content dispersion in a three-dimensional space, the system can intuitively analyze the diversity of the user access behaviors. After the dispersion is determined through variance calculation, the hot content with high concentration degree can be identified and cached in priority, and the access response speed of the hot content is improved. Meanwhile, unnecessary repeated storage of the content is avoided, and the storage efficiency of the edge processing unit is optimized.
4. The content updating frequency is dynamically set according to the user traffic density, so that the content updating speed is improved in the peak period of the user traffic, and the content accessed by the user always keeps real-time. The positive correlation ensures the timeliness of the hot content and the content availability of the high-flow area, effectively improves the user experience, avoids access delay caused by untimely updating of the content, and improves the overall response speed and the resource utilization efficiency of the system.
Any of the steps of the methods described above may be stored as computer instructions or programs in a computer memory, which is not limited herein, and may be called and executed by a computer processor, which is likewise not limited herein, to implement any method of the embodiments of the present application.
Further, the terms "first" and "second" do not only represent a sequential relationship; they may also denote a specific concept, and/or may refer to one or all of a plurality of elements. It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the scope of the application. Thus, the present application is intended to include such modifications and variations insofar as they come within the scope of the application or the equivalents thereof.