
CN118363527B - Distributed storage-based data intelligent management method and system - Google Patents

Info

Publication number
CN118363527B
Authority
CN
China
Prior art keywords
information
data block
data
distributed storage
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410398097.XA
Other languages
Chinese (zh)
Other versions
CN118363527A (en)
Inventor
张腾
谢作斌
怀丹阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ai Rui Good Technology Co ltd
Original Assignee
Shenzhen Ai Rui Good Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ai Rui Good Technology Co ltd
Priority to CN202410398097.XA
Publication of CN118363527A
Application granted
Publication of CN118363527B

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an intelligent data management method and system based on distributed storage, relating to the technical field of data storage. The method comprises acquiring data information and distributed storage partition information, acquiring data block storage planning information according to data block hotspot data information and distributed storage node information, and dynamically adjusting the distributed storage according to distributed storage node load information. Classifying the standard data raises the degree of automation in data management. Classifying data blocks by their hotspot index improves distributed storage efficiency and preserves access efficiency for frequently accessed data. The node matching index for data block copies ensures that a copy can be accessed when a data block fails and improves the response rate of that copy. The load evaluation index of the distributed storage nodes drives dynamic adjustment of the distributed storage, which prevents excessive node load from slowing access to hotspot data.

Description

Distributed storage-based data intelligent management method and system
Technical Field
The invention relates to the technical field of data storage, in particular to an intelligent data management method and system based on distributed storage.
Background
With the advent of the big data era, data volumes have grown explosively. A traditional network storage system keeps all data on a centralized storage server, which becomes the bottleneck of system performance and the focal point of reliability and security risk, and cannot meet the needs of large-scale storage applications. Distributed storage instead spreads data across multiple independent devices. A distributed network storage system adopts a scalable architecture, uses multiple storage servers to share the storage load, and uses location servers to locate stored information, which improves the reliability, availability, and access efficiency of the system and makes it easy to expand.
Distributed storage technology is widely used to process and manage massive data, but data management remains only partly automated, and configuration and management still require manual intervention. A method and system for intelligent data management based on distributed storage are therefore needed to optimize and manage data automatically.
Current distributed storage data management on the market also has the following problems: data cannot be classified accurately; hotspot data cannot be identified from data access information, so it cannot be placed on nodes with fast response and access efficiency drops; and the distributed storage cannot be adjusted dynamically according to node load, so the spare resources of the distributed storage nodes go unused and are wasted.
Disclosure of Invention
To solve the above technical problems, the present technical solution addresses the following: data cannot be classified accurately; hotspot data cannot be identified from data access information, so it cannot be placed on nodes with fast response and access efficiency drops; and the distributed storage cannot be adjusted dynamically according to node load, so the spare resources of the distributed storage nodes go unused and are wasted.
In order to achieve the above purpose, the invention adopts the following technical scheme:
A data intelligent management method based on distributed storage comprises the following steps:
Acquiring data information, wherein the data information comprises data attribute information and data characteristic information;
acquiring distributed storage partition information, wherein the distributed storage partition information comprises distributed storage buffer layer information and distributed storage node information;
Acquiring data block information based on data classification according to the data information;
Acquiring data block access information according to the data block information, wherein the data block access information comprises data access mode information and data access frequency information corresponding to the data access mode;
Acquiring hot spot data information of the data block according to the access information of the data block;
Acquiring data block storage planning information according to the data block hot spot data information and the distributed storage node information;
storing the data in a distributed mode according to the data block storage planning information;
acquiring distributed storage node load information, wherein the distributed storage node load information comprises distributed storage node load state information, distributed storage node load spare resource information and distributed storage node response speed information;
Dynamically adjusting the distributed storage according to the load information of the distributed storage nodes;
acquiring response information of the distributed storage nodes;
acquiring data response fault information based on a response time threshold according to the distributed storage node response information;
Acquiring node information of the data copy according to the data response fault information;
and outputting response data according to the node information of the data copy.
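The last four steps (detect a response fault against a time threshold, then answer from a data copy) can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `NodeResponse`, `is_response_fault` and `pick_serving_node` are hypothetical, and choosing the fastest healthy replica is an assumption, since the text only says that the copy's node information is acquired.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeResponse:
    node_id: str
    response_ms: Optional[float]  # None means the node did not answer at all


def is_response_fault(resp: NodeResponse, threshold_ms: float) -> bool:
    """A response is a fault if it is missing or slower than the threshold."""
    return resp.response_ms is None or resp.response_ms > threshold_ms


def pick_serving_node(primary: NodeResponse,
                      replicas: List[NodeResponse],
                      threshold_ms: float) -> Optional[str]:
    """Return the primary if healthy, else the fastest healthy replica, else None."""
    if not is_response_fault(primary, threshold_ms):
        return primary.node_id
    healthy = [r for r in replicas if not is_response_fault(r, threshold_ms)]
    if not healthy:
        return None
    # Assumption: among healthy replicas, prefer the one that answered fastest.
    return min(healthy, key=lambda r: r.response_ms).node_id
```

A caller would then read the block from the returned node and output it as the response data.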
Preferably, the acquiring the data block information based on the data classification according to the data information specifically includes:
Acquiring data attribute information according to the data information, wherein the data attribute information comprises data type information and data format information;
according to the data attribute information, unifying data formats of the data to obtain corrected data information;
obtaining deduplicated data information from the corrected data information using a hash deduplication method;
obtaining missing-data information for the deduplicated data according to the deduplicated data information;
Acquiring a data missing threshold based on the data distributed storage requirement;
Judging, according to the missing-data information of the deduplicated data and the data missing threshold, whether the missing data exceeds the threshold; if so, the deduplicated data does not meet the distributed storage standard; if not, standardizing the deduplicated data to obtain standard data information;
Acquiring standard data characteristic information according to standard data information, wherein the standard data characteristic information comprises standard data keyword information and standard data timestamp information;
and classifying the standard data according to the standard data characteristic information to obtain data block information.
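The preprocessing steps above (format unification, hash deduplication, missing-data check) might look like the sketch below. Everything concrete here is an assumption made for illustration: records are modelled as dictionaries, SHA-256 over a canonical JSON form stands in for the unspecified hash method, and "missing information" is taken to be the fraction of expected fields that are empty.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def normalize(record: Record) -> Record:
    """Unify format: lowercase keys, trimmed strings, empty strings become None."""
    out: Record = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip() or None
        out[key.strip().lower()] = value
    return out


def deduplicate(records: List[Record]) -> List[Record]:
    """Hash deduplication: keep the first record seen for each content digest."""
    seen = set()
    unique = []
    for rec in records:
        canonical = json.dumps(rec, sort_keys=True, default=str)
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique


def missing_ratio(records: List[Record], fields: List[str]) -> float:
    """Fraction of expected field values that are absent or None."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    missing = sum(1 for rec in records for f in fields if rec.get(f) is None)
    return missing / total


def preprocess(records: List[Record], fields: List[str],
               missing_threshold: float) -> Optional[List[Record]]:
    """Return deduplicated records, or None if too much data is missing."""
    unique = deduplicate([normalize(r) for r in records])
    if missing_ratio(unique, fields) > missing_threshold:
        return None  # does not meet the distributed storage standard
    return unique
```

Records that pass would then be grouped into data blocks by their keyword and timestamp features, a step not shown here.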
Preferably, the obtaining the data block storage planning information according to the data block hot spot data information and the distributed storage node information specifically includes:
Classifying the data blocks based on the data block hotspot indexes according to the data block hotspot data information to obtain data block classification information;
acquiring first hot spot data block information according to the data block classification information;
Acquiring distributed storage buffer layer information according to the distributed storage partition information;
Distributing the first hot spot data block to a distributed storage buffer layer according to the first hot spot data block information and the distributed storage buffer layer information, and obtaining buffer layer storage planning information;
acquiring second hot spot data block information and third hot spot data block information according to the data block classification information;
sorting the data blocks in descending order of data block hotspot index according to the second hot spot data block information and the third hot spot data block information, to obtain data block ordering information;
planning the data block storage according to the data block ordering information to obtain node storage information;
and acquiring data block storage planning information according to the distributed storage buffer layer information and the node storage information.
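As a rough sketch of this planning step, the function below sends first hot spot data blocks to the buffer layer and orders the rest by descending hotspot index. The function name and the dictionary input are hypothetical, and treating a block exactly at the threshold as not first-tier is an assumption, because the text does not define the boundary case.

```python
from typing import Dict, List, Tuple


def plan_storage(hot_index: Dict[str, float],
                 first_threshold: float) -> Tuple[List[str], List[str]]:
    """Split blocks into (buffer-layer blocks, node-storage order).

    Blocks above the first threshold go to the buffer layer; the remaining
    second and third hot spot blocks are sorted by descending hotspot index.
    """
    buffer_layer = [b for b, q in hot_index.items() if q > first_threshold]
    remaining = [b for b in hot_index if b not in buffer_layer]
    remaining.sort(key=lambda b: hot_index[b], reverse=True)
    return buffer_layer, remaining
```

The second list is the order in which blocks would be offered to the node placement step, hottest first.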
Preferably, the classifying the data blocks based on the data block hotspot indexes according to the data block hotspot data information to obtain data block classification information specifically includes:
Acquiring a data block hotspot index according to the data block hotspot data information;
Acquiring a first data block hotspot index threshold and a second data block hotspot index threshold based on the data block access requirement;
classifying the data blocks according to the data block hotspot index, the first data block hotspot index threshold and the second data block hotspot index threshold to obtain data block classification information;
if the data block hotspot index is higher than the first data block hotspot index threshold, classifying the data block as a first hotspot data block;
if the data block hotspot index is lower than the first data block hotspot index threshold and higher than the second data block hotspot index threshold, classifying the data block as a second hotspot data block;
if the data block hotspot index is lower than the second data block hotspot index threshold, classifying the data block as a third hotspot data block;
the data block hotspot index is calculated from the following quantities:
Q is the hotspot index of the data block, S_i is the size of the i-th data item of the data block, S is the size of the data block, ω_ij is the access frequency of the j-th access mode for the i-th data item of the data block, w_j is the hotspot coefficient of the j-th access mode of the data block, n is the total number of data items in the data block, and m is the total number of access modes of the data block.
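The formula itself does not survive in this text. One form that is consistent with the symbol definitions is a size-weighted sum of coefficient-weighted access frequencies; it is shown here only as an assumption and may differ from the formula in the granted patent.

```latex
Q = \sum_{i=1}^{n} \frac{S_i}{S} \sum_{j=1}^{m} w_j \, \omega_{ij}
```

Under this reading, a data item contributes more to Q when it is a larger share of the block and when it is accessed often through access modes with a high hotspot coefficient.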
Preferably, the planning the data block storage according to the data block ordering information, to obtain node storage information, specifically includes:
Acquiring distributed storage node information according to the distributed storage partition information;
acquiring a distributed storage node matching index based on a distributed storage node matching evaluation model according to the second hot spot data block information, the third hot spot data block information and the distributed storage node information;
Planning data block storage according to the distributed storage node matching index and the data block ordering information, and obtaining node storage information;
sorting the data blocks in descending order of data block hotspot index according to the first hot spot data block information, the second hot spot data block information and the third hot spot data block information, to obtain data block copy information;
acquiring spare state information of the distributed nodes according to node storage information;
Acquiring a node matching index of a data block copy based on a distributed storage node matching evaluation model according to the spare state information, the first hot spot data block information, the second hot spot data block information and the third hot spot data block information of the distributed node;
according to the node matching index of the data block copy and the information of the data block copy, the data block copy is stored in a distributed mode;
the distributed storage node matching evaluation model uses the following quantities:
R(h, g) is the matching index of the h-th data block and the g-th distributed storage node, x_g is the available capacity of the g-th distributed storage node, x_h is the size of the h-th data block, T_j(h, g) is the response time of the h-th data block under the j-th access mode when the block is stored on the g-th distributed storage node, m is the total number of access modes of the data block, and the model also takes the access frequency of the j-th access mode as an input.
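To make the placement idea concrete, here is a hedged sketch of matching blocks to nodes. The scoring function is an assumed stand-in, not the patent's model: it rewards free capacity relative to block size and penalizes frequency-weighted response time. As a further simplification, response times are given per node and access mode rather than per block and node, and all names are hypothetical.

```python
from typing import Dict, List, Optional


def match_index(block_size: float, node_free: float,
                response_ms: List[float], access_freq: List[float]) -> float:
    """Assumed score: (free capacity / block size) / frequency-weighted response time."""
    if block_size <= 0 or block_size > node_free:
        return 0.0  # the block does not fit on this node
    weighted_time = sum(f * t for f, t in zip(access_freq, response_ms))
    if weighted_time <= 0:
        return 0.0
    return (node_free / block_size) / weighted_time


def place_blocks(ordered_blocks: List[str],
                 sizes: Dict[str, float],
                 free: Dict[str, float],
                 resp: Dict[str, List[float]],
                 freq: Dict[str, List[float]]) -> Dict[str, Optional[str]]:
    """Greedily assign each block, hottest first, to its best-scoring node."""
    remaining = dict(free)  # do not mutate the caller's capacity table
    placement: Dict[str, Optional[str]] = {}
    for h in ordered_blocks:
        scores = {g: match_index(sizes[h], remaining[g], resp[g], freq[h])
                  for g in remaining}
        best = max(scores, key=scores.get) if scores else None
        if best is None or scores[best] == 0.0:
            placement[h] = None  # no node can take this block
            continue
        placement[h] = best
        remaining[best] -= sizes[h]
    return placement
```

The same scoring could be reused for copies by restricting `free` to nodes that do not already hold the original block, which the sketch leaves to the caller.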
Preferably, the dynamically adjusting the distributed storage according to the load information of the distributed storage node specifically includes:
acquiring a distributed storage node load evaluation index according to the distributed storage node load information;
acquiring a load evaluation index threshold of a distributed storage node based on the distributed storage requirement;
Judging, according to the load evaluation index of the distributed storage node and the load evaluation index threshold of the distributed storage node, whether the load evaluation index exceeds the threshold; if not, the state of the distributed storage node is normal; if so, the load of the distributed storage node is too high, and information on the data blocks to be responded is obtained according to the load information of the distributed storage node;
acquiring load vacant resource information of the distributed storage nodes according to the load information of the distributed storage nodes;
According to the information of the data block to be responded and the load spare resource information of the distributed storage node, acquiring a node matching index of the data block to be responded based on a distributed storage node matching evaluation model;
Dynamically adjusting the distributed storage according to the node matching index of the data block to be responded;
the distributed storage node load evaluation index is calculated from the following quantities:
D is the distributed storage node load evaluation index; α, β and γ are the distributed storage node load evaluation coefficients; μ is the CPU utilization of the distributed storage node and μ_0 is its standard CPU utilization; τ is the disk utilization of the distributed storage node and τ_0 is its standard disk utilization; ρ is the network bandwidth of the distributed storage node; σ_k is the access frequency of the k-th data block of the distributed storage node; θ_k is the access load coefficient of the k-th data block on the distributed storage node; and E is the total number of data blocks of the distributed storage node.
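A small sketch of the load check follows. The weighted form used here, D = α·μ/μ_0 + β·τ/τ_0 + γ·(Σ σ_k·θ_k)/ρ, is an assumption chosen to use every listed symbol; the actual formula is not reproduced in this text. The default coefficients and function names are likewise illustrative.

```python
from typing import Sequence


def load_index(cpu: float, cpu_std: float,
               disk: float, disk_std: float,
               bandwidth: float,
               access_freq: Sequence[float],
               load_coeff: Sequence[float],
               alpha: float = 0.4, beta: float = 0.3, gamma: float = 0.3) -> float:
    """Assumed load index: weighted CPU, disk and access-load terms."""
    # Sum of sigma_k * theta_k over the E data blocks held by the node.
    access_load = sum(s * t for s, t in zip(access_freq, load_coeff))
    return (alpha * cpu / cpu_std
            + beta * disk / disk_std
            + gamma * access_load / bandwidth)


def is_overloaded(index: float, threshold: float) -> bool:
    """A node is overloaded when its load index exceeds the threshold."""
    return index > threshold
```

When `is_overloaded` returns true, the pending data blocks on that node would be re-matched against nodes with spare resources, as described in the steps above.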
Further, a data intelligent management system based on distributed storage is provided, which is used for implementing the intelligent management method, and includes:
The main control module is used for classifying standard data according to standard data characteristic information, acquiring data block information, distributing a first hot data block to a distributed storage buffer layer according to the first hot data block information and distributed storage buffer layer information, acquiring buffer layer storage planning information, planning data block storage according to data block ordering information, acquiring node storage information, judging whether the load of a distributed storage node is too high according to a distributed storage node load evaluation index and a distributed storage node load evaluation index threshold, acquiring data block information to be responded according to the distributed storage node load information, acquiring spare resource information of the distributed storage node load according to the distributed storage node load information, dynamically adjusting the distributed storage according to node matching index of the data block to be responded, and acquiring data copy node information according to data response fault information;
The information acquisition module is used for acquiring data information, data attribute information, data characteristic information, distributed storage partition information, distributed storage buffer layer information and distributed storage node information, acquiring data block access information according to the data block information, acquiring data block hot spot data information according to the data block access information, acquiring distributed storage node load information, distributed storage node load state information, distributed storage node load spare resource information and distributed storage node response speed information, and transmitting the data block hot spot data information to the calculation module;
The computing module is used for acquiring a data block hotspot index according to the data block hotspot data information, classifying the data block according to the data block hotspot index, a first threshold value of the data block hotspot index and a second threshold value of the data block hotspot index, acquiring data block classification information, acquiring a distributed storage node matching index according to the second hotspot data block information, the third hotspot data block information and the distributed storage node information, acquiring a data block copy node matching index according to the distributed node spare state information, the first hotspot data block information, the second hotspot data block information and the third hotspot data block information, acquiring a data block node matching index to be responded according to the data block information to be responded and the distributed storage node load spare resource information, and acquiring a distributed storage node load assessment index according to the distributed storage node load information;
And the display module is used for displaying the data block information, the data block hot spot data information, the data block storage planning information, the distributed storage node load assessment index and the data response fault information.
Optionally, the main control module specifically includes:
the control unit is used for classifying standard data according to standard data characteristic information, acquiring data block information, distributing first hot spot data blocks to the distributed storage buffer layers according to the first hot spot data block information and the distributed storage buffer layer information, acquiring buffer layer storage planning information, planning data block storage according to data block ordering information, acquiring node storage information, and acquiring data copy node information according to data response fault information;
The information receiving unit interacts with the information acquisition module and the calculation module, and is used for acquiring data and transmitting the data to the dynamic adjustment unit;
The dynamic adjustment unit is used for judging whether the load of the distributed storage nodes is too high according to the load evaluation index of the distributed storage nodes and the load evaluation index threshold of the distributed storage nodes, acquiring data block information to be responded according to the load information of the distributed storage nodes, acquiring spare resource information of the load of the distributed storage nodes according to the load information of the distributed storage nodes, and dynamically adjusting the distributed storage according to the node matching index of the data block to be responded.
Optionally, the information acquisition module specifically includes:
The first acquisition unit is used for acquiring data information, data attribute information, data characteristic information, distributed storage partition information, distributed storage buffer layer information and distributed storage node information, and acquiring data block access information according to the data block information;
The second acquisition unit is used for acquiring data block hot spot data information according to the data block access information, acquiring distributed storage node load information, distributed storage node load state information, distributed storage node load spare resource information and distributed storage node response speed information, and transmitting the information to the calculation module.
Optionally, the computing module specifically includes:
The hot spot index unit is used for acquiring a data block hot spot index according to the data block hot spot data information, classifying the data block according to the data block hot spot index, the first threshold value of the data block hot spot index and the second threshold value of the data block hot spot index, and acquiring data block classification information;
The node matching unit is used for acquiring a distributed storage node matching index according to the second hot spot data block information, the third hot spot data block information and the distributed storage node information, acquiring a data block copy node matching index according to the distributed node spare state information, the first hot spot data block information, the second hot spot data block information and the third hot spot data block information, and acquiring a data block node matching index to be responded according to the data block information to be responded and the distributed storage node load spare resource information;
the load evaluation unit is used for acquiring a distributed storage node load evaluation index according to the distributed storage node load information and transmitting the distributed storage node load evaluation index to the main control module.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides an intelligent data management method and system based on distributed storage. Classifying the standard data raises the degree of automation in data management. Classifying data blocks by their hotspot index improves distributed storage efficiency and preserves access efficiency for frequently accessed data. The node matching index for data block copies ensures that a copy can be accessed when a data block fails and improves the response rate of that copy. The load evaluation index of the distributed storage nodes drives dynamic adjustment of the distributed storage, which prevents excessive node load from slowing access to hotspot data.
Drawings
FIG. 1 is a flow chart of a method for intelligently managing data based on distributed storage;
FIG. 2 is a flow chart of data block acquisition in the present invention;
FIG. 3 is a flow chart of a data block storage planning in accordance with the present invention;
FIG. 4 is a flow chart of distributed storage of copies of data blocks in accordance with the present invention;
FIG. 5 is a block diagram of a distributed storage-based intelligent data management system according to the present invention.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art.
Referring to FIGS. 1-4, an intelligent data management method based on distributed storage in an embodiment of the present invention includes:
Acquiring data information, wherein the data information comprises data attribute information and data characteristic information;
acquiring distributed storage partition information, wherein the distributed storage partition information comprises distributed storage buffer layer information and distributed storage node information;
Acquiring data block information based on data classification according to the data information;
specifically, according to the data information, based on the data classification, the data block information is acquired, specifically including:
Acquiring data attribute information according to the data information, wherein the data attribute information comprises data type information and data format information;
according to the data attribute information, unifying data formats of the data to obtain corrected data information;
obtaining deduplicated data information from the corrected data information using a hash deduplication method;
obtaining missing-data information for the deduplicated data according to the deduplicated data information;
Acquiring a data missing threshold based on the data distributed storage requirement;
Judging, according to the missing-data information of the deduplicated data and the data missing threshold, whether the missing data exceeds the threshold; if so, the deduplicated data does not meet the distributed storage standard; if not, standardizing the deduplicated data to obtain standard data information;
Acquiring standard data characteristic information according to standard data information, wherein the standard data characteristic information comprises standard data keyword information and standard data timestamp information;
and classifying the standard data according to the standard data characteristic information to obtain data block information.
In this scheme, unifying the data format prevents differing formats from hampering later data classification and data access and thereby lowering distributed storage efficiency. The hash deduplication method removes duplicate data. Checking the amount of missing data prevents excessive gaps from reducing the reliability and accuracy of the data. Classifying the data by standard data keyword information and standard data timestamp information prepares it for subsequent distributed storage and improves storage efficiency.
Acquiring data block access information according to the data block information, wherein the data block access information comprises data access mode information and data access frequency information corresponding to the data access mode;
Acquiring hot spot data information of the data block according to the access information of the data block;
Acquiring data block storage planning information according to the data block hot spot data information and the distributed storage node information;
specifically, according to the hot spot data information of the data block and the distributed storage node information, the data block storage planning information is obtained, which specifically comprises:
Classifying the data blocks based on the data block hotspot indexes according to the data block hotspot data information to obtain data block classification information;
acquiring first hot spot data block information according to the data block classification information;
Acquiring distributed storage buffer layer information according to the distributed storage partition information;
Distributing the first hot spot data block to a distributed storage buffer layer according to the first hot spot data block information and the distributed storage buffer layer information, and obtaining buffer layer storage planning information;
acquiring second hot spot data block information and third hot spot data block information according to the data block classification information;
sorting the data blocks in descending order of data block hotspot index according to the second hot spot data block information and the third hot spot data block information, to obtain data block ordering information;
planning the data block storage according to the data block ordering information to obtain node storage information;
and acquiring data block storage planning information according to the distributed storage buffer layer information and the node storage information.
In this scheme, a cache layer is placed at the front end of the distributed storage system, and frequently accessed hot spot data are held in that cache, which reduces the number of accesses to the back-end storage. User requests can therefore be answered quickly, and the access speed of hot spot data is improved.
Specifically, according to the data block hot spot data information, classifying the data blocks based on the data block hot spot indexes to obtain data block classification information, specifically including:
Acquiring a data block hotspot index according to the data block hotspot data information;
Acquiring a first threshold value of a data block hotspot index and a second threshold value of the data block hotspot index based on the data block access requirement;
classifying the data blocks according to the data block hotspot indexes, the first data block hotspot index threshold and the second data block hotspot index threshold to obtain data block classification information;
If the data block hotspot index is higher than the first threshold value of the data block hotspot index, dividing the data block into first hotspot data blocks;
if the data block hotspot index is lower than the first data block hotspot index threshold and higher than the second data block hotspot index threshold, dividing the data block into second hotspot data blocks;
if the data block hotspot index is lower than the second threshold value of the data block hotspot index, dividing the data block into a third hotspot data block;
the calculation formula of the data block hotspot index is as follows:
Wherein $Q$ is the data block hotspot index, $S_i$ is the size of the $i$-th data item of the data block, $S$ is the size of the data block, $\omega_{ij}$ is the access frequency of the $j$-th access mode of the $i$-th data item of the data block, $w_j$ is the hotspot coefficient of the $j$-th access mode of the data block, $n$ is the total number of data items in the data block, and $m$ is the total number of access modes of the data block.
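The text lists the symbols of the hotspot index, but the formula itself does not appear here. The sketch below therefore assumes one plausible form that uses every listed symbol, a size-weighted sum of coefficient-weighted access frequencies, $Q = \sum_{i=1}^{n} \frac{S_i}{S} \sum_{j=1}^{m} w_j \omega_{ij}$; this form is an assumption, not the formula of the scheme. The three-way classification follows the threshold rules stated above. The function names and the handling of values exactly equal to a threshold are likewise assumptions.

```python
def hotspot_index(item_sizes, block_size, access_freq, mode_coeffs):
    """Assumed form: Q = sum_i (S_i / S) * sum_j (w_j * omega_ij).

    item_sizes  -- S_i, one entry per data item
    block_size  -- S, size of the whole data block
    access_freq -- omega_ij, one row per data item, one column per access mode
    mode_coeffs -- w_j, one hotspot coefficient per access mode
    """
    q = 0.0
    for s_i, freqs in zip(item_sizes, access_freq):
        weighted = sum(w_j * f_ij for w_j, f_ij in zip(mode_coeffs, freqs))
        q += (s_i / block_size) * weighted
    return q


def classify_block(q, first_threshold, second_threshold):
    """Map a hotspot index to the first, second or third hot spot class.

    The text does not say what happens when q equals a threshold; here a
    value equal to a threshold falls into the lower class (assumption).
    """
    if q > first_threshold:
        return "first"
    if q > second_threshold:
        return "second"
    return "third"
```

For example, a block of size 4 with two items of size 2, access frequencies `[[10, 0], [0, 4]]` and coefficients `[1.0, 0.5]` gives an index of 6.0 under the assumed form, which is a first hot spot data block for thresholds 5 and 2.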
Still further, according to the data block ordering information, planning the data block storage to obtain node storage information, which specifically includes:
Acquiring distributed storage node information according to the distributed storage partition information;
acquiring a distributed storage node matching index based on a distributed storage node matching evaluation model according to the second hot spot data block information, the third hot spot data block information and the distributed storage node information;
Planning data block storage according to the distributed storage node matching index and the data block ordering information, and obtaining node storage information;
according to the first hot spot data block information, the second hot spot data block information and the third hot spot data block information, sorting the data blocks in descending order of data block hotspot index and obtaining data block copy information;
acquiring spare state information of the distributed nodes according to node storage information;
Acquiring a node matching index of a data block copy based on a distributed storage node matching evaluation model according to the spare state information, the first hot spot data block information, the second hot spot data block information and the third hot spot data block information of the distributed node;
according to the node matching index of the data block copy and the information of the data block copy, the data block copy is stored in a distributed mode;
the distributed storage node matching evaluation model is as follows:
Wherein $R(h, g)$ is the matching index of the $h$-th data block and the $g$-th distributed storage node, $x_g$ is the available capacity of the $g$-th distributed storage node, $x_h$ is the size of the $h$-th data block, $T_j(h, g)$ is the response time for the $j$-th access mode when the $h$-th data block is stored on the $g$-th distributed storage node, and $m$ is the total number of access modes of the data block; the model also takes the access frequency of the $j$-th access mode as an input.
In this scheme, the data blocks are classified according to the data block hotspot index, the first data block hotspot index threshold and the second data block hotspot index threshold into first hot spot data blocks, second hot spot data blocks and third hot spot data blocks. The first hot spot data blocks are allocated to the buffer layer to guarantee their access rate. The second and third hot spot data blocks are stored in descending order of hotspot index according to the data block ordering information: for each data block, the distributed storage node matching index between the block and every node is calculated and the block is allocated accordingly. After each allocation, the distributed storage node information is updated to reflect the change in the node's resource occupation, and allocation continues until the whole data block ordering has been traversed.
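The traversal described above can be sketched as a greedy loop. The matching evaluation model itself is passed in as a function, because its formula is not reproduced in the text; `headroom_match` below is only a placeholder score, and the `Node` and `Block` types, the rule of choosing the highest-scoring node, and the capacity check are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_capacity: float


@dataclass
class Block:
    name: str
    size: float
    hotspot_index: float


def plan_storage(blocks, nodes, match_index):
    """Assign blocks to nodes in descending hotspot order.

    After each assignment the chosen node's free capacity is reduced, so
    later blocks are scored against the updated node information.
    """
    plan = {}
    for block in sorted(blocks, key=lambda b: b.hotspot_index, reverse=True):
        candidates = [n for n in nodes if n.free_capacity >= block.size]
        if not candidates:
            raise ValueError(f"no node can hold block {block.name}")
        # Assumption: the node with the highest matching index is chosen.
        best = max(candidates, key=lambda n: match_index(block, n))
        best.free_capacity -= block.size
        plan[block.name] = best.name
    return plan


def headroom_match(block, node):
    """Placeholder score (not the scheme's model): capacity share left over."""
    if node.free_capacity == 0:
        return 0.0
    return (node.free_capacity - block.size) / node.free_capacity
```

Any scoring function with the signature `match_index(block, node)` can be substituted for the placeholder without changing the loop.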
Meanwhile, different numbers of data block copies are generated for different classes of data block. In this embodiment, for example, a first hot spot data block generates three copies, a second hot spot data block generates two copies, and a third hot spot data block generates one copy. When a node fails, the system can still obtain the data from other nodes through this replication mechanism, which guarantees the reliability and durability of the data. Storing the same data copy on multiple nodes also improves read performance: the data can be read from different nodes in parallel, which reduces the pressure on any single node and achieves load balancing.
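A minimal sketch of the copy policy of this embodiment follows. The copy counts (three, two, one) come from the text; placing copies on nodes other than the primary node, ranking the spare nodes with a caller-supplied score, and the name `place_replicas` are assumptions made for illustration.

```python
# Copy counts per class, as given in the embodiment.
REPLICA_COUNT = {"first": 3, "second": 2, "third": 1}


def place_replicas(block_class, primary_node, idle_nodes, score):
    """Pick the best-scoring spare nodes for the copies of one data block.

    Assumption: copies go to nodes other than the primary, so that a single
    node failure cannot remove every instance of the block.
    """
    count = REPLICA_COUNT[block_class]
    candidates = [n for n in idle_nodes if n != primary_node]
    if len(candidates) < count:
        raise ValueError("not enough distinct nodes for the requested copies")
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:count]
```

Here `score` stands in for the data block copy node matching index; it receives a node and returns a number, with higher values preferred.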
Storing the data in a distributed mode according to the data block storage planning information;
acquiring distributed storage node load information, wherein the distributed storage node load information comprises distributed storage node load state information, distributed storage node load spare resource information and distributed storage node response speed information;
Dynamically adjusting the distributed storage according to the load information of the distributed storage nodes;
specifically, according to the load information of the distributed storage nodes, the distributed storage is dynamically adjusted, and the method specifically comprises the following steps:
acquiring a distributed storage node load evaluation index according to the distributed storage node load information;
acquiring a load evaluation index threshold of a distributed storage node based on the distributed storage requirement;
Judging, according to the distributed storage node load evaluation index and the distributed storage node load evaluation index threshold, whether the load evaluation index exceeds the threshold; if not, the state of the distributed storage node is normal; if so, the load of the distributed storage node is too high, and the information of the data block to be responded is obtained according to the distributed storage node load information;
acquiring load spare resource information of the distributed storage nodes according to the load information of the distributed storage nodes;
According to the information of the data block to be responded and the load spare resource information of the distributed storage node, acquiring a node matching index of the data block to be responded based on a distributed storage node matching evaluation model;
Dynamically adjusting the distributed storage according to the node matching index of the data block to be responded;
the calculation formula of the distributed storage node load evaluation index is as follows:
Wherein $D$ is the distributed storage node load evaluation index, $\alpha$, $\beta$ and $\gamma$ are the distributed storage node load evaluation coefficients, $\mu$ is the CPU utilization rate of the distributed storage node, $\mu_0$ is the standard CPU utilization rate of the distributed storage node, $\tau$ is the disk utilization rate of the distributed storage node, $\tau_0$ is the standard disk utilization rate of the distributed storage node, $\rho$ is the network bandwidth of the distributed storage node, $\sigma_k$ is the access frequency of the $k$-th data block of the distributed storage node, $\theta_k$ is the access load coefficient of the $k$-th data block on the distributed storage node, and $E$ is the total number of data blocks of the distributed storage node.
In this scheme, whether the load of a distributed storage node is too high is judged from the distributed storage node load evaluation index and its threshold, so that load abnormalities are found in time. The load spare resource information of the distributed storage nodes is obtained from the distributed storage node load information, and the distributed storage is dynamically adjusted according to the node matching index of the data block to be responded. This improves the flexibility and scalability of the system, makes full use of the spare resources of the distributed storage nodes, and prevents an excessive load on some nodes from slowing access to hot spot data.
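The threshold judgment can be sketched as below. The load evaluation formula is not reproduced in the text, so the code assumes one plausible combination of the listed quantities, $D = \alpha \frac{\mu}{\mu_0} + \beta \frac{\tau}{\tau_0} + \gamma \frac{\sum_{k=1}^{E} \sigma_k \theta_k}{\rho}$; this form, the default coefficient values and the function names are assumptions, not the formula of the scheme.

```python
def load_index(cpu, cpu_std, disk, disk_std, bandwidth,
               access_freqs, load_coeffs,
               alpha=1.0, beta=1.0, gamma=1.0):
    """Assumed form of D: weighted CPU, disk and access-load terms.

    access_freqs -- sigma_k, access frequency of each data block on the node
    load_coeffs  -- theta_k, access load coefficient of each data block
    """
    access_load = sum(s * t for s, t in zip(access_freqs, load_coeffs))
    return (alpha * cpu / cpu_std
            + beta * disk / disk_std
            + gamma * access_load / bandwidth)


def overloaded_nodes(indices, threshold):
    """Return the nodes whose load evaluation index exceeds the threshold."""
    return [name for name, d in indices.items() if d > threshold]
```

Nodes returned by `overloaded_nodes` are the ones whose pending data blocks would be re-matched against the spare resources of the remaining nodes.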
Acquiring response information of the distributed storage nodes;
acquiring data response fault information based on a response time threshold according to the distributed storage node response information;
Acquiring node information of the data copy according to the data response fault information;
And outputting response data according to the node information of the data copy.
In the scheme, when a certain node fails, the system can quickly copy data from other nodes for recovery, so that the risk of data loss is reduced, and the availability of the system is ensured.
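The response-time check and the fallback to a data copy can be sketched as follows. The function `pick_serving_node` and the `response_time` callback, which returns a measured time in seconds or `None` when a node does not answer, are hypothetical names introduced for the example; the scheme itself only states that a fault is detected against a response time threshold and that the node holding the data copy then supplies the response.

```python
def pick_serving_node(primary, replica_nodes, response_time, threshold_s):
    """Return the first node that answers within the response time threshold.

    The primary node is tried first, then the copy nodes in the given order.
    None is returned when every node is missing or too slow.
    """
    for node in [primary, *replica_nodes]:
        elapsed = response_time(node)  # None means the node did not respond
        if elapsed is not None and elapsed <= threshold_s:
            return node
    return None
```

A caller would read the data block from the returned node and report a data response fault when the result is `None`.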
Referring to fig. 5, further, in combination with the above intelligent data management method based on distributed storage, an intelligent data management system based on distributed storage is provided, which includes:
The main control module is used for classifying standard data according to standard data characteristic information, acquiring data block information, distributing a first hot data block to a distributed storage buffer layer according to the first hot data block information and distributed storage buffer layer information, acquiring buffer layer storage planning information, planning data block storage according to data block ordering information, acquiring node storage information, judging whether the load of a distributed storage node is too high according to a distributed storage node load evaluation index and a distributed storage node load evaluation index threshold, acquiring data block information to be responded according to the distributed storage node load information, acquiring spare resource information of the distributed storage node load according to the distributed storage node load information, dynamically adjusting the distributed storage according to node matching index of the data block to be responded, and acquiring data copy node information according to data response fault information;
The information acquisition module is used for acquiring data information, data attribute information, data characteristic information, distributed storage partition information, distributed storage buffer layer information and distributed storage node information, acquiring data block access information according to the data block information, acquiring data block hot spot data information according to the data block access information, acquiring distributed storage node load information, distributed storage node load state information, distributed storage node load spare resource information and distributed storage node response speed information, and transmitting the data block hot spot data information to the calculation module;
The computing module is used for acquiring a data block hotspot index according to the data block hotspot data information, classifying the data block according to the data block hotspot index, a first threshold value of the data block hotspot index and a second threshold value of the data block hotspot index, acquiring data block classification information, acquiring a distributed storage node matching index according to the second hotspot data block information, the third hotspot data block information and the distributed storage node information, acquiring a data block copy node matching index according to the distributed node spare state information, the first hotspot data block information, the second hotspot data block information and the third hotspot data block information, acquiring a data block node matching index to be responded according to the data block information to be responded and the distributed storage node load spare resource information, and acquiring a distributed storage node load assessment index according to the distributed storage node load information;
And the display module is used for displaying the data block information, the data block hot spot data information, the data block storage planning information, the distributed storage node load assessment index and the data response fault information.
The main control module specifically comprises:
the control unit is used for classifying standard data according to standard data characteristic information, acquiring data block information, distributing first hot spot data blocks to the distributed storage buffer layers according to the first hot spot data block information and the distributed storage buffer layer information, acquiring buffer layer storage planning information, planning data block storage according to data block ordering information, acquiring node storage information, and acquiring data copy node information according to data response fault information;
The information receiving unit is interacted with the information acquisition module and the calculation module and is used for acquiring data and transmitting the data to the dynamic adjustment unit;
The dynamic adjustment unit is used for judging whether the load of the distributed storage nodes is too high according to the load evaluation index of the distributed storage nodes and the load evaluation index threshold of the distributed storage nodes, acquiring data block information to be responded according to the load information of the distributed storage nodes, acquiring spare resource information of the load of the distributed storage nodes according to the load information of the distributed storage nodes, and dynamically adjusting the distributed storage according to the node matching index of the data block to be responded.
The information acquisition module specifically comprises:
The first acquisition unit is used for acquiring data information, data attribute information, data characteristic information, distributed storage partition information, distributed storage buffer layer information and distributed storage node information, and acquiring data block access information according to the data block information;
The second acquisition unit is used for acquiring data block hot spot data information according to the data block access information, acquiring distributed storage node load information, distributed storage node load state information, distributed storage node load spare resource information and distributed storage node response speed information, and transmitting the information to the calculation module.
The computing module specifically comprises:
The hot spot index unit is used for acquiring a data block hot spot index according to the data block hot spot data information, classifying the data block according to the data block hot spot index, the first threshold value of the data block hot spot index and the second threshold value of the data block hot spot index, and acquiring data block classification information;
The node matching unit is used for acquiring a distributed storage node matching index according to the second hot spot data block information, the third hot spot data block information and the distributed storage node information, acquiring a data block copy node matching index according to the distributed node spare state information, the first hot spot data block information, the second hot spot data block information and the third hot spot data block information, and acquiring a data block node matching index to be responded according to the data block information to be responded and the distributed storage node load spare resource information;
the load evaluation unit is used for acquiring a distributed storage node load evaluation index according to the distributed storage node load information and transmitting the distributed storage node load evaluation index to the main control module.
In summary, the invention has the following advantages. The standard data are classified according to the standard data characteristic information to obtain the data block information, which raises the degree of intelligence of data management and improves data access efficiency. The data block hot spot data information is obtained from the data block access information, the data block hotspot index is obtained from the data block hot spot data information, and the data blocks are classified according to that index, which improves the efficiency of distributed storage while ensuring efficient access to frequently accessed data. The data block copies are stored according to the data block copy node matching index, which ensures that a copy can be accessed when a data block fails and improves the response rate of the copies. The distributed storage is dynamically adjusted according to the distributed storage node load evaluation index, which prevents the access speed of hot spot data from being affected.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which merely illustrate the principles of the invention, and that various changes and modifications may be made without departing from the spirit and scope of the invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (6)

1. The intelligent data management method based on distributed storage is characterized by comprising the following steps of:
Acquiring data information, wherein the data information comprises data attribute information and data characteristic information;
acquiring distributed storage partition information, wherein the distributed storage partition information comprises distributed storage buffer layer information and distributed storage node information;
Acquiring data block information based on data classification according to the data information;
Acquiring data block access information according to the data block information, wherein the data block access information comprises data access mode information and data access frequency information corresponding to the data access mode;
Acquiring hot spot data information of the data block according to the access information of the data block;
Acquiring data block storage planning information according to the data block hot spot data information and the distributed storage node information;
storing the data in a distributed mode according to the data block storage planning information;
acquiring distributed storage node load information, wherein the distributed storage node load information comprises distributed storage node load state information, distributed storage node load spare resource information and distributed storage node response speed information;
Dynamically adjusting the distributed storage according to the load information of the distributed storage nodes;
acquiring response information of the distributed storage nodes;
acquiring data response fault information based on a response time threshold according to the distributed storage node response information;
Acquiring node information of the data copy according to the data response fault information;
outputting response data according to the node information of the data copy;
the step of acquiring the data block storage planning information according to the data block hot spot data information and the distributed storage node information specifically comprises the following steps:
Classifying the data blocks based on the data block hotspot indexes according to the data block hotspot data information to obtain data block classification information;
acquiring first hot spot data block information according to the data block classification information;
Acquiring distributed storage buffer layer information according to the distributed storage partition information;
Distributing the first hot spot data block to a distributed storage buffer layer according to the first hot spot data block information and the distributed storage buffer layer information, and obtaining buffer layer storage planning information;
acquiring second hot spot data block information and third hot spot data block information according to the data block classification information;
according to the second hot spot data block information and the third hot spot data block information, sorting the data blocks in descending order of data block hotspot index to obtain data block ordering information;
planning the data block storage according to the data block ordering information to obtain node storage information;
Acquiring data block storage planning information according to the distributed storage buffer layer information and the node storage information;
classifying the data blocks based on the data block hotspot indexes according to the data block hotspot data information to obtain data block classification information, wherein the method specifically comprises the following steps of:
Acquiring a data block hotspot index according to the data block hotspot data information;
Acquiring a first threshold value of a data block hotspot index and a second threshold value of the data block hotspot index based on the data block access requirement;
classifying the data blocks according to the data block hotspot indexes, the first data block hotspot index threshold and the second data block hotspot index threshold to obtain data block classification information;
If the data block hotspot index is higher than the first threshold value of the data block hotspot index, dividing the data block into first hotspot data blocks;
if the data block hotspot index is lower than the first data block hotspot index threshold and higher than the second data block hotspot index threshold, dividing the data block into second hotspot data blocks;
if the data block hotspot index is lower than the second threshold value of the data block hotspot index, dividing the data block into a third hotspot data block;
the calculation formula of the data block hotspot index is as follows:
wherein $Q$ is the data block hotspot index, $S_i$ is the size of the $i$-th data item of the data block, $S$ is the size of the data block, $\omega_{ij}$ is the access frequency of the $j$-th access mode of the $i$-th data item of the data block, $w_j$ is the hotspot coefficient of the $j$-th access mode of the data block, $n$ is the total number of data items in the data block, and $m$ is the total number of access modes of the data block;
planning data block storage according to the data block ordering information to obtain node storage information, wherein the method specifically comprises the following steps:
Acquiring distributed storage node information according to the distributed storage partition information;
acquiring a distributed storage node matching index based on a distributed storage node matching evaluation model according to the second hot spot data block information, the third hot spot data block information and the distributed storage node information;
Planning data block storage according to the distributed storage node matching index and the data block ordering information, and obtaining node storage information;
according to the first hot spot data block information, the second hot spot data block information and the third hot spot data block information, sorting the data blocks in descending order of data block hotspot index and obtaining data block copy information;
acquiring spare state information of the distributed nodes according to node storage information;
Acquiring a node matching index of a data block copy based on a distributed storage node matching evaluation model according to the spare state information, the first hot spot data block information, the second hot spot data block information and the third hot spot data block information of the distributed node;
according to the node matching index of the data block copy and the information of the data block copy, the data block copy is stored in a distributed mode;
the distributed storage node matching evaluation model is as follows:
In the formula, $R(h, g)$ is the matching index of the $h$-th data block and the $g$-th distributed storage node, $x_g$ is the available capacity of the $g$-th distributed storage node, $x_h$ is the size of the $h$-th data block, $T_j(h, g)$ is the response time for the $j$-th access mode when the $h$-th data block is stored on the $g$-th distributed storage node, and $m$ is the total number of access modes of the data block; the model also takes the access frequency of the $j$-th access mode as an input;
the dynamic adjustment of the distributed storage according to the load information of the distributed storage nodes specifically comprises:
acquiring a distributed storage node load evaluation index according to the distributed storage node load information;
acquiring a load evaluation index threshold of a distributed storage node based on the distributed storage requirement;
Judging, according to the distributed storage node load evaluation index and the distributed storage node load evaluation index threshold, whether the load evaluation index exceeds the threshold; if not, the state of the distributed storage node is normal; if so, the load of the distributed storage node is too high, and the information of the data block to be responded is obtained according to the distributed storage node load information;
acquiring load spare resource information of the distributed storage nodes according to the load information of the distributed storage nodes;
According to the information of the data block to be responded and the load spare resource information of the distributed storage node, acquiring a node matching index of the data block to be responded based on a distributed storage node matching evaluation model;
Dynamically adjusting the distributed storage according to the node matching index of the data block to be responded;
the calculation formula of the distributed storage node load evaluation index is as follows:
wherein $D$ is the distributed storage node load evaluation index, $\alpha$, $\beta$ and $\gamma$ are the distributed storage node load evaluation coefficients, $\mu$ is the CPU utilization rate of the distributed storage node, $\mu_0$ is the standard CPU utilization rate of the distributed storage node, $\tau$ is the disk utilization rate of the distributed storage node, $\tau_0$ is the standard disk utilization rate of the distributed storage node, $\rho$ is the network bandwidth of the distributed storage node, $\sigma_k$ is the access frequency of the $k$-th data block of the distributed storage node, $\theta_k$ is the access load coefficient of the $k$-th data block on the distributed storage node, and $E$ is the total number of data blocks of the distributed storage node.
2. The intelligent data management method based on distributed storage according to claim 1, wherein the acquiring data block information based on data classification according to data information specifically comprises:
Acquiring data attribute information according to the data information, wherein the data attribute information comprises data type information and data format information;
according to the data attribute information, unifying data formats of the data to obtain corrected data information;
obtaining deduplication data information based on a hash deduplication method according to the corrected data information;
obtaining deduplication data missing information according to the deduplication data information;
Acquiring a data missing threshold based on the data distributed storage requirement;
Judging, according to the deduplication data missing information and the data missing threshold, whether the deduplication data missing information exceeds the data missing threshold; if so, the deduplicated data do not meet the distributed storage standard; if not, standard data information is obtained from the deduplication data information based on data standardization;
Acquiring standard data characteristic information according to standard data information, wherein the standard data characteristic information comprises standard data keyword information and standard data timestamp information;
and classifying the standard data according to the standard data characteristic information to obtain data block information.
3. An intelligent data management system based on distributed storage, for implementing the intelligent management method according to any one of claims 1-2, comprising:
a main control module, configured to: classify standard data according to standard data characteristic information and acquire data block information; allocate a first hotspot data block to a distributed storage buffer layer according to first hotspot data block information and distributed storage buffer layer information, and acquire buffer layer storage planning information; plan data block storage according to data block ordering information and acquire node storage information; judge whether the load of a distributed storage node is too high according to a distributed storage node load evaluation index and a distributed storage node load evaluation index threshold; acquire to-be-responded data block information according to distributed storage node load information; acquire distributed storage node load spare resource information according to the distributed storage node load information; dynamically adjust the distributed storage according to a to-be-responded data block node matching index; and acquire data copy node information according to data response fault information;
an information acquisition module, configured to: acquire data information, data attribute information, data characteristic information, distributed storage partition information, distributed storage buffer layer information and distributed storage node information; acquire data block access information according to the data block information; acquire data block hotspot data information according to the data block access information; acquire distributed storage node load information, distributed storage node load state information, distributed storage node load spare resource information and distributed storage node response speed information; and transmit the data block hotspot data information to a computing module;
the computing module, configured to: acquire a data block hotspot index according to the data block hotspot data information; classify the data block according to the data block hotspot index, a first threshold of the data block hotspot index and a second threshold of the data block hotspot index, and acquire data block classification information; acquire a distributed storage node matching index according to second hotspot data block information, third hotspot data block information and the distributed storage node information; acquire a data block copy node matching index according to distributed node spare state information, the first hotspot data block information, the second hotspot data block information and the third hotspot data block information; acquire the to-be-responded data block node matching index according to the to-be-responded data block information and the distributed storage node load spare resource information; and acquire the distributed storage node load evaluation index according to the distributed storage node load information;
and a display module, configured to display the data block information, the data block hotspot data information, the data block storage planning information, the distributed storage node load evaluation index and the data response fault information.
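As an illustration of the two-threshold classification performed by the computing module, the following is a minimal sketch only. The claim does not give the formula of the data block hotspot index here, so the weighted sum below, the field names `access_count` and `recency`, the weights, and the convention that the first threshold is the higher one (so that a "first hotspot data block" is the hottest) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockHotspotData:
    """Per-block hotspot data; both fields are hypothetical."""
    access_count: int   # accesses observed in some observation window
    recency: float      # 1.0 = just accessed, 0.0 = not accessed recently


def hotspot_index(data: BlockHotspotData, max_count: int,
                  w_count: float = 0.7, w_recency: float = 0.3) -> float:
    """Assumed index in [0, 1]: weighted sum of normalized count and recency."""
    if max_count <= 0:
        return 0.0
    normalized_count = min(data.access_count / max_count, 1.0)
    return w_count * normalized_count + w_recency * data.recency


def classify_block(index: float, first_threshold: float,
                   second_threshold: float) -> str:
    """Split blocks into three classes using two thresholds (assumed ordering)."""
    if first_threshold <= second_threshold:
        raise ValueError("first_threshold must be greater than second_threshold")
    if index >= first_threshold:
        return "first hotspot data block"
    if index >= second_threshold:
        return "second hotspot data block"
    return "third hotspot data block"
```

Under these assumptions, a block whose index reaches the first threshold would be the candidate for the distributed storage buffer layer, while the other two classes would be placed on distributed storage nodes.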
4. The intelligent data management system based on distributed storage according to claim 3, wherein the main control module specifically comprises:
a control unit, configured to: classify standard data according to standard data characteristic information and acquire data block information; allocate the first hotspot data block to the distributed storage buffer layer according to the first hotspot data block information and the distributed storage buffer layer information, and acquire buffer layer storage planning information; plan data block storage according to data block ordering information and acquire node storage information; and acquire data copy node information according to data response fault information;
an information receiving unit, which interacts with the information acquisition module and the computing module, and is configured to acquire data and transmit the data to a dynamic adjustment unit;
and the dynamic adjustment unit, configured to: judge whether the load of a distributed storage node is too high according to the distributed storage node load evaluation index and the distributed storage node load evaluation index threshold; acquire to-be-responded data block information according to the distributed storage node load information; acquire distributed storage node load spare resource information according to the distributed storage node load information; and dynamically adjust the distributed storage according to the to-be-responded data block node matching index.
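The behaviour of the dynamic adjustment unit can be sketched roughly as follows. This is a hedged illustration, not the claimed implementation: the `Node` fields, the capacity unit, and in particular the `match_index` formula are hypothetical, since the claim only states that adjustment follows the to-be-responded data block node matching index.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    load_index: float      # load evaluation index, assumed to lie in [0, 1]
    spare_capacity: int    # spare resources, hypothetical unit (e.g. MB)
    pending: List[Tuple[str, int]] = field(default_factory=list)  # (block_id, size)


def match_index(block_size: int, node: Node) -> float:
    """Assumed matching index: favors lightly loaded nodes with headroom left."""
    if node.spare_capacity <= 0 or block_size > node.spare_capacity:
        return 0.0
    return (1.0 - node.load_index) * (1.0 - block_size / node.spare_capacity)


def rebalance(nodes: List[Node], load_threshold: float) -> Dict[str, str]:
    """Move pending blocks of overloaded nodes to the best-matching other node."""
    overloaded = [n for n in nodes if n.load_index > load_threshold]
    targets = [n for n in nodes if n.load_index <= load_threshold]
    plan: Dict[str, str] = {}
    for source in overloaded:
        for block_id, size in source.pending:
            best = max(targets, key=lambda n: match_index(size, n), default=None)
            if best is None or match_index(size, best) == 0.0:
                continue  # no suitable node; the block stays where it is
            plan[block_id] = best.name
            best.spare_capacity -= size  # reserve the space for later decisions
    return plan
```

In this sketch a block that would exactly exhaust a node's spare capacity scores 0 and is not moved there; that is a design choice of the example, not something stated in the claim.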
5. The intelligent data management system based on distributed storage according to claim 3, wherein the information acquisition module specifically comprises:
a first acquisition unit, configured to: acquire data information, data attribute information, data characteristic information, distributed storage partition information, distributed storage buffer layer information and distributed storage node information; and acquire data block access information according to the data block information;
and a second acquisition unit, configured to: acquire data block hotspot data information according to the data block access information; acquire distributed storage node load information, distributed storage node load state information, distributed storage node load spare resource information and distributed storage node response speed information; and transmit the acquired information to the computing module.
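One way the second acquisition unit might derive data block hotspot data information from data block access information is a sliding-window summary. The sketch below assumes that access information is a list of `(block_id, timestamp)` pairs and that hotspot data consists of an access count and a last-access time per block; neither representation is specified in the claim.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hotspot_data(access_log: Iterable[Tuple[str, float]],
                 now: float, window_seconds: float) -> Dict[str, dict]:
    """Summarize accesses per block within [now - window_seconds, now]."""
    summary = defaultdict(lambda: {"count": 0, "last_access": None})
    for block_id, timestamp in access_log:
        # Ignore accesses outside the observation window.
        if not (now - window_seconds <= timestamp <= now):
            continue
        entry = summary[block_id]
        entry["count"] += 1
        if entry["last_access"] is None or timestamp > entry["last_access"]:
            entry["last_access"] = timestamp
    return dict(summary)
```

Blocks with no access inside the window do not appear in the result, so a caller would treat a missing key as a count of zero.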
6. The intelligent data management system based on distributed storage according to claim 3, wherein the computing module specifically comprises:
a hotspot index unit, configured to: acquire a data block hotspot index according to the data block hotspot data information; and classify the data block according to the data block hotspot index, the first threshold of the data block hotspot index and the second threshold of the data block hotspot index, and acquire data block classification information;
a node matching unit, configured to: acquire a distributed storage node matching index according to the second hotspot data block information, the third hotspot data block information and the distributed storage node information; acquire a data block copy node matching index according to the distributed node spare state information, the first hotspot data block information, the second hotspot data block information and the third hotspot data block information; and acquire the to-be-responded data block node matching index according to the to-be-responded data block information and the distributed storage node load spare resource information;
and a load evaluation unit, configured to acquire the distributed storage node load evaluation index according to the distributed storage node load information and transmit the distributed storage node load evaluation index to the main control module.
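To make the role of the load evaluation unit concrete, the following sketch computes a load evaluation index as a weighted average of utilization figures and compares it with a threshold. The choice of inputs (CPU, disk and network utilization), the weights, and the strict "greater than" comparison are assumptions for illustration; the claim states only that the index is derived from distributed storage node load information.

```python
from typing import Tuple


def load_evaluation_index(cpu: float, disk: float, network: float,
                          weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)
                          ) -> float:
    """Assumed index in [0, 1]: weighted average of three utilization ratios."""
    for value in (cpu, disk, network):
        if not 0.0 <= value <= 1.0:
            raise ValueError("utilization values must lie in [0, 1]")
    w_cpu, w_disk, w_net = weights
    total = w_cpu + w_disk + w_net
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_cpu * cpu + w_disk * disk + w_net * network) / total


def is_overloaded(index: float, threshold: float) -> bool:
    """A node counts as overloaded when its index exceeds the threshold."""
    return index > threshold
```

The resulting index would then be passed to the main control module, which decides whether dynamic adjustment is needed.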
CN202410398097.XA 2024-04-03 2024-04-03 Distributed storage-based data intelligent management method and system Active CN118363527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410398097.XA CN118363527B (en) 2024-04-03 2024-04-03 Distributed storage-based data intelligent management method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410398097.XA CN118363527B (en) 2024-04-03 2024-04-03 Distributed storage-based data intelligent management method and system

Publications (2)

Publication Number Publication Date
CN118363527A CN118363527A (en) 2024-07-19
CN118363527B CN118363527B (en) 2024-10-25

Family

ID=91877402

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410398097.XA Active CN118363527B (en) 2024-04-03 2024-04-03 Distributed storage-based data intelligent management method and system

Country Status (1)

Country Link
CN (1) CN118363527B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119645322B (en) * 2025-02-19 2025-04-29 蜀汇乾鲲科技有限公司 Storage control method and device based on big data

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116991332A (en) * 2023-09-26 2023-11-03 长春易加科技有限公司 Intelligent factory large-scale data storage and analysis method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116107797A (en) * 2021-11-09 2023-05-12 上海哔哩哔哩科技有限公司 Data storage method and device, electronic device and storage medium
CN116455919A (en) * 2023-03-14 2023-07-18 中国航天科工集团第二研究院 Dynamic cloud data copy management method based on storage node state awareness

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116991332A (en) * 2023-09-26 2023-11-03 长春易加科技有限公司 Intelligent factory large-scale data storage and analysis method

Also Published As

Publication number Publication date
CN118363527A (en) 2024-07-19

Similar Documents

Publication Publication Date Title
US12314230B2 (en) Intelligent layout of composite data structures in tiered storage with persistent memory
US10057367B2 (en) Systems and methods for data caching in a communications network
CN101674233B (en) Peterson graph-based storage network structure and data read-write method thereof
US10579272B2 (en) Workload aware storage platform
US8463846B2 (en) File bundling for cache servers of content delivery networks
CN104111804B (en) A kind of distributed file system
US6732117B1 (en) Techniques for handling client-oriented requests within a data storage system
CN105183839A (en) Hadoop-based storage optimizing method for small file hierachical indexing
US10409804B2 (en) Reducing I/O operations for on-demand demand data page generation
EP2534571B1 (en) Method and system for dynamically replicating data within a distributed storage system
CN110636122A (en) Distributed storage method, server, system, electronic device and storage medium
CN109714229B (en) Performance bottleneck positioning method of distributed storage system
CN106933868A (en) A kind of method and data server for adjusting data fragmentation distribution
CN102904948A (en) Super-large-scale low-cost storage system
CN118363527B (en) Distributed storage-based data intelligent management method and system
CN109766318A (en) File reading and device
CN110276713A (en) A high-efficiency caching method and system for remote sensing image data
CN106528451A (en) Cloud storage framework for second level cache prefetching for small files and construction method thereof
CN119292962B (en) Shared cache management method, device and storage medium
CN108540510B (en) A cloud host creation method, device and cloud service system
CN112947860A (en) Hierarchical storage and scheduling method of distributed data copies
CN109767274B (en) Method and system for carrying out associated storage on massive invoice data
US20120297010A1 (en) Distributed Caching and Cache Analysis
CN116795878B (en) Data processing method and device, electronic equipment and medium
CN108920095A (en) A kind of data store optimization method and apparatus based on CRUSH

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant