CN119847452B - Storage space management method and electronic device - Google Patents
Storage space management method and electronic deviceInfo
- Publication number
- CN119847452B CN119847452B CN202510328524.1A CN202510328524A CN119847452B CN 119847452 B CN119847452 B CN 119847452B CN 202510328524 A CN202510328524 A CN 202510328524A CN 119847452 B CN119847452 B CN 119847452B
- Authority
- CN
- China
- Prior art keywords
- data
- index
- block
- life
- indicator
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/023—Free address space management
- G06F12/0253—Garbage collection, i.e. reclamation of unreferenced memory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0643—Management of files
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0646—Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
- G06F3/0652—Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0656—Data buffering arrangements
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A storage space management method and electronic equipment relate to the technical field of computers. And the electronic equipment determines the predicted life index of the data block according to the life cycle index of the file to which the file system data block belongs. And then, the electronic equipment manages the storage space of the file system according to the predicted life index of the data block. The predicted lifetime indicator is used to indicate the expected number of data blocks allocated by the file system between writing to a failure. The lifecycle indicator of a file is used to indicate the expected number of data blocks that the file system allocates between creation and deletion of the file. The electronic device can more efficiently manage the storage space by predicting the service life of the data block and knowing the predicted failure time of the data block in advance, thereby reducing the management load of the storage space.
Description
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a storage space management method and electronic equipment.
Background
A file system is a method for an operating system to organize files on a storage device, wherein a log file system is widely used on electronic devices. Data blocks in a file system may fail for various reasons, resulting in a large number of invalid data blocks. In order to maintain effective utilization of storage space, electronic devices need to frequently execute storage space management operations such as data block movement or recovery, which results in a large storage space management load.
Therefore, how to efficiently manage data blocks and reduce the load of storage space management are important issues to be resolved.
Disclosure of Invention
The embodiment of the application provides a storage space management method and electronic equipment, which are used for reducing the management load of the storage space of the electronic equipment.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical scheme:
According to a first aspect, a storage space management method is provided, and the method comprises the steps of determining a predicted life index used for representing the life condition of a data block according to the life cycle index of a file to which the data block belongs in a file system, wherein the predicted life index is used for indicating the predicted number condition of the data block distributed by the file system from writing to failure, and the life cycle index of the file is used for indicating the predicted number condition of the data block distributed by the file system from creation to deletion. And then, according to the predicted life index of the data block, managing the storage space of the file system.
It should be appreciated that the allocation of data blocks to a file system is a continuous process with increasing numbers of allocations, and that both the lifecycle index and the predicted lifetime index count data blocks allocated in the view of the file system. The electronic device can determine the predicted life index of the data block according to the life index of the file to which the data block belongs in the file system, and then manage the storage space of the file system according to the life condition of the data block represented by the predicted life index.
According to the storage space management method, the electronic equipment acquires the required parameters and calculates the predicted life index of the data block, the predicted failure time of the data block is known in advance through predicting the life of the data block, and the storage space management can be more efficiently performed according to the failure time, so that the moving frequency of the data block is effectively reduced, management operations such as recovery and the like are efficiently performed on the data block, and the storage space management load of the electronic equipment is reduced.
In a possible implementation manner of the first aspect, in a case that the data block is an initial block, a prediction lifetime index of the initial block is consistent with a life cycle index of the file to which the data block belongs, and the initial block is a data block in which data is written for the first time.
It should be understood that the initial block is a block of data that is written for the first time and is not written based on the data update requirements. In the case that the data block is an initial block, the life cycle index of the file to which the initial block belongs can be inherited by the life cycle index of the initial block.
According to the storage space management method, the life index of the data block is predicted in an inheritance mode, so that the prediction process is simplified, and the prediction efficiency is improved.
In another possible implementation manner of the first aspect, in a case that the data block is an update block, the prediction lifetime index of the update block is obtained based on an update period index, and the update period index is used to indicate a number of data blocks allocated by the file system between writing and invalidation of the updated block.
The data written by the update block is used for updating the data content in the corresponding updated block, the corresponding updated block fails after the data is written by the update block, or the data written by the update block is used for updating the data content in the data block in the same storage position, and the updated block is the original data block before the current update.
It will be appreciated that after a data block is written to data, the data within the data block may be updated. Depending on the file system type, the storage locations of the update block and the updated block may differ. In some embodiments, such as a log file system, the update block and the updated block are data blocks in different storage locations, the data written by the update block is used to update the data content in the corresponding updated block, and the corresponding updated block fails after the update block writes the data. In other embodiments, the current write data of the update block is used to update the data content in the same memory location data block.
In the case that the data block is an update block, the electronic device predicts the lifetime index of the update block, taking into account the update interval, i.e. the update period. The electronic device may directly use the update period as a predicted lifetime index of the update block, or may comprehensively consider the update period and the predicted lifetime index of the updated block to obtain the predicted lifetime index of the update block.
According to the storage space management method, the updated file system state can be reflected more accurately by introducing the update period index, and the accuracy of the life prediction of the updated block is improved.
In another possible implementation manner of the first aspect, the predicted lifetime index of the updated block is a weighted sum of an update period index and a historical lifetime index, and the historical lifetime index is a predicted lifetime index of the updated block, wherein the updated block is the updated block or the initial block. When the updated block is an initial block, the prediction lifetime index of the updated block matches the lifetime index of the file to which the updated block belongs.
It should be appreciated that the updated block itself may also be an update block or an initial block. When the updated block is an initial block, the historical life index is the predicted life index of the initial block, that is, the historical life index is consistent with the life cycle index of the file to which the initial block belongs. The predicted lifetime index of the update block is derived based on the update period and the historical lifetime index. In the case where the updated block is an update block, the historical lifetime index may iterate to the historical lifetime index of the updated block corresponding to the previous update, and may eventually iterate to the predicted lifetime index of the corresponding initial block.
According to the storage space management method, the predicted life index of the update block is the weighted sum of the update period index and the historical life index, and the update period index can reflect the current state of the file system. The historical life index represents the predicted life index of the updated block, is calculated based on the previous updating period index and the predicted life index, and reflects the stability of the related data block in the past period and the comprehensive condition of the predicted life index. Therefore, the service life of the update block is predicted by comprehensively considering the update period index and the historical service life index, and a more comprehensive and more accurate service life condition prediction result can be obtained.
In another possible implementation manner of the first aspect, the initial life cycle index of the file is consistent with the life cycle index of the directory where the file is located, and the initial life cycle index of the directory is consistent with the life cycle index of the parent directory; the method further comprises the steps of updating a life cycle index of the target object according to a deleting time index and a creating time index of the target object when the target object is deleted, wherein the target object is a file or a catalog, the deleting time index is used for indicating the global block cumulative number of the file system when the target object is deleted, the creating time index is used for indicating the global block cumulative number of the file system when the target object is created, and the global block cumulative number represents the total number of data blocks which are cumulatively distributed in the file system.
It should be appreciated that the initial lifecycle index of the file may inherit the lifecycle index of the directory in which it is located, and that the lifecycle index of the directory may inherit the lifecycle index of the parent directory. When a file (or directory) is deleted, the electronic device updates the lifecycle index before the file (or directory) according to the actual lifecycle index of the file (or directory). The actual life cycle index is calculated according to the creation time index and the deletion time index of the file (or the catalog). The electronic device may obtain the creation timing indicator and the deletion timing indicator from a global block count of the file system.
According to the storage space management method, the electronic equipment records the creation time index and the deletion time index through the global block cumulative count, so that the relative position of the file (or the directory) in the process of increasing the storage space of the file system can be clearly known. The method is beneficial to analyzing the use condition and the update condition of the storage space of the file system, and is convenient for predicting the future storage space requirement. The electronic equipment updates the life cycle index before the file (or the catalog) according to the actual life cycle index of the file (or the catalog), and then the life cycle index after the file update is inherited by the life cycle index of the initial block under the file, so that the accuracy of the life cycle index is improved.
In another possible implementation manner of the first aspect, after updating the lifecycle index of the target object, the method further includes updating the lifecycle index of the parent directory of the target object according to the updated lifecycle index of the target object.
The storage space management method can ensure that the life cycle index of the father directory can be smoothly transited to a new value after the life cycle index of the file (or directory) is updated, and the life cycle index of the father directory is updated in time, so that the accuracy of the life cycle index of the father directory is improved.
In another possible implementation manner of the first aspect, the updated lifecycle index of the parent directory of the target object is obtained based on the updated lifecycle index of the target object, the updated lifecycle index of the parent directory of the target object, and the number of files under the parent directory of the target object.
It should be understood that the updated life cycle index of the target object, the original life cycle index of the parent directory and the number of files in the parent directory are comprehensively considered, so that the actual state of the file system can be more comprehensively reflected. The accuracy of the lifecycle index of the father directory is improved.
In another possible implementation manner of the first aspect, updating the lifecycle index of the parent directory of the target object according to the updated lifecycle index of the target object includes calculating a product of the lifecycle index of the parent directory of the target object and the number of files updated by the parent directory of the target object to obtain the first value. And adding the first value and the life cycle index updated by the target object to obtain a second value. And calculating the ratio of the second value to the number of files before the update of the parent directory of the target object, and obtaining the life cycle index after the update of the parent directory of the target object.
According to the storage space management method, the electronic equipment calculates the life cycle index average value of all files in the father catalog in the updating period. The average value may reflect the variation of the parent directory lifecycle index. The electronic equipment timely updates the life cycle index of the father catalog, and accuracy of the life cycle index of the father catalog is improved.
In another possible implementation manner of the first aspect, after updating the lifecycle index of the parent directory of the target object, when creating a new object under the parent directory, the initial lifecycle index of the new object is consistent with the updated lifecycle index of the parent directory.
According to the storage space management method, the initial life cycle index of the newly-built file under the father directory is predicted based on the more accurate life cycle index of the father directory, so that the accuracy of predicting the life of the data block under the newly-built file is improved.
In another possible implementation manner of the first aspect, the lifecycle index of the root directory is set to a preset number of data blocks, where the preset number of data blocks is a maximum number of data blocks allowed to be allocated to the root directory by the file system.
According to the storage space management method, the life cycle index of the root directory is preset as the maximum number of data blocks allowed to be distributed for the root directory by the file system. The electronic equipment does not need to carry out complex initialization evaluation and calculation, and directly adopts the maximum value as the life cycle index of the root directory, thereby improving the prediction efficiency.
In another possible implementation manner of the first aspect, the inode structure of the object includes a first parameter and a second parameter which are newly added, the structure of the file system includes a third parameter, the object is a file or a directory, the first parameter is used for recording a creation time index of the object, the second parameter is used for recording a life cycle index of the object, and the third parameter is used for recording the total number of data blocks which are distributed in a cumulative way in the file system.
According to the storage space management method, the electronic equipment can record the creation time index and the predicted life cycle index of the file (or the catalog) through the newly added parameters. When calculating the predicted lifetime index of the data block, relevant parameters can be called at any time to provide accurate and real-time assessment.
In another possible implementation manner of the first aspect, the index node structure of the file includes a data block lifetime tree structure, the data block lifetime tree structure includes M data block lifetime nodes, the data block lifetime nodes include a fourth parameter and a fifth parameter, the fourth parameter is used for recording a creation timing index of the data block, the fifth parameter is used for recording a prediction lifetime index of the data block, and M is a positive integer.
According to the storage space management method, the electronic equipment can record the creation time index and the predicted life index of the data block through the newly added parameters. When calculating the predicted lifetime index of the data block, relevant parameters can be called at any time to provide accurate and real-time assessment.
In another possible implementation manner of the first aspect, the management of the storage space of the file system according to the predicted lifetime index of the data blocks includes selecting a target data segment for garbage collection GC according to the predicted lifetime index of each data block, where one data segment includes N consecutive data blocks, and N is an integer greater than 1.
According to the storage space management method, the electronic equipment utilizes the predicted life indexes of the data blocks to screen out more suitable data segments as target data segments, so that the additional overhead caused by frequent movement of effective data blocks is reduced, the management efficiency of the data blocks is improved, and the storage space management load of the electronic equipment is obviously reduced.
In another possible implementation manner of the first aspect, selecting the target data segment for garbage collection GC according to the predicted lifetime index of each data block includes screening candidate data segments according to a movement overhead, and the movement overhead characterizes the overhead of the valid data blocks in the movement candidate data segments. And obtaining the average residual life index and the residual life variance index of the data blocks in each candidate data segment according to the predicted life index of each data block. And determining the target data segment according to the average residual life index and the residual life variance index of the data blocks in each candidate data segment.
According to the storage space management method, the electronic equipment comprehensively considers the moving expense and the service life of the data block, and screens out more suitable data segments as target data segments, so that the GC expense is reduced, and the storage space management load of the electronic equipment is reduced.
In another possible implementation manner of the first aspect, selecting the target data segment for garbage collection GC according to the predicted lifetime of each data block includes selecting the target data segment from the cache list in case the cache list has a data segment to be selected, wherein an average remaining lifetime index of the data segment to be selected is higher than a lifetime index threshold, and a remaining lifetime variance index is higher than a variance index threshold. And screening candidate data segments according to the moving expense under the condition that the cache list has no data segment to be selected, determining a target data segment and the data segment to be selected according to the candidate data segments, and storing the data segment to be selected into the cache list, wherein the average residual life index of the target data segment is higher than a life index threshold value, and the residual life variance index is smaller than or equal to a variance index threshold value.
In the storage space management method, for the candidate data segment with high average residual life index and large residual life variance index, the residual life distribution of the data blocks in the candidate data segment is more discrete, and possibly contains more data blocks with low residual life, and after the data blocks with low residual life are marked to fail, the moving cost is rapidly reduced in a short period. To increase efficiency, it is desirable to quickly select the target data segment for spatial reclamation. In order to avoid the performance overhead caused by full disk traversal, the selection is directly performed in the cache list. Therefore, candidate data segments with high average residual life indexes and large residual life variance indexes can be temporarily stored in the cache list to serve as data segments to be selected, and the next round of selecting target data segments can be directly selected from the data segments to be selected in the cache list, so that the recovery efficiency of the GC is improved.
In another possible implementation manner of the first aspect, the managing the storage space of the file system according to the predicted lifetime index of the data block includes classifying each data block according to the predicted lifetime of each data block to obtain a classification result, where the classification result includes at least a hot data block and a cold data block, and the predicted lifetime index of the hot data block is smaller than the predicted lifetime index of the cold data block. And writing each data block into corresponding different storage areas according to the classification result, wherein the storage areas at least comprise a hot data storage area and a cold data storage area, the hot data blocks are written into the hot data storage area, and the cold data blocks are written into the cold data storage area.
According to the storage space management method, through data splitting, the hot data and the cold data can be ensured to be reasonably stored in the corresponding storage areas. Therefore, the data blocks in each storage area fail in a similar time, thereby greatly reducing GC frequency and cost and obviously reducing the storage space management load of the electronic equipment.
In another possible implementation manner of the first aspect, the managing the storage space of the file system according to the predicted lifetime index of the data blocks includes preferentially writing back the data blocks with the predicted lifetime index greater than or equal to the set threshold to the main storage medium according to the predicted lifetime index of each data block and the set threshold for the data blocks in the cache.
According to the storage space management method, for the data blocks in the cache, the data blocks which are preferentially written back to the main storage medium are selected according to the predicted life index. For data blocks with higher predicted lifetime indicators, the electronic device performs write-back operations to synchronize data to the primary storage medium preferentially in view of the expectation that no changes will occur over a long period of time. For the data blocks with lower predicted life indexes, the data blocks are considered to be updated and lose efficacy quickly, so that temporary non-write-back to the main storage medium (called main memory for short) is selected, the cost that the data blocks which are written back to the main memory need to be deleted from the main memory and the data blocks need to be written back to the main memory again after the data blocks which are written back to the main memory lose efficacy quickly is reduced, the storage space management load of the electronic equipment is reduced, the unnecessary occupation of input/output bandwidth is reduced, the physical loss of the main storage medium is reduced, and the service life of the main storage medium is effectively prolonged.
In a second aspect, the application provides an electronic device comprising one or more processors and memory, the memory and the processors being coupled, the memory being for storing computer program code comprising computer instructions which, when executed by the one or more processors, cause the electronic device to carry out a method as described in the first aspect and any one of its possible designs.
In a third aspect, the present application provides a computer readable storage medium having instructions stored therein which, when run on an electronic device, cause the electronic device to perform the method of the first aspect and any one of its possible designs.
In a fourth aspect, the application provides a computer program product for causing a computer to carry out the method according to the first aspect and any one of its possible designs when the computer program product is run on a computer.
It will be appreciated that the advantages achieved by the electronic device according to the second aspect, the computer readable storage medium according to the third aspect, and the computer program product according to the fourth aspect provided above may refer to the advantages in the first aspect and any possible design manner thereof, and are not described herein.
Drawings
FIG. 1 is a diagram illustrating a remote update feature and a data block validity of a log file system according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a method for managing storage space according to an embodiment of the present application;
fig. 3 is a schematic hardware structure of an electronic device according to an embodiment of the present application;
fig. 4 is a schematic software architecture diagram of an electronic device according to an embodiment of the present application;
FIG. 5 is a schematic diagram illustrating a hierarchical structure and a data structure expansion of a file system according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a status of a data segment before and after updating a data block in a log file system according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of garbage collection according to an embodiment of the present application;
FIG. 8 is a schematic diagram of selecting a target data segment according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a record of a creation timing index and a life cycle index during file creation according to an embodiment of the present application;
FIG. 10 is a diagram of a life cycle index of an updated file and a parent directory when deleting a file according to an embodiment of the present application;
FIG. 11 is a diagram of life cycle index of a new file after updating a directory according to an embodiment of the present application;
fig. 12 is a schematic structural diagram of a chip system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings in the embodiments of the present application. In the description of embodiments of the application, the terminology used in the embodiments below is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," "the," and "the" are intended to include, for example, "one or more" such forms of expression, unless the context clearly indicates to the contrary. It should also be understood that in the following embodiments of the present application, "at least one", "one or more" means one or more than two (including two). The term "and/or" is used to describe an associative relationship of associative objects, and indicates that three relationships may exist, for example, a and/or B may indicate that a exists alone, while a and B exist together, and B exists alone, where A, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise. The term "coupled" includes both direct and indirect connections, unless stated otherwise. The terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
The software mechanism responsible for managing and storing file information in an operating system is called a file management system, which is called a file system for short. The file systems may include different types, such as log file systems, virtual file systems (Virtual FILE SYSTEMS, VFS), and file allocation table file systems (File Allocation Table FILE SYSTEMS, FAT), among others.
The log file system has the characteristic of updating in different places, namely updating data is not directly modified at the original data position, and new data is written into a new storage space in sequence in an additional mode. The method for updating the memory in different places has the characteristics of good random writing performance and friendly flash memory, but has higher requirements on the continuity of the memory space.
For file systems, a large number of invalid data blocks are inevitably generated during the user's continual writing and updating, while the characteristics of the off-site updating exacerbate the generation of invalid data blocks.
The storage space of the file system can comprise a plurality of data blocks, when the data is written, the file system needs to apply for the data blocks to store the data, the data blocks storing valid data are called valid data blocks, and the data blocks which do not store the data or are marked to be invalid are called invalid data blocks. Reasons for the data block being marked for failure include being marked for failure due to the deletion of the data block, being updated differently, the original data block being marked for failure.
By way of example, FIG. 1 illustrates the characteristics of a log file system update from elsewhere. As shown in fig. 1, when the data blocks 12 and 13 are to be updated, new data is written in order to the unused data blocks 18 and 19 in an additional manner, and the data blocks 18a and 19a are obtained after the update. Whereas the original data blocks 12 and 13 are marked as invalid data blocks 12a and 13a, respectively.
As shown in fig. 1, if the data block 15 is not subsequently used, the data in the data block 15 is deleted, and the data block 15 is marked as an invalid data block 15a accordingly. In fig. 1, the data block storing valid data is a valid data block, and the updated data block is exemplified by valid data blocks 11, 14, 16, 17, 18a, and 19 a. The updated, deleted and unused data blocks are invalid data blocks, for example, the data blocks before updating, and the data blocks 18 and 19 are both invalid data blocks.
Invalid data blocks not only occupy storage space, but also affect the storage space continuity of the log file system. Therefore, in order to improve the continuity and utilization of the storage space, a storage space management method such as garbage collection (Garbage Collection, GC) and data distribution is proposed. However, in the GC and data splitting process, the electronic device needs to frequently perform the data block moving and reclaiming operations for a large number of invalid data blocks, resulting in a large storage space management load.
In addition, in the process of saving data in the cache back to the persistent main storage medium (hereinafter referred to as cache write-back), data having a short partial lifetime is unnecessarily written to the persistent storage medium (e.g., a magnetic disk or a solid state disk), and then the data is promptly invalidated. In the process, not only is the storage space management load increased and the Input/Output (IO) bandwidth waste caused, but also the physical abrasion of the storage medium is increased, and the service life of the storage medium is shortened.
In some related art, the file system may allow a user to set a file storage age level according to his own needs through an Input/Output Control (ioctl) interface provided by a Virtual file system (Virtual FILE SYSTEMS, VFS). However, storing the aging level to a preset fixed level, e.g., distinguishing only temporary files from long-term files, does not provide an explicit and quantified time-defining criterion. The user needs to manually set the service life gear of each file, so that the operation is complicated, and the application scene is limited.
Therefore, the embodiment of the application provides a storage space management method which is applied to electronic equipment. As shown in fig. 2, the electronic device determines a predicted lifetime index of a data block according to a lifetime index of a file to which the data block belongs in a file system. The predicted lifetime index is used to characterize the lifetime situation of the data block. And then, the electronic equipment manages the storage space of the file system according to the predicted life index of the data block. The predicted lifetime indicator is used to indicate the predicted number of data blocks allocated by the file system between writing to failure of the data block. The lifecycle indicator of a file is used to indicate the expected number of data blocks that the file system allocates between creation and deletion of the file.
Wherein the allocation of data blocks of the file system is a continuous process with increasing numbers of allocations, similar to the accumulation of time. The height of the predicted lifetime indicator for any data block reflects how many other data blocks the file system may need to allocate in order to store the relevant data between writing from that data block to the failure. In particular, the higher the predicted lifetime indicator, the more other data blocks the file system needs to allocate from writing to failing, and the longer the predicted lifetime of that data block.
In some embodiments, the electronic device may accumulate the total number of data blocks allocated by the file system, which acts similarly to making a time-series record. The cycle index and the lifetime index then correspond to specific "time span" statistics in the time series. In this way, lifetime prediction of the data block can be achieved. For example, suppose the total number of data blocks allocated by the file system when the data blocks are written to data is a, and the predicted lifetime index of the data blocks is b. In this case, when the electronic device predicts that the total number of data blocks allocated by the file system reaches a+b, the data blocks may fail.
According to the storage space management method, the electronic equipment acquires the required parameters and calculates the predicted life index of the data block, the predicted failure time of the data block is known in advance through predicting the life of the data block, and the storage space management can be more efficiently performed according to the failure time, so that the moving frequency of the data block is effectively reduced, management operations such as recovery and the like are efficiently performed on the data block, and the storage space management load of the electronic equipment is reduced.
By way of example, the electronic device may be a mobile phone, a tablet computer, a television (also referred to as a smart television, a smart screen or a large screen device), a notebook computer, an Ultra-mobile Personal computer (Ultra-mobile Personal Computer, UMPC), a handheld computer, a camera (e.g., a digital camera), a video camera (e.g., a digital video camera), a netbook, a Personal digital assistant (Personal DIGITAL ASSISTANT, PDA), a wearable electronic device (e.g., a smart watch, a smart bracelet, a smart glasses), a vehicle-mounted device, a virtual reality device, or the like, which has a file system, and the embodiments of the present application are not limited in this respect.
For easy understanding, taking a mobile phone as an example, a hardware structure and a software system of an electronic device to which the method in the embodiment of the application is applied are schematically described.
Fig. 3 shows a schematic diagram of the structure of a mobile phone.
The handset may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc.
It will be appreciated that the structure illustrated in the embodiments of the present application is not limited to a specific configuration of the mobile phone. In other embodiments of the application, the handset may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (IMAGE SIGNAL processor, ISP), a controller, a memory, a video codec, a digital signal processor (DIGITAL SIGNAL processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The processor 110 may be used to read and execute computer readable instructions. Optionally, the processor 110 may also include a controller, an operator, and registers. The controller is mainly responsible for instruction decoding and sending out control signals for operations corresponding to the instructions. The arithmetic unit is mainly responsible for storing register operands, intermediate operation results and the like temporarily stored in the instruction execution process. Registers are high-speed memory devices of limited memory capacity that can be used to temporarily store instructions, data, and addresses.
In particular implementations, the hardware architecture of the processor 110 may be an Application SPECIFIC INTEGRATED Circuit (ASIC) architecture, a lockless pipelined microprocessor (microprocessor without interlocked PIPED STAGES, MIPS) architecture, ARM (advanced risc machines) architecture, or a Network Processor (NP) architecture, among others.
In some embodiments, the mobile phone may perform the memory management method through the processor 110 to improve the accuracy of the data lifetime prediction.
The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the electronic device and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device (e.g., audio data, phonebook, etc.), and so forth. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like.
The wireless communication function of the mobile phone can be realized by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor and the like.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc. applied to a cell phone. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wi-Fi network, WIRELESS FIDELITY), bluetooth (BT), global navigation satellite system (global navigation SATELLITE SYSTEM, GNSS), frequency modulation (frequency modulation, FM), near field communication (NEAR FIELD communication, NFC), infrared (IR), etc. applied to a cell phone.
The handset may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The keys 190 include a power-on key, a volume key, etc. The keys 190 may be mechanical keys. Or may be a touch key. The handset may receive key inputs, generating key signal inputs related to user settings and function controls of the handset.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback.
The indicator 192 may be an indicator light, may be used to indicate a state of charge, a change in charge, a message indicating a missed call, a notification, etc.
Camera 193 is the physical camera that is actually present. In some embodiments of the application, the camera 193 of the cell phone may be multiple, for example, the multiple cameras 193 may include a front camera and a rear camera. The front camera can be one or more, and the rear camera can be one or more.
The cell phone may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display 194, an application processor, and the like. For example, an application in the handset may access the camera 193 and control the camera 193 to capture images.
The cell phone implements display functions through the GPU, the display 194, and the application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195 or removed from the SIM card interface 195 to enable contact and separation with the handset.
The software system of the mobile phone can adopt a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture or a cloud architecture. The embodiment of the invention takes a software system with a layered architecture as an example, and illustrates the software structure of a mobile phone.
Fig. 4 is a software structure block diagram of a mobile phone according to an embodiment of the present invention.
The layered architecture divides the software into several layers, each with distinct roles and branches. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, from top to bottom, an application layer, an application framework layer, a system library, and a kernel layer.
The application layer may include a series of application packages.
As shown in fig. 4, the application package may include applications for cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, etc.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for the application of the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 4, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephone manager is used for providing communication functions of the mobile phone. Such as the management of call status (including on, hung-up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows the application to display notification information in a status bar, can be used to communicate notification type messages, can automatically disappear after a short dwell, and does not require user interaction. Such as notification manager is used to inform that the download is complete, message alerts, etc. The notification manager may also be a notification in the form of a chart or scroll bar text that appears on the system top status bar, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, a text message is prompted in a status bar, a prompt tone is emitted, the electronic device vibrates, and an indicator light blinks, etc.
The system library may include a plurality of functional modules. Such as surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (e.g., openGL ES), two-dimensional graphics engine (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The two-dimensional graphics engine is a drawing engine for 2D drawing.
As shown in fig. 4, the kernel layer is a layer between hardware and software. The kernel layer includes at least a file system for calculating and recording lifetime information of the data blocks. In addition, the kernel layer may further include drivers not shown in fig. 4, such as a display driver, a camera driver, an audio driver, a sensor driver, and the like.
Fig. 5, 9 and 10 illustrate a file system hierarchy with a root directory containing a plurality of directories under which sub-directories are further nested, with files stored under the sub-directories. Illustratively, the root directory contains directory 1 and directory 2, directory 1 contains directory 3 and directory 4, and directory 2 contains directory 5. File 1 and file 5 are contained under directory 3, file 2 and file 3 are contained under directory 4, and file 4 is contained under directory 5. In the hierarchy, the upper level directory of a file (or directory) is referred to as the parent directory of that file (or directory). The next level of files (or directories) of a directory are called sub-files (or sub-directories) of the directory. For example, directory 1 is the parent directory of directory 3, directory 3 is the child directory of directory 1, directory 5 is the parent directory of file 4, and file 4 is the child file of directory 5.
In the application, the internal data structure of the file system of the kernel layer is expanded.
In some embodiments, the first parameter and the second parameter are added in an inode structure of the file (or directory). In the structure of the file system, a third parameter is added. The first parameter is used for recording the creation time index of the file (or directory), and the second parameter is used for recording the life cycle index of the file (or directory). The third parameter is used to record the total number of data blocks cumulatively allocated in the file system.
In some embodiments, the inode structure of the file is augmented with a data block lifetime tree structure that includes a plurality of data block lifetime nodes. Each data block lifetime node comprises a fourth parameter and a fifth parameter. The fourth parameter is used for recording the creation time index of the data block, and the fifth parameter is used for recording the prediction life index of the data block.
The electronic device may match different fourth parameters for different data blocks and/or different fifth parameters for different data blocks. The electronic device may also enable multiple data blocks to share the same fourth parameter and/or multiple data blocks to share the same fifth parameter according to actual requirements. The electronic equipment can trace back the parameters corresponding to the data blocks when the service life index of the data blocks is calculated as long as the clear corresponding relation between the data blocks and the parameters is established.
Specifically, a global block count (total_count) may be added to a super block (superblock) structure of the file system. total_count is used to accumulate the total number of data blocks allocated by the file system.
As shown in fig. 5, a directory creation timing index (c_num) and a directory life cycle index (life time) are newly added to a directory inode (dir inode) structure, and a file creation timing index (c_num) and a file life cycle index (life time) are newly added to a file inode (file inode) structure.
As shown in FIG. 5, a structure, a data block lifetime tree (blocks LIFETIME TREE), is added to the file inode structure. The data block lifetime tree includes a series of data block lifetime nodes (block lifetime node) in which parameters, a data block creation timing parameter (w_num) and a data block predicted lifetime parameter (node_lifetime), are newly added.
Specifically, a data block lifetime tree is newly added to the structure of an extension tree (extension tree), which includes a series of extension tree nodes (extension tree nodes), each of which records the length and start address of a segment of continuous data blocks (hereinafter referred to as data segments). In this way, in the newly added data block creation timing parameter and data block prediction lifetime parameter of the present application, the data block creation timing parameter structure may be used to record the earliest creation timing index of the data block in the data segment, and the data block prediction lifetime parameter structure may be used to record the average prediction lifetime index of the data block in the data segment.
The following is an exemplary process flow for executing the method in the embodiment of the present application by the mobile phone with reference to fig. 4.
For example, when an application in the application layer creates a file during execution, this status message is passed through the application framework layer and the system library to the kernel layer.
The file system of the kernel layer records the creation time index and the life cycle index of the file in the corresponding structure, wherein the life cycle index can inherit the life cycle index of the father directory. In addition, the file system records information of the data blocks in the file in the corresponding data block life tree, and records an earliest creation time index of the data blocks in the data segment and a prediction life index of the data blocks in the corresponding structure for each data block life node, wherein the prediction life index of the data blocks can inherit the life cycle index of the file to which the data blocks belong.
For another example, when an application in the application layer updates a block of data during execution, this status message is passed through the application framework layer and the system library to the kernel layer.
For the updated block, the file system of the kernel layer records the writing time in the data block creation time parameter structure, calculates an update period index according to the writing time of the updated block and the creation time index of the updated block, calculates the predicted life index of the updated block according to the update period index, and writes the predicted life index of the updated block in the corresponding structure.
In some embodiments, the update block and the updated block are data blocks in different storage locations, the data written by the update block is used for updating the data content in the corresponding updated block, and the updated block is disabled after the corresponding update block writes the data. For example, in the case where the file system is a log file system, the updated blocks and the update blocks are data blocks of different storage locations because the log file system has the property of being updated remotely. As shown in fig. 1, the data block 18a is an update block, and the data block 12 is an updated block corresponding to the data block 18 a.
In other embodiments, the data written by the update block at this time is used to update the data content in the data block at the same storage location, and the updated block is the original data block before the update.
The storage space management method can be applied to various scenes, including but not limited to a GC scene, a data distribution scene and a cache write-back scene.
For ease of understanding, in the following embodiments, application scenario one, GC scenario, is described in detail.
In a file system, a certain number of data blocks are organized into data segments, which are the basic units of storage space management. The data segment is divided into three states, namely an empty data segment, a full data segment and a dirty data segment according to the validity of the data blocks in the data segment. If the data segment is an invalid data block, the data segment is a null data segment. If the data segment is a valid data block, the data segment is a full data segment. If there are both valid and invalid data blocks within a data segment, the data segment is a dirty data segment.
Fig. 6 shows the status of the data segments before and after the update of the data blocks in the log file system. As shown in fig. 6, taking the pre-update data segment as an example, the data segment 201 and the data segment 202 are full data segments, and the data segment 203 and the data segment 204 are empty data segments. Taking the updated data segment as an example, the data segment 201a and the data segment 202a are dirty data segments, the data segment 203a is full data segments, and the data segment 204 is still empty data segments.
As the user uses, the valid data blocks gradually become invalid data blocks, the dirty data segments increase and the empty data segments decrease. The insufficient number of empty data segments has a great influence on the performance and stability of the log file system, in order to ensure the number of empty data segments, the electronic equipment periodically cleans the dirty data segments, moves the effective data in the dirty data segments to other positions, and releases the storage space occupied by the whole dirty data segments, so that continuous storage space (namely, the empty data segments are obtained) is obtained, and the process is called GC.
In the GC process, the electronic device needs to select a suitable data segment for cleaning due to a part of overhead caused by the moving of the valid data block, and the selected data segment to be cleaned is called a target data segment. The selection of the current target data segment only considers the moving overhead and does not consider the lifetime of the data block. The situation that the valid data block is invalid soon after being moved may occur, so that not only is the cost of GC increased, but also the data block is difficult to effectively manage, and a large storage space management load is generated.
By applying the storage space management method to the GC scene, the moving cost and the service life of the data block are comprehensively considered, and more proper data segments are screened out to serve as target data segments, so that the GC cost is reduced, and the storage space management load of the electronic equipment is reduced.
The electronic device may screen candidate data segments according to the moving overhead, and then calculate the predicted lifetime index of each data block by using the storage space management method of the present application, so as to obtain the average remaining lifetime index and the remaining lifetime variance index of each data block in each candidate data segment, and then select the candidate data segment with the average remaining lifetime index and the remaining lifetime variance index meeting the requirements as the target data segment.
In some embodiments, the electronic device may calculate the remaining lifetime index from the current total_count, the creation timing index of the data block, and the predicted lifetime index. For example, assuming that the total number of data blocks allocated by the file system at the time of data block creation (creation timing index) is a and the predicted lifetime index of the data block is b, when the total number of data blocks allocated by the electronic device predicted file system reaches a+b, the data block may fail. Assuming that the total number of data blocks allocated by the current file system (current total_count) is c, in this case, the electronic device calculates the remaining lifetime index of the data block as a+b-c.
After the electronic equipment obtains the residual life indexes of each data block in the candidate data segment, calculating the arithmetic average value and variance of the residual life indexes of each data block, thereby obtaining the average residual life index and residual life variance index of the candidate data segment.
In the partial mode of the GC, the electronic device may select a target data segment from the cache list. And traversing the data blocks under the condition that the cache list has no data segment to be selected, screening candidate data segments according to the moving expense, and calculating the predicted life index of each data block so as to obtain the average residual life index and the residual life variance index of each data block in each candidate data segment. The electronic device takes a data segment with an average remaining life index higher than a life index threshold and a remaining life variance index less than or equal to (not exceeding) a variance index threshold as a target data segment. And the electronic equipment takes the data segment with the average residual life index higher than the life index threshold and the residual life variance index higher than the variance index threshold as the data segment to be selected, and stores the data segment into the cache list.
In some embodiments, as shown in fig. 7, the electronic device first screens candidate data segments according to the movement overhead. And then, calculating the average residual life index and the residual life variance index of the data blocks in the candidate data segment according to the predicted life index of each data block. And comprehensively considering the average residual life index and the residual life variance index of the candidate data segments, and taking the candidate data segments with high average residual life index and small residual life variance index as target data segments of the round. And adding the candidate data segments with high average residual life indexes and large residual life variance indexes into a cache list to serve as candidate data segments. The data segment to be selected is used as a target data segment to be selected in the next round of garbage collection GC.
The residual life variance index is an important index for measuring the residual life discrete degree of the data block in the candidate data segment. In order to weaken the interference of the extreme value to the calculation result of the average residual life index, selecting the candidate data segment with smaller residual life variance index, so that the average residual life index of the candidate data segment can reflect the whole expected life of the data block more accurately. In addition, the candidate data segment with smaller residual life variance index means that the residual life of the internal data block is closer, which helps to reduce the situation that the data block is scattered and fails at different time points. And, the candidate data segment with smaller residual life variance index is easier to totally fail in similar time, thereby indirectly reducing the generation of dirty data segments.
For the candidate data segment with high average residual life index and large residual life variance index, the residual life distribution of the data blocks in the candidate data segment is more discrete, and the data blocks with low residual life possibly contain more data blocks with low residual life, and after the data blocks with low residual life are marked to fail, the moving cost is rapidly reduced in a short period. To increase efficiency, it is desirable to quickly select the target data segment for spatial reclamation. In order to avoid the performance overhead caused by full disk traversal, the selection is directly performed in the cache list. Therefore, candidate data segments with high average residual life indexes and large residual life variance indexes can be temporarily stored in the cache list to serve as data segments to be selected, and the next round of selecting target data segments can be directly selected from the data segments to be selected in the cache list, so that the recovery efficiency of the GC is improved.
In some embodiments, the electronic device may determine the movement overhead based on the number of valid data blocks in the data segment. A move overhead threshold may be set and data segments below the move overhead threshold are candidate data segments.
And comprehensively considering the average residual life index and the residual life variance index of the candidate data segments to select the target data segments. In some embodiments, a lifetime index threshold and a variance index threshold may be set, respectively, and candidate data segments with average remaining lifetime index higher than the lifetime index threshold and remaining lifetime variance index not exceeding the variance index threshold are used as target data segments for the present round. And taking the candidate data segments with the average residual life index higher than the life index threshold and the residual life variance index higher than the variance index threshold as the data segments to be selected.
For example, as shown in fig. 8, the number of valid data blocks of each of the data segments a to C is 5, which is smaller than the moving overhead threshold, and each of the data segments a to C is a candidate data segment. In fig. 8, the numbers within the valid data blocks represent the remaining life of the corresponding data blocks. The average residual life index of the candidate data segment A is 11, the average residual life index of the candidate data segment C is 11.2, and the average residual life indexes of the candidate data segments A and C are higher than the life index threshold. The average residual life index of the candidate data segment B is 8, is lower than the life index threshold value, and the candidate data segment B is excluded. In addition, the residual life variance index of the candidate data segment A is 0 and is lower than the variance index threshold, while the residual life variance index of the candidate data segment C is 33.76 and is higher than the variance index threshold.
Therefore, the average residual life index of the candidate data segment a is higher than the life index threshold, and the residual life variance index is lower than the variance index threshold, and the candidate data segment a is taken as the target data segment of the round. The average residual life index of the candidate data segment C is higher than the life index threshold, and the residual life variance index is higher than the variance index threshold, as shown in fig. 8, two data blocks will fail soon, the moving overhead after failure is obviously reduced, and the candidate data segment C is used as the candidate data segment.
In other embodiments, the electronic device may determine the target data segment from the candidate data segments based on a weighted sum evaluation method. For example, first, a weighted sum of the average remaining life index and the remaining life variance index of the candidate data segments is calculated, wherein the weight of the average remaining life index is positive to embody the preference for the data blocks with high remaining life, and the weight of the remaining life variance index may be set negative to consider the degree of dispersion of the remaining life distribution of the data blocks. The first weighted sum threshold may be set as desired, and the second weighted sum threshold may be higher than the first weighted sum threshold. And taking the candidate data segment corresponding to the weighted sum higher than the first weighted sum threshold value as the target data segment of the round, and taking the candidate data segment of which the weighted sum is higher than the second weighted sum threshold value but not exceeding the first weighted sum threshold value as the candidate data segment.
For easy understanding, in the following embodiments, the application scenario two, the data splitting scenario, is described in detail.
Data splitting aims at distributing data into different storage areas by distinguishing between the access frequency and the update frequency of the data. During data splitting, data is classified into different categories, such as hot data, warm data, and cold data. Wherein, the hot data refers to data that is frequently accessed and updated. Temperature data refers to data having an access frequency and an update frequency intermediate between hot data and cold data. Cold data refers to data that is less frequently accessed and updated.
By data splitting, it can be ensured that hot data, warm data and cold data are reasonably stored into the corresponding storage areas. Therefore, the data blocks in each storage area fail in a similar time, thereby greatly reducing GC frequency and cost and obviously reducing the storage space management load of the electronic equipment.
In some embodiments, the lifetime prediction method of the present application is applied in a data offloading scenario to further reduce the storage space management load of an electronic device.
First, the method of the application is utilized to obtain the predicted life index of each data block. And guiding data distribution according to the predicted life indexes of each data block to obtain a classification result. The classification result includes at least a hot data block and a cold data block. For example, data blocks that are predicted to have a low life expectancy are classified as hot data blocks, and data blocks that are predicted to have a high life expectancy and are not updated for a long period of time are classified as cold data blocks. And then, respectively writing the hot data block and the cold data block into corresponding storage areas.
For easy understanding, in the following embodiment, the third application scenario, the cache write-back scenario, is described in detail.
Cache write back is used to synchronize the data in the cache back into the persistent main storage medium. The cache memory is a high-speed data storage area capable of temporarily storing frequently accessed data to increase the data access speed. However, in order to ensure the durability and consistency of the data, when the data in the cache changes, the updated data needs to be written back to the main storage medium. The cache write-back timing refers to a condition for synchronizing the updated data in the cache back to the main storage medium.
In some embodiments, the lifetime prediction method of the present application is applied in a selection scenario of a cache write-back occasion. And selecting the data blocks which are preferentially written back to the main storage medium according to the predicted life index for the data blocks in the cache. For data blocks with higher predicted lifetime indicators, the electronic device performs write-back operations to synchronize data to the primary storage medium preferentially in view of the expectation that no changes will occur over a long period of time. For the data blocks with lower predicted life indexes, the data blocks are considered to be updated and lose efficacy quickly, so that temporary non-writing back to the main storage medium is selected, the cost that the data blocks which are written back to the main storage medium need to be deleted from the main storage medium and the data blocks need to be written back to the main storage medium again after the data blocks which are written back to the main storage medium lose efficacy quickly is reduced, the storage space management load is reduced, the unnecessary occupation of input/output bandwidth is reduced, the physical loss of the main storage medium is reduced, and the service life of the main storage medium is effectively prolonged.
For example, a write-back threshold may be set, and the electronic device preferentially writes back to the main storage medium the data blocks having the predicted lifetime index equal to or greater than the write-back threshold.
In each of the above application scenarios, it is necessary to manage the storage space of the file system based on the predicted lifetime index of the data block. It should be understood that the storage space management method of the present application is not limited to the application scenario listed above, and the method is equally applicable to other scenarios in which file system storage management is performed using data block prediction lifetime information, and is not limited herein.
For ease of understanding, in the following embodiments, the storage space management method of the present application is described in detail.
In some embodiments, a directory tree-dependent life prediction system is established, and the electronic device determines a predicted life index of a data block in the file system according to a life cycle index of a file to which the data block belongs. And managing the storage space according to the predicted life indexes of the data blocks.
For the initial block, the predicted lifetime index may inherit the lifecycle index of the file to which it belongs, while the initial lifecycle index of the file may inherit the lifecycle index of the parent directory. The initial block is a data block in which data is written for the first time, and is not written based on the data update requirement.
When the file is created, the electronic equipment records the creation time index and the life cycle index of the file, wherein the life cycle index inherits the life cycle index of the father catalog. When the directory is created, the electronic device records the creation time index and the life cycle index of the directory, wherein the life cycle index inherits the life cycle index of the parent directory (namely, the upper-level directory). The initial lifecycle index of the root directory may be set to a preset number of data blocks. For example, the preset number of data blocks may be a maximum number of data blocks that the file system allows to allocate for the root directory.
In some embodiments, the creation opportunity index is the global block count (total_count) of the file system at the time of creation of the file or directory. the total_count is directly related to the use condition of the storage space of the file system, and the relative position of the file or the directory in the process of increasing the storage space of the file system can be clearly known through recording the total_count when being created. The method is beneficial to analyzing the use condition and the update condition of the storage space of the file system, and is convenient for predicting the future storage space requirement. For example, as shown in fig. 9, when the file 3 is created, the electronic device records the creation timing index of the file 3 according to the global block accumulation amount at this time, the directory 4 is the parent directory of the file 3, and records the life cycle index of the file 3 according to the life cycle index of the directory 4. Can be expressed as:
;
;
wherein, the A creation timing index indicating file 3; Representing the total number of data blocks allocated by the file system of the global block cumulative number record when the file 3 is created; a lifecycle index representing file 3; representing the lifecycle index of the directory 4.
For the initial block in the file 3, the electronic device records the predicted lifetime index of the initial block according to the lifetime index of the file 3. Can be expressed as:
;
wherein, the Representing the predicted lifetime index of the initial block,Representing the lifecycle index of file 3.
For the update block, the electronic device obtains a predicted lifetime indicator from the update cycle indicator.
In some embodiments, the electronic device may calculate a weighted sum of the update period index and the historical lifetime index to obtain a predicted lifetime index for the update block. The electronic device may also calculate an average of the update cycle index and the historical lifetime index to obtain a predicted lifetime index for the update block. The electronic device may also use the update period indicator directly as a predicted lifetime indicator for the update block.
Specifically, the historical lifetime index refers to a predicted lifetime index of the block being updated. Illustratively, in the log file system, as shown in fig. 1, the data block 18a is an update block, the updated block corresponding to the data block 18a is the data block 12, and for the data block 18a, the electronic device obtains the predicted lifetime index of the data block 18a according to the weighted sum of the update cycle index and the predicted lifetime index of the data block 12. Wherein the update cycle index is the number of data blocks allocated by the log file system between the time the data block 12 is written to and marked for failure.
For example, the prediction lifetime index of the update block may be expressed as:
;
wherein, the A prediction lifetime indicator representing an update block; Representing the total number of data blocks allocated by the file system of the global block accumulation record when the update block is created; Representing the number of data blocks allocated by the file system when the updated block is created; Representing a historical lifetime indicator; The weight coefficient is represented by a number of weight coefficients, The value range of (C) is [0,1].
The update cycle index indicates the number of data blocks allocated by the file system between writing of the updated block and invalidation, and the update cycle index can reflect the current state of the file system. The historical life index represents the predicted life index of the updated block, is calculated based on the previous updating period index and the predicted life index, and reflects the stability of the related data block in the past period and the comprehensive condition of the predicted life index. Therefore, the service life of the update block is predicted by comprehensively considering the update period index and the historical service life index, and a more comprehensive and more accurate prediction result can be obtained.
In some embodiments, when a file (or a directory) is deleted, the electronic device updates the lifecycle index of the file according to the deletion time index and the creation time index of the file (or the directory) to obtain the actual lifecycle index of the file.
For example, as shown in fig. 10, the file 1 is deleted, and the electronic device updates the life cycle index of the file 1 to obtain the actual life cycle index of the file 1. The actual lifecycle index of file 1 can be expressed as:
;
wherein, the An actual lifecycle index representing file 1; Indicating the total number of data blocks allocated by the file system of the global block cumulative number record when the file 1 is deleted; A creation timing index indicating file 1.
In some embodiments, after the electronic device updates the lifecycle index of the deleted file, the lifecycle index of the corresponding parent directory is updated. The updated lifecycle index of the parent directory may be obtained based on the updated lifecycle index of the deleted file, the updated lifecycle index of the parent directory, and the number of files under the parent directory. The method can ensure that the life cycle index of the directory can be smoothly transited to a new value after the file is deleted, and the life cycle index of the father directory is updated in time, so that the accuracy of the life cycle index of the father directory is improved.
For example, as shown in fig. 10, the file 1 is deleted, and the electronic device updates the life cycle index of the file 1 and then updates the life cycle index of the directory 3. The new life cycle index of catalog 3 may be expressed as:
;
wherein, the A new life cycle index representing catalog 3,Representing the actual lifecycle index of file 1,Indicating the lifecycle index of the directory 3 before updating,Representing the number of files under directory 3. In the embodiment shown in fig. 10, the number of files under directory 3 is 2.
After the life cycle index of the father catalog is updated, a file is newly built under the corresponding father catalog, and the life cycle index of the new file inherits the new life cycle index of the father catalog. Illustratively, as shown in fig. 11, after updating the life cycle index of the directory 3, a new file 6 is created under the directory 3, and the life cycle index of the new file 6 inherits the new life cycle index of the directory 3. While file 5 is the file created before directory 3 updates the lifecycle index, so the lifecycle index of file 5 is unchanged.
The life cycle index of the file (or the directory) is updated in time and fed back to the parent directory, so that the life cycle index of the parent directory is dynamically updated. The accuracy of the life cycle index of the father directory is improved, and the initial life cycle index of the newly-built file under the father directory can be predicted based on the more accurate life cycle index of the father directory, so that the accuracy of the life prediction of the data block is improved.
In other embodiments, the creation time indicator is the specific time of creation of the file or directory, and the deletion time indicator is the specific time of deletion of the file or directory, and may be represented in a standard date and time format, such as "year-month-day: minutes: seconds". The creation timing index and the deletion timing index can also be expressed by UNIX time. The UNIX time is the total seconds from the coordinated world to the present (leap seconds are not considered). And the frequency information of how many minutes (/ seconds) a file system allocates a block of data or how many minutes (/ seconds) it takes to allocate a block of data needs to be recorded. The number of data blocks allocated by the file system between creation and deletion is convenient for subsequent calculation, namely, the actual life cycle index and the update cycle index are also convenient for calculation, so that the prediction life index is convenient for calculation.
According to the data block life prediction method, the electronic equipment can know the predicted failure time of the data block in advance by predicting the life of the data block, so that the moving and recovery strategy of the data block can be planned in advance, prospective data block management is realized, and the storage space management load of the electronic equipment is obviously reduced.
Embodiments of the present application also provide an electronic device that may include one or more processors and memory coupled to the processors, the memory for storing computer program code that, when executed by the one or more processors, causes the electronic device to perform the functions or steps described above as being performed by the method embodiments.
Embodiments of the present application also provide a chip system, as shown in FIG. 12, the chip system 1700 includes at least one processor 1701 and at least one interface circuit 1702. The processor 1701 and the interface circuit 1702 may be interconnected by wires. For example, the interface circuit 1702 may be used to receive signals from other devices (e.g., a memory of an electronic apparatus). For another example, the interface circuit 1702 may be used to send signals to other devices, such as the processor 1701. The interface circuit 1702 may, for example, read instructions stored in a memory and send the instructions to the processor 1701. The instructions, when executed by the processor 1701, may cause the electronic device to perform the various steps described in the embodiments above. Of course, the system-on-chip may also include other discrete devices, which are not particularly limited in accordance with embodiments of the present application.
The embodiment of the application also provides a computer storage medium, which comprises computer instructions, and when the computer instructions are run on the electronic device, the electronic device is caused to execute the steps in the embodiment of the method.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the steps of the method embodiments described above.
It will be apparent to those skilled in the art from this description that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a device (may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps of the method described in the embodiments of the present application. The storage medium includes a U disk, a removable hard disk, a Read Only Memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The foregoing is merely illustrative of specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (20)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510328524.1A CN119847452B (en) | 2025-03-19 | 2025-03-19 | Storage space management method and electronic device |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202510328524.1A CN119847452B (en) | 2025-03-19 | 2025-03-19 | Storage space management method and electronic device |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN119847452A CN119847452A (en) | 2025-04-18 |
| CN119847452B true CN119847452B (en) | 2025-09-02 |
Family
ID=95367397
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202510328524.1A Active CN119847452B (en) | 2025-03-19 | 2025-03-19 | Storage space management method and electronic device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN119847452B (en) |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115480707A (en) * | 2022-09-30 | 2022-12-16 | 三星(中国)半导体有限公司 | Method and device for data storage |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| KR20090097050A (en) * | 2008-03-10 | 2009-09-15 | 주식회사 하이닉스반도체 | Semiconductor storage system and its control method |
| US8051241B2 (en) * | 2009-05-07 | 2011-11-01 | Seagate Technology Llc | Wear leveling technique for storage devices |
| CN104572489B (en) * | 2013-10-23 | 2019-12-24 | 深圳市腾讯计算机系统有限公司 | Wear leveling method and device |
| US20170068467A1 (en) * | 2015-09-04 | 2017-03-09 | HGST Netherlands B.V. | Wear management for flash memory devices |
| US10496293B2 (en) * | 2017-03-14 | 2019-12-03 | International Business Machines Corporation | Techniques for selecting storage blocks for garbage collection based on longevity information |
-
2025
- 2025-03-19 CN CN202510328524.1A patent/CN119847452B/en active Active
Patent Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115480707A (en) * | 2022-09-30 | 2022-12-16 | 三星(中国)半导体有限公司 | Method and device for data storage |
Also Published As
| Publication number | Publication date |
|---|---|
| CN119847452A (en) | 2025-04-18 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11947489B2 (en) | Creating snapshots of a storage volume in a distributed storage system | |
| CN115022342B (en) | Data processing method, device, electronic equipment and computer readable storage medium | |
| CN113900590A (en) | Shingled disk storage method, apparatus, device and medium | |
| CN110321321A (en) | Network control techology snapshot reading/writing method, device, equipment and storage medium | |
| CN113688139A (en) | Object storage method, gateway, device and medium | |
| CN115495020A (en) | File processing method and device, electronic equipment and readable storage medium | |
| WO2025157210A1 (en) | Thumbnail display method and electronic device | |
| JP2016515258A (en) | File aggregation for optimized file operation | |
| CN119847452B (en) | Storage space management method and electronic device | |
| CN117668319A (en) | Data query method, electronic device and storage medium | |
| US20190073270A1 (en) | Creating Snapshots Of A Storage Volume In A Distributed Storage System | |
| CN113485642A (en) | Data caching method and device | |
| CN103761194A (en) | Memory management method and device | |
| CN117369712B (en) | Garbage collection method, page storage method and electronic device | |
| CN115168298B (en) | Evaluation method and electronic equipment for file system fragmentation | |
| CN110018987B (en) | Snapshot creating method, device and system | |
| CN114490442B (en) | File fragment adjusting method and electronic equipment | |
| CN115981573A (en) | Data management method, electronic device and computer readable and writable storage medium | |
| CN116484078A (en) | Retrieval method, retrieval device, electronic equipment and computer readable storage medium | |
| CN115048035B (en) | A cache management method, device and related equipment | |
| CN119248463B (en) | Garbage recycling method, electronic device and storage medium | |
| CN117311605B (en) | Distributed storage method, data indexing method, device and storage medium | |
| CN114461405B (en) | Storage method and related device for locking page in memory | |
| CN118733482B (en) | Garbage collection method, device and storage medium for partition storage device | |
| CN120578609A (en) | File writing back method and electronic device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant |