CN114297096B - Data retention period prediction method and related components - Google Patents

Data retention period prediction method and related components

Info

Publication number
CN114297096B
Authority
CN
China
Prior art keywords
data
retention period
error
data retention
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111679426.0A
Other languages
Chinese (zh)
Other versions
CN114297096A (en
Inventor
王岩
杨亚飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Dapu Microelectronics Co Ltd
Original Assignee
Shenzhen Dapu Microelectronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Dapu Microelectronics Co Ltd
Priority to CN202111679426.0A
Publication of CN114297096A
Application granted
Publication of CN114297096B
Legal status: Active

Abstract

The present invention discloses a data retention period prediction method and related components. After N first sampling storage units are determined in a sample storage data area, a first error confusion matrix is established for each first sampling storage unit based on the errors in the data stored by its minimum units. Machine learning is then performed on the first error confusion matrices to obtain a data retention period prediction model, and when the data retention period of a storage data area to be tested needs to be predicted, the model is used to predict it. In this application, the data retention period prediction model is thus first built from the sample storage data area, so that the data retention period of any storage data area to be tested can be predicted with the model and the data stored in that area can be handled according to the predicted retention period, which avoids loss of the stored data.

Description

Data retention period prediction method and related components
Technical Field
The invention relates to the field of data processing, in particular to a data retention period prediction method and related components.
Background
When a solid state disk stores data, it usually does so in NAND flash memory, and writing and erasing data is in essence a process of charging and discharging the charge carriers of the storage cells. Over time, the electrons previously written into a NAND flash die may leak away, or additional electrons may be unintentionally injected, so that the stored data is altered and errors occur when the data is read. Stored data therefore has a data retention period, i.e., the period during which, without a flash refresh operation being performed, the data can still be read successfully with the help of ECC (Error Checking and Correction) techniques. The data retention period differs between dies and even between storage locations on the same die, and it varies with the state of use; for example, oxidation effects in the NAND flash materials as they are used shorten the data retention period.
Because of the limited data retention period, data in a storage product that uses flash memory as its medium cannot be kept permanently after a single write. In the prior art, a data retention period is usually preset, and data is protected against loss according to that preset period. Specifically, when the time for which data has been stored reaches or is about to reach the preset data retention period, the data at that storage location is refreshed or moved to a new storage location in time. This restarts the data retention period of the data and prevents further errors, caused by electron loss or injection at the storage location, from leading to a read failure.
In practice, however, if the preset data retention period is too short, data is moved and rewritten frequently, which increases the workload and power consumption of the memory and degrades its performance; if the preset data retention period is too long, the risk of data loss increases, which compromises data security. The prior art therefore tends to choose a shorter preset data retention period and accepts the higher workload and power consumption in exchange for data safety. Even so, because a preset data retention period cannot be set dynamically or differently for different areas, data may in special cases still be lost before the preset data retention period has elapsed.
In summary, how to accurately judge and predict the data retention period is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to provide a data retention period prediction method and related components, wherein a data retention period prediction model is established through a sample storage data area, so that the storage data area to be detected can predict the data retention period according to the data retention period prediction model, and the data stored in the storage data area to be detected is processed according to the predicted data retention period, so that the loss of the storage data is avoided.
In order to solve the technical problems, the invention provides a data retention period prediction method, which comprises the following steps:
determining N first sampling storage units in a sample storage data area, wherein N is a positive integer;
establishing a first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the data stored by the minimum unit of each first sampling storage unit;
performing machine learning on each of the first error confusion matrices to obtain a data retention period prediction model;
and predicting the data retention period of the storage data area to be tested by using the data retention period prediction model.
Preferably, performing machine learning on each of the first error confusion matrices to obtain a data retention period prediction model comprises:
establishing a machine learning model based on each first error confusion matrix;
training the machine learning model using each of the first error confusion matrices to obtain the data retention period prediction model.
Preferably, the machine learning model comprises a regression model or a neural network model.
Preferably, the establishing a first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the data stored by the minimum unit of each first sampling storage unit includes:
Performing error classification on data errors possibly occurring in the minimum unit based on the change relation of the minimum unit programming state of each first sampling storage unit;
counting the error condition of the minimum unit storage data in each first sampling storage unit according to the error classification so as to obtain the error counting number of each type of error corresponding to each first sampling storage unit;
And establishing a corresponding first error confusion matrix based on the error statistics of each type of errors corresponding to each first sampling storage unit.
Preferably, predicting the data retention period of the data area to be measured by using the data retention period prediction model includes:
Determining M second sampling storage units in the to-be-detected storage data area, wherein M is a positive integer;
establishing a second error confusion matrix corresponding to each second sampling storage unit based on the error condition of the data stored by the minimum unit of each second sampling storage unit;
And inputting the second error confusion matrix into the data retention period prediction model for calculation so as to obtain the data retention period of the to-be-measured storage data area.
Preferably, the establishing a second error confusion matrix corresponding to each second sampling storage unit based on the error condition of the data stored by the minimum unit of each second sampling storage unit includes:
performing error classification on data errors possibly occurring in the minimum unit based on the change relation of the minimum unit programming state of each second sampling storage unit;
counting the error condition of the minimum unit storage data in each second sampling storage unit according to the error classification so as to obtain the error counting number of each type of error corresponding to each second sampling storage unit;
And establishing a corresponding second error confusion matrix based on the error statistics of each type of errors corresponding to each second sampling storage unit.
Preferably, after establishing the first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the data stored by the minimum unit of each first sampling storage unit, the method further includes:
normalizing the first error confusion matrix of each first sampling storage unit to determine a third error confusion matrix;
performing machine learning on each of the first error confusion matrices to obtain a data retention period prediction model then comprises:
performing machine learning on each of the third error confusion matrices to obtain the data retention period prediction model.
Preferably, after normalizing the first error confusion matrix of each first sampling storage unit to determine the third error confusion matrix, the method further includes:
performing nonlinear mapping processing on each element in the third error confusion matrix to generate a fourth error confusion matrix;
performing machine learning on each of the third error confusion matrices to obtain a data retention period prediction model then comprises:
performing machine learning on each of the fourth error confusion matrices to obtain the data retention period prediction model.
Preferably, before predicting the data retention period of the data area to be measured by using the data retention period prediction model, the method further comprises:
Judging whether a patrol instruction or a read instruction for the to-be-detected storage data area is received or not;
if yes, the step of predicting the data retention period of the data area to be tested by using the data retention period prediction model is carried out.
Preferably, after predicting the data retention period of the data area to be measured by using the data retention period prediction model, the method further comprises:
judging whether the data retention period of the to-be-measured storage data area is smaller than a preset retention period threshold value or not;
and if so, rewriting or transferring the data stored in the to-be-detected storage data area to a standby storage position.
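The trigger and threshold steps described above can be sketched as a small control flow. This is only an illustration: `on_patrol_or_read`, `needs_refresh`, and the `predict` and `refresh` callbacks are hypothetical names, and the threshold value used in the example is arbitrary rather than taken from the patent.

```python
from typing import Callable


def needs_refresh(predicted_days: float, threshold_days: float) -> bool:
    """Return True when the predicted retention period is below the preset threshold."""
    return predicted_days < threshold_days


def on_patrol_or_read(
    region_id: str,
    predict: Callable[[str], float],
    threshold_days: float,
    refresh: Callable[[str], None],
) -> float:
    """Run when a patrol or read instruction arrives for a storage data area.

    `predict` stands in for the data retention period prediction model and
    `refresh` for rewriting or moving the data to a spare location; both are
    placeholders supplied by the caller.
    """
    predicted = predict(region_id)
    if needs_refresh(predicted, threshold_days):
        refresh(region_id)
    return predicted


# Example with stub callbacks: a predicted 12 days is below a 30-day threshold.
refreshed = []
on_patrol_or_read("block-0", lambda _region: 12.0, 30.0, refreshed.append)
```

Keeping the prediction and the refresh action behind callbacks keeps the decision logic independent of how the model or the flash translation layer is implemented.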
In order to solve the above technical problems, the present invention provides a data retention period prediction system, including:
A determining unit, configured to determine N first sampling storage units in a sample storage data area, where N is a positive integer;
the matrix establishing unit is used for establishing a first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the data stored by the minimum unit of each first sampling storage unit;
A model acquisition unit for performing machine learning on each of the first error confusion matrices to obtain a data retention period prediction model;
and the prediction unit is used for predicting the data retention period of the data area to be measured by using the data retention period prediction model.
In order to solve the above technical problems, the present invention provides a data retention period prediction apparatus, including:
A memory for storing a computer program;
a processor for implementing the steps of the data retention period prediction method as described above when executing the computer program.
In the invention, after N first sampling storage units are determined in a sample storage data area, a first error confusion matrix is established for each first sampling storage unit based on the error condition of the data stored by its minimum units, and machine learning is performed on the first error confusion matrices to obtain a data retention period prediction model. When the data retention period of a storage data area to be tested needs to be predicted, it is predicted with the data retention period prediction model. The model is thus established from the sample storage data area, so that the data retention period of any storage data area to be tested can be predicted with it and the data stored in that area can be handled according to the predicted data retention period, which avoids loss of the stored data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the prior art and the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for predicting data retention period according to the present invention;
FIG. 2 is a schematic diagram of a prior art data retention period;
FIG. 3 is a schematic diagram of a TLC-based memory cell programming state definition according to the present invention;
FIG. 4 is a schematic diagram of a first confusion matrix according to the present invention;
FIG. 5 is a schematic diagram of the machine learning model according to the present invention when it is an SVR model;
FIG. 6 is a schematic diagram of the machine learning model according to the present invention when it is a CNN model;
FIG. 7 is a schematic diagram of a third confusion matrix according to the present invention;
FIG. 8 is a diagram of a fourth confusion matrix according to the present invention;
FIG. 9 is a flow chart of another embodiment of a method for predicting data retention period according to the present invention;
FIG. 10 is a schematic diagram of a data retention period prediction system according to the present invention;
FIG. 11 is a schematic structural diagram of a data retention period prediction device according to the present invention.
Detailed Description
The core of the invention is to provide a data retention period prediction method and related components, wherein a data retention period prediction model is established through a sample storage data area, so that the storage data area to be detected can predict the data retention period according to the data retention period prediction model, and the data stored in the storage data area to be detected is processed according to the predicted data retention period, so that the loss of the storage data is avoided.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The applicant considers that the data stored in each storage data area of a memory has its own data retention period, and that the probability of the data becoming erroneous or being lost rises as the storage time approaches the end of that period. The data retention period therefore needs to be determined in advance, so that the data can be rewritten or stored at a new storage location before the storage time reaches the data retention period, which extends the time for which the data is stored reliably.
In the prior art, however, the data retention period of each storage data area is set manually, in one of two ways. The first is a statistical threshold method: a relatively safe time threshold is determined from the statistics of multiple experiments and is used as a uniform given data retention period for the storage data areas at all storage locations. Obviously, the threshold cannot be too long (for example, it is generally set to 180 days for consumer-grade flash memory and to 90 days or even less for enterprise-grade flash memory). For a storage data area whose actual data retention period is short, the short given data retention period avoids data loss; for a storage data area whose actual data retention period is long, however, data is transferred or rewritten more often than necessary, which increases the workload of the memory. If a confidence of 99.5% or higher is to be guaranteed in this way, the actual data retention period of at least 99.5% of the storage data areas differs considerably from the given data retention period, as shown in FIG. 2, which is a schematic diagram of the data retention period in the prior art. In addition, a fixed given data retention period cannot follow the changes in the actual data retention period that occur as the storage medium changes with use.
The second way is to express the relationship between the data retention period and the degree of wear or the time of use with a relational expression, such as a first-order or second-order expression. Although this partially alleviates the problem that a fixed given data retention period cannot change with the environment, it still does not account for the differences between the storage media of different memories, and it cannot adapt to the differing changes in the data retention periods of different storage data areas.
Referring to fig. 1, fig. 1 is a flow chart of a data retention period prediction method provided by the present invention, where the method includes:
S11, determining N first sampling storage units in a sample storage data area, wherein N is a positive integer;
To solve the above technical problem, the present application predicts the data retention period of a storage data area. Specifically, N first sampling storage units are first determined in the sample storage data area. As a preferred embodiment, when a plurality of first sampling storage units are determined in the sample storage data area, the first and second storage pages or data blocks in the sample storage data block may be, but are not limited to being, set as the first sampling storage units.
For a storage data area containing a plurality of storage units (for example, a storage data area contains a plurality of storage data blocks, each storage data block contains a plurality of storage data pages, and each storage data page is one storage unit), the error confusion matrices of several storage units can be obtained and spliced into a new matrix or a three-dimensional array. For example, current 96-layer 3D TLC (Triple-Level Cell, three bits per cell) flash memory is typically formed by stacking two 48-layer stacks, and data errors tend to be most pronounced in the first and last layers of each 48-layer stack. The storage data pages in layers 1, 48, 49 and 96 of the storage data area may therefore be selected as sampling storage units, and the corresponding four (or more) 8×8 error confusion matrices can be joined into a new 16×16 matrix or an 8×8×4 three-dimensional array for machine learning or for predicting the data retention period. Of course, data pages in more layers may be selected as first sampling storage units, which is not described further here.
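As a minimal sketch of the splicing step, the following pure-Python code joins four 8×8 matrices either into a 16×16 matrix or into an 8×8×4 array. The 2×2 tile order and the choice to put the matrix index last are assumptions; the text does not specify the layout, and the helper names and sample values are hypothetical.

```python
from typing import List

Matrix = List[List[int]]


def tile_2x2(mats: List[Matrix]) -> Matrix:
    """Place four n x n matrices into one 2n x 2n matrix.

    Tile order (assumed): mats[0] top-left, mats[1] top-right,
    mats[2] bottom-left, mats[3] bottom-right.
    """
    if len(mats) != 4:
        raise ValueError("expected exactly four matrices")
    n = len(mats[0])
    top = [mats[0][r] + mats[1][r] for r in range(n)]
    bottom = [mats[2][r] + mats[3][r] for r in range(n)]
    return top + bottom


def stack_depth(mats: List[Matrix]) -> List[List[List[int]]]:
    """Stack k n x n matrices into an n x n x k array (matrix index last)."""
    n = len(mats[0])
    return [[[m[r][c] for m in mats] for c in range(n)] for r in range(n)]


# Four hypothetical 8 x 8 matrices, e.g. for layers 1, 48, 49 and 96.
layer_mats = [
    [[layer * 100 + r * 8 + c for c in range(8)] for r in range(8)]
    for layer in range(4)
]
tiled = tile_2x2(layer_mats)       # 16 x 16
stacked = stack_depth(layer_mats)  # 8 x 8 x 4
```

The tiled form suits models that expect a single two-dimensional input, while the stacked form keeps each sampled layer as a separate channel, which is the usual input shape for a convolutional model.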
S12, establishing a first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the minimum unit storage data of each first sampling storage unit;
The applicant considers that each storage unit includes a plurality of minimum units (cells) in which data is stored, and that errors in the stored data arise from the loss or injection of electrons in each minimum unit. Because data is stored in binary 0/1 form, a change in the stored data (i.e., a 0/1 inversion) falls into one of two types: 0 inverted to 1, or 1 inverted to 0. The number of erroneous bits, the number of 0-to-1 inversions, or the number of 1-to-0 inversions in the data stored by the minimum units can therefore be used as a data error index that drives the policy for managing the data stored in the storage data area. For example, RBER (Raw Bit Error Rate) is commonly used as an important index for evaluating the severity of data errors in a storage data area.
However, flash memory has in recent years developed from SLC (Single-Level Cell) to MLC (Multi-Level Cell), TLC, and even QLC (Quad-Level Cell), so that the minimum unit (cell) of each storage data area can store several bits of information. Taking 3D TLC flash memory as an example, each minimum unit can store 3 bits of information, as shown in FIG. 3, which is a schematic diagram of a TLC-based memory cell programming state definition according to the present invention. In TLC, the range of charge held by a cell is divided into 8 intervals, i.e., the minimum unit can be in one of 8 programming states, each represented by a different stored-charge interval. ER, A, B, C, D, E, F and G in FIG. 3 are these programming states, and each corresponds to one of the codes 3'b000 to 3'b111, so that one minimum unit of a TLC flash memory can store 3 bits of information.
When each minimum unit can store several bits, continuing to use the number of 0/1 inversions as the main or only data error index makes error measurement too coarse: it cannot reflect the differences between data errors and may even lead to unsound conclusions. For example, suppose the programming state of a minimum unit in FIG. 3 is D, i.e., the stored data is 3'b010. This code differs by exactly one bit from each of the C state (3'b000), the E state (3'b011) and the A state (3'b110). When a data error occurs, however, the probability that the D state changes to the C state or to the E state is much greater than the probability that it changes to the A state, and the probability of a change to the C state is in turn greater than that of a change to the E state (because going from D to C only requires electrons to be lost, whereas going from D to E requires additional electrons to be injected). In other words, although D to C, D to E and D to A each involve only a single 0/1 inversion, the transition from D to A is clearly the most severe of the three errors, followed by the transition from D to E, and the transition from D to C is the least severe.
Therefore, to fully reflect the differences between data errors in newer flash media such as MLC or TLC, the application replaces the simple 0/1 inversion classification with transitions between the programming states represented by the stored charge of the minimum unit. For example, the changes in stored-charge state from D to C, to E, or to A in the example above represent three different error classes, named D-C, D-E and D-A respectively. Under this definition there are 8 × 7 = 56 error classes in total in TLC. By the same principle there are 4 × 3 = 12 error classes in MLC, which is not described further here.
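The class counts can be checked by enumerating every ordered pair of distinct programming states. In this short sketch the TLC state names follow FIG. 3 as described in the text; the MLC state names are an assumption made only for illustration.

```python
from itertools import permutations

TLC_STATES = ["ER", "A", "B", "C", "D", "E", "F", "G"]
MLC_STATES = ["ER", "A", "B", "C"]  # assumed names for the four MLC states


def error_classes(states):
    """Return every 'source-destination' transition between distinct states."""
    return [f"{src}-{dst}" for src, dst in permutations(states, 2)]


tlc_errors = error_classes(TLC_STATES)
mlc_errors = error_classes(MLC_STATES)
print(len(tlc_errors), len(mlc_errors))  # 56 12
```

Each class is directional, so D-C and C-D are counted separately, which is what lets the classification distinguish electron loss from electron injection.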
Based on this, when a storage unit is taken as a first sampling storage unit to determine the first error confusion matrix, the matrix can be established from the number of errors of each class that occur in that first sampling storage unit. Still taking TLC as an example, the first error confusion matrix can be written as:

$$
E=\begin{pmatrix}
0 & e_{RA} & e_{RB} & \cdots & e_{RG}\\
e_{AR} & 0 & e_{AB} & \cdots & e_{AG}\\
e_{BR} & e_{BA} & 0 & \cdots & e_{BG}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
e_{GR} & e_{GA} & e_{GB} & \cdots & 0
\end{pmatrix}
$$

Here $e_{ij}$ is the number of errors in which programming state $i$ changed to programming state $j$, the diagonal elements may be set to 0 (i.e., $e_{ii}=0$), and the programming state ER is abbreviated as R. Refer to FIG. 4, which is a schematic diagram of the first error confusion matrix according to the present invention and shows the matrix in table form for TLC. The 56 error classes in TLC can be divided into three major groups, namely 1-bit, 2-bit and 3-bit errors, where 1, 2 and 3 denote the number of 0/1 inversions among the 3 bits stored in a minimum unit. The severity of the three groups increases in that order, which the figure indicates with progressively darker shading. Thus, when the first error confusion matrix of each first sampling storage unit is generated from the error condition of the data stored by its minimum units, any error class that occurs in that data changes the corresponding element of the matrix.
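A minimal sketch of how such a matrix could be filled from observations is given below. It assumes the written state of each cell is known (for example from a known test pattern or from ECC-corrected data) and that the read state has already been decoded; the function name, the input format and the sample observations are all hypothetical.

```python
from typing import Iterable, List, Tuple

STATES = ["R", "A", "B", "C", "D", "E", "F", "G"]  # ER abbreviated as R
INDEX = {state: i for i, state in enumerate(STATES)}


def build_error_confusion_matrix(
    pairs: Iterable[Tuple[str, str]]
) -> List[List[int]]:
    """Count transitions written -> read; element [i][j] is e_ij.

    Cells that read back correctly are skipped, so the diagonal stays 0.
    """
    n = len(STATES)
    matrix = [[0] * n for _ in range(n)]
    for written, read in pairs:
        if written != read:
            matrix[INDEX[written]][INDEX[read]] += 1
    return matrix


# Hypothetical observations for one sampled page: (written state, read state).
observed = [("D", "C"), ("D", "C"), ("D", "E"), ("D", "D"), ("A", "R"), ("G", "G")]
ecm = build_error_confusion_matrix(observed)
```

With these observations the D-C element is 2, the D-E and A-R elements are 1, and every other element, including the whole diagonal, is 0.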
S13, performing machine learning on each first error confusion matrix to obtain a data retention period prediction model;
and S14, predicting the data retention period of the data area to be tested by using the data retention period prediction model.
Machine learning is performed on the first error confusion matrices determined for the first sampling storage units to obtain the data retention period prediction model. Once the model has been determined, the data retention period of any storage data area to be tested, whether or not it is the sample storage data area, can be predicted with the model to obtain the predicted data retention period of that area. The machine learning model may be, but is not limited to, an SVR (Support Vector Regression) model, a CNN (Convolutional Neural Network) model, an FCN (fully connected network) model, a decision tree, a random forest, a logistic regression model, or the like. The SVR model predicts the data retention period by establishing and solving an objective function subject to constraints: the first error confusion matrix is converted into a feature vector and input into the SVR model, and the solution process, which includes establishing a Lagrangian function, introducing slack variables, and solving for the weights and the bias, yields a regression function such as f(x) = w·x + b.
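For reference, the textbook ε-insensitive SVR formulation that this description appears to allude to is given below. The tube width $\varepsilon$, the penalty constant $C$, the slack variables $\xi_i,\xi_i^*$ and the multipliers $\alpha_i,\alpha_i^*$ are the usual SVR symbols and are not named in the source; $x_i$ stands for the feature vector obtained from an error confusion matrix and $y_i$ for the corresponding known data retention period.

$$
\min_{w,\,b,\,\xi,\,\xi^*}\ \frac{1}{2}\lVert w\rVert^{2}+C\sum_{i=1}^{n}\left(\xi_i+\xi_i^{*}\right)
$$

$$
\text{s.t.}\quad y_i-(w\cdot x_i+b)\le\varepsilon+\xi_i,\qquad (w\cdot x_i+b)-y_i\le\varepsilon+\xi_i^{*},\qquad \xi_i,\ \xi_i^{*}\ge 0 .
$$

Introducing Lagrange multipliers and solving the dual problem gives $w=\sum_{i=1}^{n}(\alpha_i-\alpha_i^{*})\,x_i$, so the regression function is

$$
f(x)=\sum_{i=1}^{n}(\alpha_i-\alpha_i^{*})\,(x_i\cdot x)+b=w\cdot x+b .
$$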
Referring to fig. 5 and 6, fig. 5 is a schematic diagram of the machine learning model provided by the present invention when the machine learning model is an SVR model, and fig. 6 is a schematic diagram of the machine learning model provided by the present invention when the machine learning model is a CNN model.
For the CNN model, taking a LeNet-5 type network as an example, 7 layers in total are used when predicting the data retention period of the storage data area to be tested. The first and third layers are convolution layers; the convolution operation extracts different features from the input first error confusion matrix while sharing parameters. The second and fourth layers are pooling layers, which keep the features robust to rotation and translation of the input and reduce the computational dimension. The fifth and sixth layers are fully connected layers, which map the computed distributed feature representation to a low-dimensional vector space. The seventh layer is an activation function layer, which uses a Softmax function to map the low-dimensional feature representation to the sample label space and thereby predicts the data retention period.
In actual use the method can be divided into two stages. The first stage covers sampling, model building, model training and parameter tuning. The second stage uses the trained and optimized data retention period prediction model to make predictions and to provide a reference for storage management strategies (the data retention periods of other storage data areas can be predicted with the model obtained by training).
In addition, to further ensure that the data retention period prediction model trains well, the proportions of the different kinds of samples need to be controlled and kept balanced when the first sampling storage units are determined. For example, according to the actual remaining data retention period of each sampled first sampling storage unit, the samples are divided into classes in 10-day segments, and the proportions of the classes are kept balanced.
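One simple way to keep the 10-day classes balanced is to downsample every class to the size of the smallest one. The sketch below does this in pure Python; downsampling is only one possible balancing strategy and is not stated in the source, and the function name and the `(features, remaining_days)` sample format are hypothetical.

```python
import random
from collections import defaultdict
from typing import Any, List, Tuple

Sample = Tuple[Any, float]  # (features, actual remaining retention period in days)


def balance_by_retention_bin(
    samples: List[Sample], bin_days: int = 10, seed: int = 0
) -> List[Sample]:
    """Group samples into bin_days-wide classes and downsample each class
    to the size of the smallest class."""
    if not samples:
        return []
    bins = defaultdict(list)
    for sample in samples:
        bins[int(sample[1] // bin_days)].append(sample)
    target = min(len(group) for group in bins.values())
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    balanced: List[Sample] = []
    for key in sorted(bins):
        balanced.extend(rng.sample(bins[key], target))
    return balanced
```

Downsampling discards data from the larger classes; where samples are scarce, collecting more samples for the under-represented classes would be an alternative.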
As a preferred embodiment, performing machine learning on each first error confusion matrix to obtain the data retention period prediction model may include, but is not limited to: establishing a machine learning model based on each first error confusion matrix, and training the machine learning model using each first error confusion matrix to obtain the data retention period prediction model.
When determining the data retention period prediction model, a machine learning model can be established according to each first error confusion matrix and then trained, so that the trained model can infer the error condition of the data stored in a to-be-measured storage data area. The data retention period prediction model is thereby obtained, which makes it convenient to predict the data retention periods of different to-be-measured storage data areas.
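As a deliberately simplified sketch of the training step, the following fits a one-variable linear regression by least squares. It assumes, for illustration only, that each error confusion matrix is collapsed into a single feature (the sum of its off-diagonal entries) and that the label is the remaining retention period in days. A real model would consume the whole matrix, and the function names here are hypothetical.

```python
def error_ratio(matrix):
    """Sum of the off-diagonal entries, i.e. the share of cells read in a wrong state."""
    return sum(
        value
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if i != j
    )


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all feature values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_days(model, matrix):
    """Apply a fitted (slope, intercept) model to a new error confusion matrix."""
    slope, intercept = model
    return slope * error_ratio(matrix) + intercept
```

The same train-then-predict structure applies when the regression is replaced by a neural network; only the model and its fitting procedure change.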
As a preferred embodiment, predicting the data retention period of the to-be-measured storage data area by using the data retention period prediction model specifically includes, but is not limited to: determining M second sampling storage units in the to-be-measured storage data area, wherein M is a positive integer; establishing a second error confusion matrix corresponding to each second sampling storage unit based on the error condition of the data stored in the minimum units of each second sampling storage unit; and inputting the second error confusion matrices into the data retention period prediction model for calculation, so as to obtain the data retention period of the to-be-measured storage data area.
When determining the data retention period of the to-be-measured storage data area, a plurality of second sampling storage units are determined in that area and the second error confusion matrix corresponding to each of them is obtained. These error confusion matrices are spliced into a new matrix or a three-dimensional array. After the newly spliced matrix or three-dimensional array is input into the data retention period prediction model, the model can infer the data condition of the to-be-measured storage data area and thereby determine its data retention period. For the method of splicing a plurality of error confusion matrices into a new matrix or three-dimensional array, please refer to the above method embodiment, which is not repeated here.
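The two splicing options mentioned above can be sketched with plain nested lists. The concrete layouts (stacking along a new leading axis, or concatenating side by side) are assumptions made for this illustration; the splicing method of the earlier embodiment may differ.

```python
def stack_matrices(matrices, size=8):
    """Validate M size-by-size matrices and return an M x size x size nested list."""
    stacked = []
    for idx, matrix in enumerate(matrices):
        if len(matrix) != size or any(len(row) != size for row in matrix):
            raise ValueError(f"matrix {idx} is not {size}x{size}")
        stacked.append([list(row) for row in matrix])  # copy, do not alias the input
    return stacked


def tile_side_by_side(matrices):
    """Alternative layout: concatenate the matrices column-wise into one 2-D matrix."""
    if not matrices:
        return []
    rows = len(matrices[0])
    return [sum((list(m[r]) for m in matrices), []) for r in range(rows)]
```

A three-dimensional array fits a CNN naturally, with each sampling storage unit acting as one input channel, while the tiled two-dimensional matrix suits models that expect a single image-like input.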
As a preferred embodiment, establishing the second error confusion matrix corresponding to each second sampling storage unit based on the error condition of the data stored in the minimum units of each second sampling storage unit may specifically include, but is not limited to: performing error classification on the data errors that may occur in a minimum unit based on the change relation of the programming state of the minimum units of each second sampling storage unit; counting the error condition of the data stored in the minimum units of each second sampling storage unit according to the error classification, so as to obtain the error count of each type of error for each second sampling storage unit; and establishing the corresponding second error confusion matrix based on the error count of each type of error for each second sampling storage unit.
In this embodiment, when predicting the data retention period of the to-be-measured storage data area based on the data retention period prediction model, the process of determining the second error confusion matrix is the same as the process of determining the first error confusion matrix.
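The counting step can be sketched as below. The eight state labels R, A, B, C, D, E, F, G are the indices used later in this description; the input layout (pairs of programmed state and read-back state) and the convention that rows are programmed states and columns are read states are assumptions for illustration.

```python
STATES = ["R", "A", "B", "C", "D", "E", "F", "G"]
INDEX = {state: i for i, state in enumerate(STATES)}


def build_error_confusion_matrix(pairs):
    """Count (programmed_state, read_state) transitions into an 8x8 matrix.

    Assumed convention: row = programmed state, column = state read back.
    Off-diagonal entries count one type of error each; diagonal entries
    count minimum units that were read back correctly.
    """
    matrix = [[0] * len(STATES) for _ in STATES]
    for programmed, read in pairs:
        matrix[INDEX[programmed]][INDEX[read]] += 1
    return matrix


# Hypothetical observations from one sampling storage unit.
observed = [("A", "A"), ("A", "R"), ("G", "F"), ("G", "F")]
example_matrix = build_error_confusion_matrix(observed)
```

Because the same function serves both the first and the second sampling storage units, training and prediction see matrices built in exactly the same way.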
In summary, the data retention period prediction model is first established from the sample storage data area, so that the data retention period of any to-be-measured storage data area can be predicted with that model. The data stored in the to-be-measured storage data area is then processed according to the predicted data retention period, which avoids loss of the stored data.
Based on the above embodiments:
As a preferred embodiment, after establishing the first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the data stored in the minimum units of each first sampling storage unit, the method further includes:
normalizing the first error confusion matrix of each first sampling storage unit to determine a third error confusion matrix;
performing machine learning on each first error confusion matrix to obtain a data retention period prediction model then comprises:
performing machine learning on each third error confusion matrix to obtain the data retention period prediction model.
The applicant considers that, when the first sampling storage units differ in size, the statistical bases of their first error confusion matrices also differ. The application therefore uses data normalization to map the values of the elements of the first error confusion matrix into a unified value space, namely:
where τ is the number of minimum units in the first sampling storage unit, $E'$ is the third error confusion matrix, and $e'_{ij}$ is each feature value (element) of the third error confusion matrix.
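The normalization formula itself does not appear in this text. Assuming from the definition of τ that each element of the first error confusion matrix is simply divided by τ, the step can be sketched as follows (the function name is hypothetical):

```python
def normalize_matrix(matrix, tau):
    """Divide every count by tau, the number of minimum units in the sampling unit.

    Assumption: element-wise division by tau, so every entry becomes a ratio
    in [0, 1] regardless of how large the sampling storage unit is.
    """
    if tau <= 0:
        raise ValueError("tau must be a positive number of minimum units")
    return [[count / tau for count in row] for row in matrix]
```

With this choice, two sampling storage units of different sizes but the same error behavior produce the same third error confusion matrix.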
Referring to fig. 7, fig. 7 is a schematic diagram of a third error confusion matrix according to the present invention. Note that the color at each position in fig. 7 indicates the error ratio of the corresponding feature value in the third error confusion matrix: a darker color indicates a higher error ratio, and a lighter color indicates a lower error ratio. These colors have no correspondence with the color definitions of the three major classes of errors in fig. 4.
As a preferred embodiment, after normalizing the first error confusion matrix of each first sampling storage unit to determine the third error confusion matrix, the method further includes:
performing nonlinear mapping processing on each element of the third error confusion matrix to generate a fourth error confusion matrix;
performing machine learning on each third error confusion matrix to obtain a data retention period prediction model then comprises:
performing machine learning on each fourth error confusion matrix to obtain the data retention period prediction model.
In addition, the applicant considers that when the base of the first sampling storage unit is large, the feature values in the third error confusion matrix are far smaller than 1. Because the subsequently established data retention period prediction model obtains its optimal parameter values by gradient descent, the application performs nonlinear mapping processing on the third error confusion matrix in order to prevent vanishing gradients during model parameter adjustment and to help the gradient propagate accurately, namely:
$E'' = [e''_{ij}]_{8 \times 8}, \quad i, j \in \{R, A, B, C, D, E, F, G\}$;
where $E''$ is the fourth error confusion matrix, $e''_{ij}$ is each feature value of the fourth error confusion matrix, and α is an amplification factor, which usually takes the value 1. However, when the computation uses a low-precision data format for pure decimals (for example fp16, the 16-bit half-precision floating-point format) or an integer data format, the data often needs to be further projected into a value interval that is not purely fractional; in that case α may be 128, 256, and so on, which is not limited by the present application.
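The exact mapping function is not reproduced in this text. Purely as an illustration of the idea, the sketch below assumes a square-root mapping scaled by the amplification factor α; the square root is a stand-in chosen for this example and should not be read as the mapping used by the application.

```python
import math


def nonlinear_map(matrix, alpha=1.0):
    """Apply alpha * sqrt(x) to every element (hypothetical choice of mapping).

    The square root stretches values that are far below 1, for example
    0.0001 becomes 0.01, so very small error ratios are easier to tell apart.
    """
    return [[alpha * math.sqrt(value) for value in row] for row in matrix]


def to_integer_range(matrix, alpha=128):
    """Project the mapped values into an integer interval for integer data formats."""
    return [[round(v) for v in row] for row in nonlinear_map(matrix, alpha)]
```

With α = 1 the result stays within [0, 1]; a larger α such as 128 or 256 moves the values into a range that low-precision or integer formats can represent with less loss.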
Referring to fig. 8, fig. 8 is a schematic diagram of a fourth error confusion matrix according to the present invention. Note that the color at each position in fig. 8 indicates the error ratio of the corresponding feature value in the fourth error confusion matrix: a darker color indicates a higher error ratio, and a lighter color indicates a lower error ratio. These colors have no correspondence with the color definitions of the three major classes of errors in fig. 4.
As a preferred embodiment, before predicting the data retention period of the to-be-measured storage data area by using the data retention period prediction model, the method further includes:
judging whether a patrol instruction or a read instruction for the to-be-measured storage data area is received;
if yes, proceeding to the step of predicting the data retention period of the to-be-measured storage data area by using the data retention period prediction model.
In this embodiment, before determining the data retention period of the to-be-measured storage data area, it is first judged whether a patrol instruction or a read instruction for that area has been received. The data retention period is predicted only when the system currently wants to patrol the area or read data from it. This avoids the waste and occupation of system resources caused by repeatedly predicting the data retention period of the to-be-measured storage data area. Because the prediction is made at the time of the patrol or read, the current data retention period is obtained in time, which ensures the validity of the processing decision for the to-be-measured storage data area.
As a preferred embodiment, after predicting the data retention period of the to-be-measured storage data area by using the data retention period prediction model, the method further includes:
judging whether the data retention period of the to-be-measured storage data area is smaller than a preset retention period threshold;
if yes, rewriting the data stored in the to-be-measured storage data area or transferring it to a standby storage position.
In this embodiment, after the predicted data retention period of the to-be-measured storage data area is determined, the processing of the data in that area begins by judging whether the predicted data retention period is smaller than the preset retention period threshold. If so, the data stored in the to-be-measured storage data area is more likely to develop errors. The data is then rewritten or transferred to a standby storage position so that its data retention period is renewed, which avoids errors in the stored data.
It should be noted that, when transferring the data stored in the to-be-measured storage data area to the standby storage position, the data may be moved by a garbage collection operation to a new storage location, namely the standby storage position, before the data reaches its data retention period.
In addition, if the predicted data retention period is not less than the preset retention period threshold, the data stored in the to-be-measured storage data area is relatively safe and does not need to be transferred. The sending time of the next patrol instruction or read instruction can then be set according to the predicted data retention period, which improves the safety of the to-be-measured storage data area while reducing the workload of the system in transferring data. Referring to fig. 9, fig. 9 is a flowchart of another embodiment of the data retention period prediction method according to the present invention. After the sending time of the next patrol instruction or read instruction is set, the data retention period of the to-be-measured storage data area can be predicted again once the patrol instruction or read instruction is received.
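The decision flow described above can be sketched as follows. The `Decision` type, the action names, and the `patrol_fraction` parameter (how far into the predicted retention period the next patrol is scheduled) are hypothetical choices for this illustration; the description only states that the next instruction time is set according to the predicted data retention period.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str                        # "refresh" (rewrite or relocate) or "keep"
    next_patrol_days: Optional[float]  # delay until the next patrol; None after a refresh


def decide(predicted_days, threshold_days, patrol_fraction=0.5):
    """Choose what to do with a storage data area given its predicted retention period."""
    if predicted_days < threshold_days:
        # Data is at risk: rewrite it in place or move it to a standby location.
        return Decision("refresh", None)
    # Data is safe for now: schedule the next patrol well before the period expires.
    return Decision("keep", predicted_days * patrol_fraction)
```

For example, `decide(5, 10)` asks for a refresh, while `decide(40, 10)` keeps the data in place and schedules the next patrol after 20 days.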
Referring to fig. 10, fig. 10 is a schematic structural diagram of a prediction system for data retention period according to the present invention, where the system includes:
a determining unit 101, configured to determine N first sampling storage units in a sample storage data area, where N is a positive integer;
a matrix establishing unit 102, configured to establish a first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the data stored in the minimum units of each first sampling storage unit;
a model acquisition unit 103, configured to perform machine learning on each first error confusion matrix to obtain a data retention period prediction model;
a prediction unit 104, configured to predict the data retention period of a to-be-measured storage data area by using the data retention period prediction model.
For an introduction to the data retention period prediction system provided by the present invention, please refer to the above method embodiment, and the disclosure is not repeated here.
Referring to fig. 11, fig. 11 is a schematic structural diagram of a data retention period prediction apparatus according to the present invention, where the apparatus includes:
A memory 111 for storing a computer program;
a processor 112 for implementing the steps of the data retention period prediction method as described above when executing a computer program.
For an introduction to the data retention period prediction apparatus provided by the present invention, please refer to the above method embodiment, and the disclosure is not repeated here.
The computer-readable storage medium of the present invention stores a computer program which, when executed by a processor, performs the steps of the data retention period prediction method described above.
For the description of the computer-readable storage medium provided by the present invention, refer to the above method embodiments, and the disclosure is not repeated here.
It should also be noted that in this specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for predicting a data retention period, comprising:
determining N first sampling storage units in a sample storage data area, wherein N is a positive integer;
establishing a first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the data stored in the minimum units of each first sampling storage unit;
performing machine learning on each first error confusion matrix to obtain a data retention period prediction model;
predicting the data retention period of a to-be-measured storage data area by using the data retention period prediction model;
wherein the establishing a first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the data stored in the minimum units of each first sampling storage unit comprises:
determining, based on the change of the stored charge of each bit in the minimum unit of each first sampling storage unit, the change relation of the programming state in the minimum unit, so as to perform error classification on the data errors that may occur in the minimum unit;
counting the error condition of the data stored in the minimum units of each first sampling storage unit according to the error classification, so as to obtain the error count of each type of error for each first sampling storage unit;
establishing the corresponding first error confusion matrix based on the error count of each type of error for each first sampling storage unit;
after predicting the data retention period of the to-be-measured storage data area by using the data retention period prediction model, the method further comprises:
judging whether the data retention period of the to-be-measured storage data area is smaller than a preset retention period threshold;
if yes, rewriting the data stored in the to-be-measured storage data area or transferring it to a standby storage position;
if not, retaining the data stored in the to-be-measured storage data area.
2. The method of claim 1, wherein performing machine learning on each first error confusion matrix to obtain a data retention period prediction model comprises:
establishing a machine learning model based on each first error confusion matrix;
training the machine learning model using each first error confusion matrix to obtain the data retention period prediction model.
3. The method of claim 2, wherein the machine learning model comprises a regression model or a neural network model.
4. The method of claim 1, wherein predicting the data retention period of the to-be-measured storage data area by using the data retention period prediction model comprises:
determining M second sampling storage units in the to-be-measured storage data area, wherein M is a positive integer;
establishing a second error confusion matrix corresponding to each second sampling storage unit based on the error condition of the data stored in the minimum units of each second sampling storage unit;
inputting the second error confusion matrix into the data retention period prediction model for calculation, so as to obtain the data retention period of the to-be-measured storage data area.
5. The method of claim 4, wherein establishing a second error confusion matrix corresponding to each second sampling storage unit based on the error condition of the data stored in the minimum units of each second sampling storage unit comprises:
performing error classification on the data errors that may occur in a minimum unit based on the change relation of the programming state of the minimum units of each second sampling storage unit;
counting the error condition of the data stored in the minimum units of each second sampling storage unit according to the error classification, so as to obtain the error count of each type of error for each second sampling storage unit;
establishing the corresponding second error confusion matrix based on the error count of each type of error for each second sampling storage unit.
6. The method of claim 1, further comprising, after establishing the first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the data stored in the minimum units of each first sampling storage unit:
normalizing the first error confusion matrix of each first sampling storage unit to determine a third error confusion matrix;
wherein performing machine learning on each first error confusion matrix to obtain a data retention period prediction model comprises:
performing machine learning on each third error confusion matrix to obtain the data retention period prediction model.
7. The method of claim 6, further comprising, after normalizing the first error confusion matrix of each first sampling storage unit to determine the third error confusion matrix:
performing nonlinear mapping processing on each element of the third error confusion matrix to generate a fourth error confusion matrix;
wherein performing machine learning on each third error confusion matrix to obtain a data retention period prediction model comprises:
performing machine learning on each fourth error confusion matrix to obtain the data retention period prediction model.
8. The method of claim 1, further comprising, before predicting the data retention period of the to-be-measured storage data area by using the data retention period prediction model:
judging whether a patrol instruction or a read instruction for the to-be-measured storage data area is received;
if yes, proceeding to the step of predicting the data retention period of the to-be-measured storage data area by using the data retention period prediction model.
9. A data retention period prediction system, comprising:
a determining unit, configured to determine N first sampling storage units in a sample storage data area, where N is a positive integer;
a matrix establishing unit, configured to establish a first error confusion matrix corresponding to each first sampling storage unit based on the error condition of the data stored in the minimum units of each first sampling storage unit;
a model acquisition unit, configured to perform machine learning on each first error confusion matrix to obtain a data retention period prediction model;
a prediction unit, configured to predict the data retention period of a to-be-measured storage data area by using the data retention period prediction model;
wherein the matrix establishing unit is specifically configured to determine, based on the change of the stored charge of each bit in the minimum unit of each first sampling storage unit, the change relation of the programming state in the minimum unit, so as to perform error classification on the data errors that may occur in the minimum unit; and to count the error condition of the data stored in the minimum units of each first sampling storage unit according to the error classification, so as to obtain the error count of each type of error for each first sampling storage unit;
the prediction system is further configured to, after the prediction unit predicts the data retention period of the to-be-measured storage data area by using the data retention period prediction model, judge whether the data retention period of the to-be-measured storage data area is smaller than a preset retention period threshold; if yes, rewrite the data stored in the to-be-measured storage data area or transfer it to a standby storage position; and if not, retain the data stored in the to-be-measured storage data area.
10. A data retention period prediction apparatus, comprising:
A memory for storing a computer program;
A processor for implementing the steps of the method for predicting data retention period according to any one of claims 1 to 8 when executing said computer program.
CN202111679426.0A 2021-12-31 2021-12-31 Data retention period prediction method and related components Active CN114297096B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111679426.0A CN114297096B (en) 2021-12-31 2021-12-31 Data retention period prediction method and related components

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111679426.0A CN114297096B (en) 2021-12-31 2021-12-31 Data retention period prediction method and related components

Publications (2)

Publication Number Publication Date
CN114297096A CN114297096A (en) 2022-04-08
CN114297096B true CN114297096B (en) 2025-09-19

Family

ID=80974638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111679426.0A Active CN114297096B (en) 2021-12-31 2021-12-31 Data retention period prediction method and related components

Country Status (1)

Country Link
CN (1) CN114297096B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947588A (en) * 2019-02-22 2019-06-28 哈尔滨工业大学 A NAND Flash Bit Error Rate Prediction Method Based on Support Vector Regression
CN111863109A (en) * 2020-07-08 2020-10-30 上海威固信息技术股份有限公司 Three-dimensional flash memory interlayer error rate model and evaluation method

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8806111B2 (en) * 2011-12-20 2014-08-12 Fusion-Io, Inc. Apparatus, system, and method for backing data of a non-volatile storage device using a backing store
US9405618B2 (en) * 2014-05-28 2016-08-02 Infineon Technologies Ag Marker programming in non-volatile memories
US10679718B2 (en) * 2017-10-04 2020-06-09 Western Digital Technologies, Inc. Error reducing matrix generation
CN111913830B (en) * 2020-08-18 2024-03-19 深圳大普微电子科技有限公司 A reread operation processing method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN114297096A (en) 2022-04-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 518000 3501, venture capital building, No. 9, Tengfei Road, huanggekeng community, Longcheng street, Longgang District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Dapu Microelectronics Co.,Ltd.

Address before: 518000 3501, venture capital building, No. 9, Tengfei Road, huanggekeng community, Longcheng street, Longgang District, Shenzhen, Guangdong Province

Applicant before: SHENZHEN DAPU MICROELECTRONICS Co.,Ltd.

Country or region before: China

GR01 Patent grant