Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, and the described embodiments are merely some, rather than all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
It should be noted that the terms "first", "second", etc. in this disclosure are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art. When a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and not to fall within the scope of protection claimed in the present invention.
The method for dynamically enhancing memory error correction capability according to the embodiment of the invention has the following main features:
(1) The size and response time of the parity data (Parity Data) are obtained, and the parity data is stored in another memory area that has a slower access speed and more free space; and
(2) The error rate of each memory is counted in real time, and the strength of the error correcting code (Error Correcting Code, ECC) is dynamically adjusted according to the error rate. When the ECC strength is increased because the error rate of a memory has risen, data writing to that memory is suspended, the data in the memory is verified according to the previous ECC algorithm, new parity data is calculated according to the new ECC algorithm and stored, and the previous parity data of the memory is removed.
FIGS. 1A-1E are flowcharts illustrating steps of a method for dynamically enhancing memory error correction capability according to an embodiment of the present invention, which is applied to an electronic device. The order of the steps in the flow diagrams may be changed, and some steps may be omitted, according to different needs.
Initial stage of memory (INIT STAGE)
Referring to FIG. 1A, in step S11, a memory leaf switch (Leaf Switch) reads memory information, e.g., the size and age of each dual in-line memory module (DIMM).
In step S12, the memory leaf switch performs memory training and testing to obtain the ECC capability, bad part information, and response time (Response Time) of each DIMM.
In step S13, the memory leaf switch transfers the above data to the pool controller (Pool Controller).
In step S14, the pool controller builds a memory state table (Memory Status Table) as follows:
In step S15, the ECC capability (Level) adjustment module first raises the ECC level of DIMMs that have no ECC protection, i.e., it determines which areas need an ECC boost and calculates the size of the parity space (Parity Space) required after the adjustment.
In step S16, the parity storage determination module selects a destination area that matches the size of the parity space, to store the parity data newly added by the ECC boost.
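The initialization steps S14 and S15 can be illustrated with a minimal Python sketch. The source does not give the layout of the memory state table or the parity overhead of each ECC level, so the field names, the `PARITY_RATIO` values, and the convention that level 0 means "no ECC protection" are all assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class DimmStatus:
    """One row of a hypothetical memory state table (field names assumed)."""
    dimm_id: str
    leaf_switch: str
    size_bytes: int
    age_years: float
    ecc_level: int           # assumed encoding: 0 = no ECC protection
    bad_parts: int
    response_time_ns: float

# Assumed parity overhead per ECC level (parity bytes per data byte).
PARITY_RATIO = {0: 0.0, 1: 1 / 8, 2: 1 / 4}

def parity_space_needed(dimm: DimmStatus, target_level: int) -> int:
    """Parity bytes needed to protect the whole DIMM at target_level."""
    return int(dimm.size_bytes * PARITY_RATIO[target_level])

def plan_initial_boost(table, target_level=1):
    """Map each unprotected DIMM to the parity space its boost requires."""
    return {
        d.dimm_id: parity_space_needed(d, target_level)
        for d in table
        if d.ecc_level == 0
    }

table = [
    DimmStatus("dimm0", "leaf0", 16 * 2**30, 1.0, 1, 0, 80.0),
    DimmStatus("dimm1", "leaf1", 32 * 2**30, 4.0, 0, 2, 120.0),
]
plan = plan_initial_boost(table)
```

In this example only `dimm1` lacks protection, so it is the only DIMM for which parity space is planned.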
Memory writing phase (WRITE STAGE)
Referring to fig. 1B, in step S21, the Host (Host) writes the memory data into the memory pool.
In step S22, the pool controller determines which leaf switch, e.g., the first leaf switch, the memory data is to be stored in.
In step S23, the first leaf switch receives the memory data and stores the memory data in a corresponding space.
In step S24, the first leaf switch calculates the corresponding ECC parity according to the adjusted ECC, and returns the ECC parity to the pool controller.
In step S25, the parity storage determination module in the pool controller finds a suitable location and stores the ECC parity in the target leaf switch.
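The write flow of steps S21 to S25 might be sketched as follows. This is a simplified illustration, not the embodiment's implementation: the class and method names are hypothetical, the routing rule (address modulo the number of leaf switches) and the fixed parity target are placeholders, and `toy_parity` is a per-block XOR used as a stand-in for a real ECC encoder.

```python
def toy_parity(data: bytes, block: int = 8) -> bytes:
    """XOR of each block-byte chunk; a stand-in for a real ECC encoder."""
    out = bytearray()
    for i in range(0, len(data), block):
        p = 0
        for b in data[i:i + block]:
            p ^= b
        out.append(p)
    return bytes(out)

class LeafSwitch:
    def __init__(self, name):
        self.name = name
        self.data = {}      # address -> data bytes
        self.parity = {}    # (owning leaf name, address) -> parity bytes

    def write(self, addr, payload):
        """S23-S24: store the data and return its parity."""
        self.data[addr] = payload
        return toy_parity(payload)

    def store_parity(self, owner, addr, parity):
        self.parity[(owner, addr)] = parity

class PoolController:
    def __init__(self, leaves, parity_target):
        self.leaves = leaves                 # name -> LeafSwitch
        self.parity_target = parity_target   # placeholder for the S25 decision
        self.parity_index = {}               # (data leaf, addr) -> parity leaf

    def host_write(self, addr, payload):
        names = sorted(self.leaves)
        data_leaf = names[addr % len(names)]              # S22 (placeholder rule)
        parity = self.leaves[data_leaf].write(addr, payload)
        target = self.leaves[self.parity_target]          # S25
        target.store_parity(data_leaf, addr, parity)
        self.parity_index[(data_leaf, addr)] = target.name
        return data_leaf

leaves = {"leaf0": LeafSwitch("leaf0"), "leaf1": LeafSwitch("leaf1")}
pool = PoolController(leaves, parity_target="leaf1")
stored_on = pool.host_write(0, bytes([1, 2, 3, 4, 5, 6, 7, 8]))
```

The point of the sketch is that the data and its parity may live behind different leaf switches, with the pool controller keeping track of where each parity was placed.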
Memory reading stage (READ STAGE)
Referring to fig. 1C, in step S31, the host requests to read a piece of data from the memory pool, and the pool controller notifies the associated leaf switch.
In step S32, the leaf switch reads the data, calculates the ECC parity corresponding to the data, and transmits the ECC parity back to the pool controller.
In step S33, the pool controller compares the ECC parity returned by the leaf switch with the ECC parity that was stored elsewhere during the write phase and is now read back, and thereby verifies the correctness of the data.
In step S34, if the two parities match, the pool controller returns the correct data to the host; otherwise, the ECC correction procedure is started and the error is recorded.
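The comparison performed in the read stage can be sketched as below. As an assumption for illustration, a per-block XOR (`toy_parity`) stands in for the real ECC encoder, and the correction procedure itself is not modeled: on a mismatch the hypothetical `verify_read` function only records the error and returns `None`.

```python
from functools import reduce
from operator import xor

def toy_parity(data: bytes, block: int = 8) -> bytes:
    """XOR of each block-byte chunk; a stand-in for a real ECC encoder."""
    return bytes(
        reduce(xor, data[i:i + block], 0) for i in range(0, len(data), block)
    )

def verify_read(data: bytes, stored_parity: bytes, error_log: list, source: str):
    """Return the data if its recomputed parity matches the stored parity."""
    recomputed = toy_parity(data)          # S32: parity of the data just read
    if recomputed == stored_parity:        # S33: compare with the stored copy
        return data                        # S34: data is returned to the host
    error_log.append(source)               # S34: record the error
    return None                            # correction would start here

good = bytes(range(8))
parity = toy_parity(good)
log = []
ok_result = verify_read(good, parity, log, "leaf0:0x10")
corrupted = bytes([1]) + good[1:]          # first byte flipped from 0 to 1
bad_result = verify_read(corrupted, parity, log, "leaf0:0x10")
```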
Selection of storage locations for parity data
On the premise of not causing delay (Latency), the memory space with the lowest cost is found for storage.
Referring to FIG. 1D, in step S41, because the size of the data is much larger than the size of the ECC parity, the time for the data to pass from the DIMM to the pool controller is calculated as follows:
Data_ResponseTime = (Time for Data from DIMM to Leaf Switch + Time from Leaf Switch to Pool Controller).
In step S42, the parity size required for the data and the response time of the parity are calculated as follows:
Parity_Size = ECC_Algorithm(Data_Size); and
Parity_ResponseTime = (Time for Parity Bytes from DIMM to Leaf Switch + Time from Leaf Switch to Pool Controller).
In step S43, in order to reduce the overall performance impact caused by parity data occupying high-speed DIMMs, a lower-cost memory space (a memory region that is slower and has more remaining capacity) is selected from the free (Free) memory space to store the parity data, provided that doing so causes no additional delay. Lower cost here means that slower memory regions with more free capacity are used preferentially, which avoids the system delay caused by a large amount of parity data occupying high-speed memory.
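One possible reading of steps S41 to S43 is sketched below. The source does not define "no additional delay" precisely; this sketch assumes it means the parity response time must not exceed the data response time, and it models transfer time as a per-byte cost plus a fixed leaf-to-pool latency. The `Region` fields and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Region:
    name: str
    free_bytes: int
    dimm_to_leaf_ns_per_byte: float   # assumed linear transfer cost
    leaf_to_pool_ns: float            # assumed fixed hop latency

def response_time(nbytes: int, region: Region) -> float:
    """DIMM-to-leaf transfer time plus leaf-to-pool-controller time."""
    return nbytes * region.dimm_to_leaf_ns_per_byte + region.leaf_to_pool_ns

def choose_parity_region(data_size, parity_size, data_region, candidates):
    """Pick the slowest, roomiest free region that adds no delay."""
    budget = response_time(data_size, data_region)        # Data_ResponseTime
    eligible = [
        r for r in candidates
        if r.free_bytes >= parity_size
        and response_time(parity_size, r) <= budget       # Parity_ResponseTime
    ]
    if not eligible:
        return None
    # Prefer slower regions first, then regions with more free space.
    return max(
        eligible,
        key=lambda r: (response_time(parity_size, r), r.free_bytes),
    )

fast = Region("fast", 10**6, 0.1, 50.0)
slow = Region("slow", 10**9, 0.5, 100.0)
too_slow = Region("too_slow", 10**9, 2.0, 200.0)
slow_full = Region("slow_full", 100, 0.5, 100.0)
chosen = choose_parity_region(4096, 512, fast, [fast, slow, too_slow, slow_full])
```

With these assumed numbers the data response time is about 459.6 ns, so the `slow` region (356 ns for 512 parity bytes) qualifies while `too_slow` (1224 ns) does not, and `slow_full` lacks the space.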
Error monitoring and dynamic ECC boosting (Error Monitor and Dynamic ECC Enhancement)
Referring to FIG. 1E, in step S51, the pool controller monitors and counts the error conditions of each memory and records the error conditions in a table.
In step S52, when the error rate of a memory rises, it is determined that the ECC strength of that memory is to be increased.
In step S53, the ECC capability adjustment module calculates the parity space required for each memory after the boost.
In step S54, the pool controller notifies the leaf switch of that memory to raise the ECC capability of the memory.
In step S55, writing of data into the memory is suspended, and the pool controller uses the existing parity data to verify the correctness of the data in the memory.
In step S56, the leaf switch calculates new parity data according to the new ECC algorithm and transmits the new parity data to the pool controller.
In step S57, the pool controller finds, according to the rules of the parity storage determination module, a destination in which to store the new parity data produced by the ECC boost. The newly selected leaf switch receives and stores the new parity data.
In step S58, the pool controller notifies the leaf switch storing the original parity data that the old parity area is no longer valid, and that space is freed.
In step S59, the ECC capability boost is completed, and the memory can again be read and written.
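The boost sequence of steps S51 to S59 can be illustrated with the following minimal sketch. Everything here is an assumption for illustration: the error threshold, the two ECC "levels" (per-block XOR with 8-byte and 4-byte blocks, standing in for real ECC algorithms), and the `ProtectedMemory` class, which keeps data and parity in one object rather than behind separate leaf switches.

```python
from functools import reduce
from operator import xor

def xor_parity(data: bytes, block: int) -> bytes:
    """Per-block XOR; a stand-in for a real ECC encoder."""
    return bytes(
        reduce(xor, data[i:i + block], 0) for i in range(0, len(data), block)
    )

# Hypothetical levels: a higher level uses smaller blocks, hence more parity.
ENCODERS = {1: lambda d: xor_parity(d, 8), 2: lambda d: xor_parity(d, 4)}

class ProtectedMemory:
    def __init__(self, data: bytes, level: int = 1):
        self.data = data
        self.level = level
        self.parity = ENCODERS[level](data)
        self.errors = 0          # S51: error counters kept per memory
        self.accesses = 0
        self.writes_paused = False

    def error_rate(self) -> float:
        return self.errors / self.accesses if self.accesses else 0.0

def maybe_boost(mem: ProtectedMemory, threshold: float = 0.01) -> bool:
    """Raise the ECC level by one if the error rate exceeds the threshold."""
    if mem.error_rate() <= threshold or mem.level + 1 not in ENCODERS:
        return False                                      # S52: no boost
    mem.writes_paused = True                              # S55: suspend writes
    try:
        if ENCODERS[mem.level](mem.data) != mem.parity:   # S55: verify, old ECC
            raise ValueError("data fails verification under the old ECC")
        new_parity = ENCODERS[mem.level + 1](mem.data)    # S56: new parity
        mem.parity = new_parity                           # S57-S58: swap parity
        mem.level += 1
    finally:
        mem.writes_paused = False                         # S59: resume access
    return True

mem = ProtectedMemory(bytes(range(16)))
mem.accesses, mem.errors = 100, 5
boosted = maybe_boost(mem)
```

Verifying with the old parity before computing the new one matters because otherwise a latent error would be encoded into the new parity and become undetectable.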
FIG. 2 is a schematic diagram of a hardware architecture of an electronic device according to an embodiment of the invention. The electronic device 200 may be, for example but not limited to, a server, in which a processor 210, a memory 220, and a system 230 for dynamically enhancing memory error correction capability are communicatively coupled to one another via a system bus. FIG. 2 shows only the electronic device 200 with components 210-230, but it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead.
The memory 220 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 220 may be an internal storage unit of the electronic device 200, such as a hard disk or a memory of the electronic device 200. In other embodiments, the memory 220 may also be an external storage device of the electronic device 200, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a Secure Digital (SD) card, or a flash card (Flash Card) provided on the electronic device 200. Of course, the memory 220 may also include both an internal storage unit and an external storage device of the electronic device 200. In this embodiment, the memory 220 is generally used for storing an operating system and various application software installed on the electronic device 200, such as the program code of the system 230 for dynamically enhancing memory error correction capability. In addition, the memory 220 may be used to temporarily store various types of data that have been output or are to be output.
The processor 210 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 210 is generally used to control the overall operation of the electronic device 200. In this embodiment, the processor 210 is configured to execute the program code or process data stored in the memory 220, for example, execute the system 230 for dynamically enhancing the memory error correction capability, etc.
It should be noted that fig. 2 is merely an example of the electronic device 200. In other embodiments, the electronic device 200 may also include more or fewer components, or have a different configuration of components.
FIG. 3 is a functional block diagram of an electronic device for performing a method for dynamically enhancing memory error correction capability according to an embodiment of the present invention. The method for dynamically enhancing memory error correction capability according to the embodiments of the present invention may be implemented by a computer program in a storage medium, for example, the memory 220 in the electronic device 200. When a computer program implementing the method of the present invention is loaded into the memory 220 by the processor 210, the processor 210 of the electronic device 200 is driven to execute the method for dynamically enhancing the memory error correction capability according to the embodiment of the present invention.
The electronic device 200 of the embodiment of the invention comprises a pool controller 310, an ECC capability adjustment module 311, a parity storage determination module 312, and leaf switches 320 and 330.
The leaf switch 320 reads memory information, such as the size and age of each dual in-line memory module (DIMM).
The leaf switch 320 performs memory training and testing to obtain the ECC capability, bad information, and response time of each DIMM.
Leaf switch 320 communicates the relevant data described above to pool controller 310.
Pool controller 310 builds a memory state table as follows:
The ECC capability adjustment module 311 first raises the ECC level of DIMMs that have no ECC protection, i.e., it determines which areas need an ECC boost and calculates the size of the parity space needed after the adjustment.
The parity storage determination module 312 selects a destination area that matches the size of the parity space, to store the parity data newly added by the ECC boost.
Memory writing phase (WRITE STAGE)
A host (not shown) writes memory data to the memory pool.
The pool controller 310 determines which leaf switch, 320 or 330, the memory data is to be stored in, e.g., leaf switch 320.
Leaf switch 320 receives the memory data and stores it in the corresponding space.
Leaf switch 320 calculates the corresponding ECC parity according to the adjusted ECC and returns the ECC parity to the pool controller 310.
The parity storage determination module 312 in the pool controller 310 finds a suitable location and stores the ECC parity in a target leaf switch, e.g., leaf switch 330.
Memory reading stage (READ STAGE)
A host (not shown) requests to read a piece of data from the memory pool, and the pool controller 310 notifies the associated leaf switch 320 or 330, e.g., leaf switch 320.
The leaf switch 320 reads the data, calculates the ECC parity corresponding to the data, and sends the ECC parity back to the pool controller 310.
The pool controller 310 compares the ECC parity returned by leaf switch 320 with the ECC parity that was stored elsewhere during the write phase and is now read back, and thereby verifies the correctness of the data.
If the two parities match, the pool controller 310 returns the correct data to the host (not shown); otherwise, the ECC correction procedure is started and the error is recorded.
Selection of storage locations for parity data
On the premise of not causing delay (Latency), the memory space with the lowest cost is found for storage.
Because the size of the data is much larger than the size of the ECC parity, the parity storage determination module 312 calculates the time for the data to pass from the DIMM to the pool controller as follows:
Data_ResponseTime = (Time for Data from DIMM to Leaf Switch + Time from Leaf Switch to Pool Controller).
The parity storage determination module 312 calculates the parity size required for the data and the response time of the parity as follows:
Parity_Size = ECC_Algorithm(Data_Size); and
Parity_ResponseTime = (Time for Parity Bytes from DIMM to Leaf Switch + Time from Leaf Switch to Pool Controller).
To reduce the overall performance impact caused by parity data occupying high-speed DIMMs, the parity storage determination module 312 selects a lower-cost memory space (a memory region that is slower and has more remaining capacity) from the free (Free) memory space to store the parity data, provided that doing so causes no additional delay. Lower cost here means that slower memory regions with more free capacity are used preferentially, which avoids the system delay caused by a large amount of parity data occupying high-speed memory.
Error monitoring and dynamic ECC boosting (Error Monitor and Dynamic ECC Enhancement)
The pool controller 310 monitors and counts the error conditions of each memory and records them in a table.
When the error rate of a memory rises, the ECC capability adjustment module 311 determines that the ECC strength of that memory is to be increased.
The ECC capability adjustment module 311 calculates the parity space required by each memory after the boost.
The pool controller 310 notifies leaf switch 320 to raise the ECC capability of the memory.
Writing of data into the memory is suspended, and the pool controller 310 uses the existing parity data to verify the correctness of the data in the memory.
Leaf switch 320 calculates new parity data according to the new ECC algorithm and passes it to the pool controller 310.
The pool controller 310 finds, according to the rules of the parity storage determination module 312, a destination in which to store the new parity data produced by the ECC boost. The newly selected leaf switch receives and stores the new parity data.
The pool controller 310 notifies leaf switch 320, which stores the original parity data, that the old parity area is no longer valid, and that space is freed.
The ECC capability boost is completed, and the memory can again be read and written.
The modules/units integrated in the electronic device 200 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as a stand-alone product. Based on this understanding, all or part of the flow of the method of the above embodiment may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, a software distribution medium, and so forth. It should be noted that the content included in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
It will be appreciated that the above described division of modules is merely a logical division of functions and that other divisions of implementation are possible. In addition, each functional module in the embodiments of the present application may be integrated in the same processing unit, or each module may exist alone physically, or two or more modules may be integrated in the same unit. The integrated modules may be implemented in hardware or in hardware plus software functional modules.
The method for dynamically enhancing the memory error correction capability of the embodiment of the invention has the following effects:
(1) A root controller (Root Controller) cooperates with the leaf switches to provide ECC protection for memory that has no ECC function, where space permits;
(2) Suitable space for storing parity data is provided according to the error checking strength required by each memory, which strengthens the capability to detect data errors; and
(3) When the root controller allocates a region with sufficient space for storing parity data, both memory utilization and response time are taken into account, which reduces the impact on performance and balances performance against reliability.
Other corresponding changes and modifications can be made by those skilled in the art in light of the practical needs generated by combining the technical scheme and the technical conception provided by the embodiment of the present invention, and all such changes and modifications are intended to fall within the scope of the claims of the present invention.