CN114816807B - Method for dynamically enhancing memory error correction capability and electronic device - Google Patents

Method for dynamically enhancing memory error correction capability and electronic device

Info

Publication number
CN114816807B
Authority
CN
China
Prior art keywords
memory
data
ecc
leaf switch
pool controller
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110090549.4A
Other languages
Chinese (zh)
Other versions
CN114816807A (en
Inventor
李哲毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanning Fulian Fugui Precision Industrial Co Ltd
Original Assignee
Nanning Fulian Fugui Precision Industrial Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanning Fulian Fugui Precision Industrial Co Ltd
Priority to CN202110090549.4A
Publication of CN114816807A
Application granted
Publication of CN114816807B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, in a memory management context, e.g. virtual memory or cache management
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/08 Error detection or correction by redundancy in data representation, e.g. by using checking codes
    • G06F11/10 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's
    • G06F11/1008 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's, in individual solid state devices
    • G06F11/1044 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's, in individual solid state devices with specific ECC/EDC distribution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Techniques For Improving Reliability Of Storages (AREA)

Abstract

A method for dynamically enhancing memory error correction capability, the method comprising: obtaining memory data and transmitting it to a pool controller; establishing a memory status table according to the memory data; determining a destination area in a leaf switch that needs ECC enhancement, and calculating the size of the parity space required for the ECC enhancement; and selecting a destination area corresponding to the size of the parity space to store the parity data newly added by the ECC enhancement. The present invention also provides an electronic device that provides an ECC function for memory without one, or dynamically enhances the ECC strength of memory whose ECC error detection strength is insufficient.

Description

Method for dynamically enhancing memory error correction capability and electronic device
Technical Field
The present invention relates to computer devices, and more particularly, to a method and an electronic device for dynamically enhancing memory error correction capability.
Background
Disaggregated memory devices are used in data centers. A pool controller manages each memory switch, and each memory switch forms a memory pool from the dual in-line memory modules (DIMMs) under its management for use by hosts.
With extra parity data and an error correcting code (ECC) algorithm, memory (both volatile and non-volatile) can detect and correct errors; the strength of this capability depends on the ECC algorithm used and on the parity size. In other words, ECC is a mechanism for detecting whether data in memory is erroneous: the algorithm generates parity data, and the parity data is used to verify the correctness of the data. ECC algorithms differ in how many error bits they can correct (the ECC level). In general, the more bits an algorithm can detect and correct, the more parity data it requires.
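The relationship between correction capability and parity size can be made concrete with a small sketch. The patent does not name a specific ECC algorithm; the single-error-correcting Hamming code and its SECDED extension are used here only as a familiar illustration, and the function names are hypothetical.

```python
def hamming_parity_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error-correcting Hamming code)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_parity_bits(data_bits: int) -> int:
    """SECDED adds one overall parity bit so that double-bit errors are also detected."""
    return hamming_parity_bits(data_bits) + 1


if __name__ == "__main__":
    for width in (8, 32, 64):
        print(width, hamming_parity_bits(width), secded_parity_bits(width))
```

For a 64-bit word this gives 7 parity bits for single-error correction and 8 for SECDED, which matches the common 72-bit ECC DIMM word. Stronger protection costs more parity, which is the trade-off the method manages.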
However, a disaggregated memory device includes various types of memory with different speeds, some with an ECC check function and some without. Memory without an ECC function has no data protection. Memory with an ECC function has basic data protection, but that protection does not necessarily meet the strength currently required.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a method and an electronic device for dynamically enhancing memory error correction capability, which provide an ECC function for memory without one, or dynamically enhance the ECC strength of memory whose ECC error detection strength is insufficient.
An embodiment of the invention provides a method for dynamically enhancing memory error correction capability, applied to an electronic device, the method comprising: obtaining memory data and transmitting it to a pool controller; establishing a memory status table according to the memory data; determining a destination area in a leaf switch that needs ECC enhancement, and calculating the size of the parity space required for the ECC enhancement; and selecting a destination area corresponding to the size of the parity space to store the parity data newly added by the ECC enhancement.
An embodiment of the invention also provides an electronic device comprising a leaf switch for obtaining memory data, and a pool controller for obtaining the memory data from the leaf switch and establishing a memory status table according to the memory data. The pool controller further comprises an ECC capability adjustment module for determining a destination area in the leaf switch that needs ECC enhancement and calculating the size of the parity space required for the ECC enhancement, and a parity storage determination module for selecting a destination area corresponding to the size of the parity space to store the parity data newly added by the ECC enhancement.
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, which when executed implements the steps of the method for dynamically enhancing memory error correction capability as described above.
The method and the electronic device of the embodiments of the invention provide ECC protection for memory without an ECC function, and dynamically increase or decrease the ECC strength of each memory according to its error rate, so as to give the memory pool the stronger reliability, availability and serviceability (RAS) required by next-generation data centers.
Drawings
FIGS. 1A-1E are flow charts illustrating steps of a method for dynamically enhancing memory error correction capability according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a hardware architecture of an electronic device according to an embodiment of the invention.
FIG. 3 is a functional block diagram of an electronic device according to an embodiment of the invention.
Description of the main reference signs
Electronic device: 200
Processor: 210
Memory: 220
System for dynamically enhancing memory error correction capability: 230
Pool controller: 310
ECC (error correcting code) capability adjustment module: 311
Parity storage determination module: 312
Leaf switches: 320, 330
The invention will be further described in the following detailed description in conjunction with the above-described figures.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will be more clearly understood, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It should be noted that, without conflict, the embodiments of the present application and features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, and the described embodiments are merely some, rather than all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
It should be noted that terms such as "first" and "second" in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination is contradictory or cannot be realized, it should be considered not to exist and not to fall within the scope of protection claimed by the present invention.
The main characteristics of the method for dynamically enhancing memory error correction capability of the embodiment of the invention are:
(1) Obtaining the size and response time of the parity data, and storing the parity data in other memory areas that have slower access speed and more free space; and
(2) Counting the error rate of each memory in real time and dynamically adjusting the strength of the error correcting code (ECC) according to the error rate. When the error rate of a memory rises and its ECC strength is dynamically increased, data writes to that memory are suspended, the data in the memory is verified with the previous ECC algorithm, new parity data is calculated with the new ECC algorithm and stored, and the previous parity data of that memory is released.
FIGS. 1A-1E are flowcharts illustrating steps of a method for dynamically enhancing memory error correction capability according to an embodiment of the present invention, which is applied to an electronic device. The order of the steps in the flow diagrams may be changed, and some steps may be omitted, according to different needs.
Initial stage of memory (INIT STAGE)
Referring to FIG. 1A, in step S11, a memory leaf switch (Leaf Switch) reads memory data, e.g., the size and age of each dual in-line memory module (DIMM).
In step S12, the memory leaf switch performs memory training and testing to obtain the ECC capability, bad block information, and response time of each DIMM.
In step S13, the memory leaf switch transmits the above data to the pool controller (Pool Controller).
In step S14, the pool controller builds a memory status table (Memory Status Table) that stores the memory data of each leaf switch, including at least DIMM size, page size, frequency, and current DIMM protection.
In step S15, the ECC capability (level) adjustment module first enhances the DIMMs without ECC protection, i.e., it determines which areas need ECC enhancement and calculates the size of the parity space required after the adjustment.
In step S16, the parity storage determination module selects a destination area corresponding to the size of the parity space to store the parity data newly added by the ECC enhancement.
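Steps S14 and S15 can be sketched as follows. This is a minimal illustration, not the patented implementation: the field names, the ECC level numbering, and the parity overhead ratios in `PARITY_OVERHEAD` are all assumptions made for the example.

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical parity overhead per ECC level (parity bytes per data byte).
PARITY_OVERHEAD = {0: Fraction(0), 1: Fraction(1, 8), 2: Fraction(1, 4)}


@dataclass
class DimmStatus:
    leaf_switch: str       # leaf switch that manages the DIMM
    dimm_id: str
    size_bytes: int
    ecc_level: int         # 0 means the DIMM has no ECC protection
    response_time_ns: int


def build_status_table(reports):
    """S14: index the data reported by the leaf switches by DIMM id."""
    return {r.dimm_id: r for r in reports}


def plan_initial_enhancement(table, target_level=1):
    """S15: pick DIMMs without ECC and compute the parity space each one needs."""
    plan = {}
    for dimm_id, status in table.items():
        if status.ecc_level == 0:
            plan[dimm_id] = int(status.size_bytes * PARITY_OVERHEAD[target_level])
    return plan


if __name__ == "__main__":
    table = build_status_table([
        DimmStatus("leaf-1", "dimm-a", 1024, 0, 80),
        DimmStatus("leaf-1", "dimm-b", 2048, 1, 60),
    ])
    print(plan_initial_enhancement(table))
```

The resulting plan maps each unprotected DIMM to a parity size, which is the input that step S16 needs when it looks for a destination area.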
Memory writing phase (WRITE STAGE)
Referring to FIG. 1B, in step S21, the host (Host) writes memory data into the memory pool.
In step S22, the pool controller determines which leaf switch, e.g., the first leaf switch, the memory data is to be stored in.
In step S23, the first leaf switch receives the memory data and stores it in the corresponding space.
In step S24, the first leaf switch calculates the corresponding ECC parity according to the adjusted ECC and returns the ECC parity to the pool controller.
In step S25, the parity storage determination module in the pool controller finds a suitable location and stores the ECC parity in the target leaf switch.
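The write stage can be sketched as below. The class and method names are hypothetical, the choice of data and parity switches is passed in by the caller rather than decided by a placement policy, and CRC-32 from the standard library stands in for the adjusted ECC algorithm (CRC-32 detects errors but cannot correct them).

```python
import zlib


def compute_parity(data: bytes) -> bytes:
    # Stand-in for the adjusted ECC algorithm; CRC-32 only detects errors.
    return zlib.crc32(data).to_bytes(4, "big")


class LeafSwitch:
    def __init__(self, name: str):
        self.name = name
        self.data = {}     # address -> payload
        self.parity = {}   # address -> parity held on behalf of another switch

    def write(self, address: int, payload: bytes) -> bytes:
        """S23 and S24: store the payload and return its parity."""
        self.data[address] = payload
        return compute_parity(payload)


class PoolController:
    def __init__(self, switches):
        self.switches = switches
        self.parity_location = {}  # address -> name of the switch holding the parity

    def host_write(self, address, payload, data_switch, parity_switch):
        """S21 to S25, with both placement decisions supplied by the caller."""
        parity = self.switches[data_switch].write(address, payload)
        self.switches[parity_switch].parity[address] = parity
        self.parity_location[address] = parity_switch
        return parity
```

Keeping `parity_location` in the pool controller is one possible way to remember where the parity went, so that the read stage can fetch it again.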
Memory reading stage (READ STAGE)
Referring to FIG. 1C, in step S31, the host requests to read a piece of data from the memory pool, and the pool controller notifies the associated leaf switch.
In step S32, the leaf switch reads the data, calculates the ECC parity corresponding to the data, and returns the ECC parity to the pool controller.
In step S33, the pool controller reads back the ECC parity that was stored elsewhere during the write stage, compares it with the ECC parity returned by the leaf switch, and thereby verifies the correctness of the data.
In step S34, if the two match, the pool controller returns the correct data to the host; otherwise, the ECC correction procedure is started and the error is recorded.
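The comparison in steps S32 to S34 can be sketched as follows. As an assumption for illustration, CRC-32 again stands in for the ECC algorithm, and the correction procedure itself is not modelled: a mismatch is only recorded in a hypothetical `error_log`.

```python
import zlib


def compute_parity(data: bytes) -> bytes:
    # Stand-in for the ECC algorithm; CRC-32 detects but cannot correct errors.
    return zlib.crc32(data).to_bytes(4, "big")


def read_and_verify(stored_data: bytes, stored_parity: bytes, error_log: list):
    """Return the data if the recomputed parity matches, otherwise log and return None."""
    recomputed = compute_parity(stored_data)   # S32: done by the leaf switch
    if recomputed == stored_parity:            # S33: compared by the pool controller
        return stored_data                     # S34: data is returned to the host
    error_log.append("parity mismatch")        # S34: correction would start here
    return None


if __name__ == "__main__":
    log = []
    parity = compute_parity(b"payload")
    print(read_and_verify(b"payload", parity, log))
    print(read_and_verify(b"pay1oad", parity, log), log)
```

The log of mismatches is also what the error monitoring stage would count when it computes per-memory error rates.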
Selection of storage locations for parity data
The lowest-cost memory space is found for storage, provided that doing so causes no additional delay (latency).
Referring to FIG. 1D, in step S41, because the data is much larger than its ECC parity, the time for the data to travel from the DIMM to the pool controller is calculated as follows:
Data_ResponseTime = (time for the data from DIMM to leaf switch) + (time from leaf switch to pool controller).
In step S42, the parity size required for the data and the response time of the parity are calculated as follows:
Parity_Size = ECC_Algorithm(Data_Size); and
Parity_ResponseTime = (time for the parity bytes from DIMM to leaf switch) + (time from leaf switch to pool controller).
In step S43, to reduce the overall performance impact of parity data occupying high-speed DIMMs, a lower-cost memory space is selected from the free memory space to store the parity data, provided that no additional delay is caused. Lower cost here means a memory region that is slower and has more remaining capacity; such regions are used first, so that a large amount of parity data does not occupy high-speed memory and cause system delay.
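One way to read steps S41 to S43 is sketched below: among free regions other than the one holding the data, keep those whose parity response time does not exceed the data response time, then prefer the slowest one and, on ties, the one with the most free space. The linear timing model and all field names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    name: str
    free_bytes: int
    ns_per_byte: float  # assumed DIMM-to-leaf-switch transfer cost
    link_ns: float      # assumed leaf-switch-to-pool-controller latency


def response_time(region: Region, size_bytes: int) -> float:
    """S41/S42: transfer time from the DIMM plus the hop to the pool controller."""
    return size_bytes * region.ns_per_byte + region.link_ns


def choose_parity_region(regions: List[Region], data_region: Region,
                         data_size: int, parity_size: int) -> Optional[Region]:
    """S43: slowest region with enough space that adds no delay to the read."""
    deadline = response_time(data_region, data_size)
    candidates = [
        r for r in regions
        if r.name != data_region.name
        and r.free_bytes >= parity_size
        and response_time(r, parity_size) <= deadline
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.ns_per_byte, r.free_bytes))
```

Because the parity is much smaller than the data, a slower region can often return it before the data itself arrives, which is why a slow region can be chosen without lengthening the read.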
Error monitoring and dynamic ECC enhancement (Error Monitor and Dynamic ECC Enhancement)
Referring to FIG. 1E, in step S51, the pool controller monitors and counts the errors of each memory and records them in a table.
In step S52, when a memory shows a higher error rate, it is determined that its ECC strength should be enhanced.
In step S53, the ECC capability adjustment module calculates the parity space each memory requires after the enhancement.
In step S54, the pool controller notifies the leaf switch to enhance the ECC capability of its memory.
In step S55, data writes to the memory are suspended, and the pool controller uses the existing parity data to verify the correctness of the data in the memory.
In step S56, the leaf switch calculates new parity data according to the new ECC algorithm and transmits it to the pool controller.
In step S57, the pool controller, following the rules of the parity storage determination module, finds a destination area to store the new parity data produced by the ECC enhancement. The destination leaf switch receives and stores the new parity data.
In step S58, the pool controller notifies the leaf switch storing the original parity data that the old parity area is no longer valid, and the space is freed.
In step S59, the ECC enhancement is complete, and the memory can again be read and written.
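The control flow of steps S51 to S59 can be outlined as follows. The threshold value, the level numbering, and the event strings are hypothetical; the sketch only records the order of operations and does not move any parity data.

```python
from dataclasses import dataclass

ERROR_RATE_THRESHOLD = 1e-6  # hypothetical trigger for an enhancement
MAX_LEVEL = 2                # hypothetical strongest ECC level


@dataclass
class MemoryState:
    name: str
    ecc_level: int
    reads: int = 0
    errors: int = 0
    writes_paused: bool = False

    @property
    def error_rate(self) -> float:
        """S51: error statistics kept by the pool controller."""
        return self.errors / self.reads if self.reads else 0.0


def enhance_if_needed(mem: MemoryState, events: list) -> bool:
    """Run S52 to S59 for one memory; return True if its ECC level was raised."""
    if mem.error_rate <= ERROR_RATE_THRESHOLD or mem.ecc_level >= MAX_LEVEL:
        return False                                        # S52: no enhancement
    mem.writes_paused = True                                # S55: suspend writes
    events.append("verify data with old parity")            # S55
    mem.ecc_level += 1                                      # S54: raise ECC level
    events.append("compute and store new parity")           # S56, S57
    events.append("release old parity area")                # S58
    mem.writes_paused = False                               # S59: resume access
    return True
```

Suspending writes before verification means the old parity still describes the data exactly while the new parity is being computed.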
FIG. 2 is a schematic diagram of a hardware architecture of an electronic device according to an embodiment of the invention. The electronic device 200 may be, but is not limited to, a server, in which the processor 210, the memory 220, and the system 230 for dynamically enhancing memory error correction capability are communicatively coupled to each other via a system bus. FIG. 2 shows the electronic device 200 with only components 210-230, but it should be understood that not all of the illustrated components are required and that more or fewer components may be implemented instead.
The memory 220 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 220 may be an internal storage unit of the electronic device 200, such as a hard disk or a memory of the electronic device 200. In other embodiments, the memory 220 may also be an external storage device of the electronic device 200, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device 200. Of course, the memory 220 may also include both an internal storage unit and an external storage device of the electronic device 200. In this embodiment, the memory 220 is generally used for storing an operating system and various application software installed on the electronic device 200, such as the program code of the system 230 for dynamically enhancing memory error correction capability. In addition, the memory 220 may be used to temporarily store various types of data that have been output or are to be output.
The processor 210 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 210 is generally used to control the overall operation of the electronic device 200. In this embodiment, the processor 210 is configured to execute the program code or process data stored in the memory 220, for example, execute the system 230 for dynamically enhancing the memory error correction capability, etc.
It should be noted that fig. 2 is merely an example of the electronic device 200. In other embodiments, the electronic device 200 may also include more or fewer components, or have a different configuration of components.
FIG. 3 is a functional block diagram of an electronic device for performing a method for dynamically enhancing memory error correction capability according to an embodiment of the present invention. The method may be implemented by a computer program in a storage medium, for example, the memory 220 in the electronic device 200. When a computer program implementing the method of the present invention is loaded into the memory 220 by the processor 210, the processor 210 of the electronic device 200 is driven to execute the method for dynamically enhancing memory error correction capability according to the embodiment of the present invention.
The electronic device 200 of the embodiment of the invention comprises a pool controller 310, an ECC capability adjusting module 311, a parity storage discriminating module 312 and leaf switches 320 and 330.
The leaf switch 320 reads memory data, such as the size and age of each dual in-line memory module (DIMM).
The leaf switch 320 performs memory training and testing to obtain the ECC capability, bad block information, and response time of each DIMM.
The leaf switch 320 transmits the above data to the pool controller 310.
The pool controller 310 builds a memory status table that stores the memory data of the leaf switch, including at least DIMM size, page size, frequency, and current DIMM protection.
The ECC capability adjustment module 311 first enhances the DIMMs without ECC protection, i.e., it determines which areas need ECC enhancement and calculates the size of the parity space required after the adjustment.
The parity storage determination module 312 selects a destination area corresponding to the size of the parity space to store the parity data newly added by the ECC enhancement.
Memory writing phase (WRITE STAGE)
A host (not shown) writes memory data to the memory pool.
The pool controller 310 determines whether the memory data is to be stored in leaf switch 320 or 330, for example leaf switch 320.
The leaf switch 320 receives the memory data and stores it in the corresponding space.
The leaf switch 320 calculates the corresponding ECC parity according to the adjusted ECC and returns the ECC parity to the pool controller 310.
The parity storage determination module 312 in the pool controller 310 finds a suitable location and stores the ECC parity in a target leaf switch, e.g., leaf switch 330.
Memory reading stage (READ STAGE)
A host (not shown) requests to read a piece of data from the memory pool, and the pool controller 310 notifies the associated leaf switch 320 or 330, e.g., leaf switch 330.
The leaf switch 330 reads the data, calculates the ECC parity corresponding to the data, and returns it to the pool controller 310.
The pool controller 310 reads back the ECC parity that was stored elsewhere during the write stage, compares it with the ECC parity returned by the leaf switch 330, and thereby verifies the correctness of the data.
If the two match, the pool controller 310 returns the correct data to the host (not shown); otherwise, the ECC correction procedure is started and the error is recorded.
Selection of storage locations for parity data
The lowest-cost memory space is found for storage, provided that doing so causes no additional delay (latency).
Because the data is much larger than its ECC parity, the parity storage determination module 312 calculates the time for the data to travel from the DIMM to the pool controller as follows:
Data_ResponseTime = (time for the data from DIMM to leaf switch) + (time from leaf switch to pool controller).
The parity storage determination module 312 calculates the parity size required for the data and the response time of the parity as follows:
Parity_Size = ECC_Algorithm(Data_Size); and
Parity_ResponseTime = (time for the parity bytes from DIMM to leaf switch) + (time from leaf switch to pool controller).
To reduce the overall performance impact of parity data occupying high-speed DIMMs, the parity storage determination module 312 selects a lower-cost memory space from the free memory space to store the parity data, provided that no additional delay is caused. Lower cost here means a memory region that is slower and has more remaining capacity; such regions are used first, so that a large amount of parity data does not occupy high-speed memory and cause system delay.
Error monitoring and dynamic ECC enhancement (Error Monitor and Dynamic ECC Enhancement)
The pool controller 310 monitors and counts the errors of each memory and records them in a table.
When a memory shows a higher error rate, the ECC capability adjustment module 311 determines that its ECC strength should be enhanced.
The ECC capability adjustment module 311 calculates the parity space each memory requires after the enhancement.
The pool controller 310 notifies the leaf switch 320 to enhance the ECC capability of its memory.
Data writes to the memory are suspended, and the pool controller 310 uses the existing parity data to verify the correctness of the data in the memory.
The leaf switch 320 calculates new parity data according to the new ECC algorithm and transmits it to the pool controller 310.
The pool controller 310, following the rules of the parity storage determination module 312, finds a destination area to store the new parity data produced by the ECC enhancement. The destination leaf switch receives and stores the new parity data.
The pool controller 310 notifies the leaf switch that stores the original parity data that the old parity area is no longer valid, and the space is freed.
The ECC enhancement is then complete, and the memory can again be read and written.
The modules/units integrated in the electronic device 200 may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as a stand-alone product. Based on this understanding, the present invention may implement all or part of the flow of the method of the above embodiment by means of a computer program that instructs the related hardware. The computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium may be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions the computer readable medium does not include electrical carrier signals and telecommunication signals.
It will be appreciated that the above described division of modules is merely a logical division of functions and that other divisions of implementation are possible. In addition, each functional module in the embodiments of the present application may be integrated in the same processing unit, or each module may exist alone physically, or two or more modules may be integrated in the same unit. The integrated modules may be implemented in hardware or in hardware plus software functional modules.
The method for dynamically enhancing the memory error correction capability of the embodiment of the invention has the following effects:
(1) A root controller (Root Controller) cooperates with the leaf switches to provide ECC protection for memory without an ECC function, space permitting;
(2) Suitable space is provided for storing parity data according to the error checking strength each memory requires, strengthening the ability to detect data errors; and
(3) When the root controller allocates a region with sufficient space to store parity data, it takes both memory utilization and response time into account, reducing the performance impact and balancing performance against reliability.
Other corresponding changes and modifications can be made by those skilled in the art in light of the practical needs generated by combining the technical scheme and the technical conception provided by the embodiment of the present invention, and all such changes and modifications are intended to fall within the scope of the claims of the present invention.

Claims (10)

1. A method for dynamically enhancing memory error correction capability, applied to an electronic device, characterized in that the method comprises:
a memory leaf switch reading memory data, wherein the memory data includes at least the size and age of dual in-line memory modules (DIMMs);
the memory leaf switch performing memory training and testing to obtain data of each DIMM, wherein the data includes at least error correcting code (ECC) capability, bad block information, and response time;
the memory leaf switch transmitting the memory data and the DIMM data to a pool controller;
the pool controller establishing a memory status table according to the memory data and the DIMM data, wherein the memory status table stores the memory data of the memory leaf switch, including at least DIMM size, page size, frequency, and current DIMM protection;
determining a destination area in the memory leaf switch that needs ECC enhancement, and calculating the size of the parity space required for the ECC enhancement; and
selecting a destination area corresponding to the size of the parity space to store the parity data newly added after the ECC enhancement.
2. The method for dynamically enhancing memory error correction capability according to claim 1, further comprising:
a host writing the memory data into a memory pool;
the pool controller determining that the memory data is to be stored in a first leaf switch;
the first leaf switch receiving the memory data and storing it in a corresponding space;
the first leaf switch calculating the corresponding ECC parity according to the adjusted ECC, and returning the ECC parity to the pool controller; and
a parity storage determination module in the pool controller finding a suitable location and storing the ECC parity in a target leaf switch.
3. The method for dynamically enhancing memory error correction capability according to claim 2, further comprising:
the host reading a piece of data from the memory pool, and the pool controller notifying a second leaf switch;
the second leaf switch reading the data, calculating the ECC parity corresponding to the data, and returning it to the pool controller;
the pool controller reading back the ECC parity stored elsewhere during the write stage, comparing it with the ECC parity returned by the second leaf switch, and verifying the correctness of the data; and
if the comparison is correct, the pool controller returning the correct data to the host; otherwise, starting the ECC correction procedure and recording the error.
4. The method for dynamically enhancing memory error correction capability according to claim 3, further comprising:
calculating the time for the data to be transferred from a dual in-line memory module (DIMM) to the pool controller;
calculating the parity size required for the data and the response time of the parity; and
selecting a lower-cost memory space to store the data.
5. The method for dynamically enhancing memory error correction capability according to claim 4, further comprising:
the pool controller monitoring and counting the errors of each memory;
when a memory shows a higher error rate, determining that its ECC strength is to be enhanced;
an ECC capability adjustment module calculating the parity space each memory requires after the enhancement;
the pool controller notifying the first leaf switch to enhance the ECC capability of its memory;
suspending data writes to the memory, and the pool controller using the parity data to verify the correctness of the data in the memory;
the first leaf switch calculating new parity data according to the new ECC capability algorithm and transmitting it to the pool controller;
the pool controller, following the rules of the parity storage determination module, finding a destination area in the second leaf switch to store the new parity data after the ECC enhancement; and
the pool controller notifying the first leaf switch to release the space that originally stored the parity data.
6. An electronic device, comprising:
a leaf switch for obtaining memory data, wherein the memory data includes at least the size and age of dual in-line memory modules (DIMMs), and for performing memory training and testing to obtain data of each DIMM, wherein the data includes at least error correcting code (ECC) capability, bad block information, and response time; and
a pool controller for obtaining the memory data and the DIMM data from the leaf switch and establishing a memory status table according to the memory data and the DIMM data, wherein the memory status table stores the memory data of the leaf switch, including at least DIMM size, page size, frequency, and current DIMM protection, the pool controller further comprising:
an ECC capability adjustment module for determining a destination area in the leaf switch that needs ECC enhancement and calculating the size of the parity space required for the ECC enhancement; and
a parity storage determination module for selecting a destination area corresponding to the size of the parity space to store the parity data newly added after the ECC enhancement.
7. The electronic device according to claim 6, further comprising:
The electronic device according to claim 6, further comprising: 第一叶交换器;First leaf exchanger; 其中,所述池控制器决定所述内存数据存放至所述第一叶交换器,其中,主机写入所述内存数据到内存池;wherein the pool controller determines that the memory data is stored in the first leaf switch, wherein the host writes the memory data to the memory pool; 所述第一叶交换器用于接收所述内存数据并存放到相应的空间,依照调整后的ECC算出对应的ECC同位,并回传所述ECC同位给所述池控制器;及The first leaf switch is used to receive the memory data and store it in a corresponding space, calculate the corresponding ECC parity according to the adjusted ECC, and return the ECC parity to the pool controller; and 所述池控制器中的同位存放判别模块找寻符合的位置,把所述ECC同位存放到目标叶交换器。The co-location storage determination module in the pool controller finds the matching position and stores the ECC co-location to the target leaf switch. 8.如权利要求7所述的电子装置,其特征在于,还包括:8. The electronic device according to claim 7, further comprising: 第二叶交换器;Second leaf exchanger; 其中,所述主机自所述内存池读取一笔数据,所述池控制器通知所述第二叶交换器;The host reads a piece of data from the memory pool, and the pool controller notifies the second leaf switch; 所述第二叶交换器,用于读取所述数据,同时算出所述数据对应的所述ECC同位并回传给所述池控制器;The second leaf switch is used to read the data, calculate the ECC parity corresponding to the data and send it back to the pool controller; 所述池控制器将所述第二叶交换器回传的所述ECC同位与在写入阶段中存放在他处的ECC同位读回做比较,并验证所述数据的正确性;及The pool controller compares the ECC parity sent back by the second leaf switch with the ECC parity read back stored elsewhere during the write phase and verifies the correctness of the data; and 若比对正确,所述池控制器回传正确所述数据给所述主机,否则启动ECC修正程序并记录。If the comparison is correct, the pool controller returns the correct data to the host, otherwise the ECC correction program is started and recorded. 9.如权利要求8所述的电子装置,其特征在于,所述同位存放判别模块计算所述数据由双线内存模块(DIMM)传到所述池控制器的时间,计算所述数据所需要的同位大小以及同位的响应时间,及选择较低成本的内存空间存放所述数据。9. 
The electronic device as described in claim 8 is characterized in that the co-location storage determination module calculates the time for the data to be transmitted from the dual-in-line memory module (DIMM) to the pool controller, calculates the co-location size and co-location response time required for the data, and selects a memory space with lower cost to store the data. 10.如权利要求9所述的电子装置,其特征在于,还包括:10. The electronic device according to claim 9, further comprising: 所述池控制器监看与统计着各内存的错误情况,当某个内存发生较高的错误,判别做ECC强度提升;The pool controller monitors and counts the error conditions of each memory. When a certain memory has a high error rate, it determines to increase the ECC strength. 所述ECC能力调整模块计算各内存提升后所需要的同位空间;The ECC capacity adjustment module calculates the co-location space required after each memory is upgraded; 所述池控制器通知所述第一叶交换器提升其内存的ECC能力,数据暂停写入该内存,所述池控制器利用同位数据将所述内存内的数据做正确性验证;The pool controller notifies the first leaf switch to improve the ECC capability of its memory, suspends writing data into the memory, and the pool controller uses the parity data to verify the correctness of the data in the memory; 所述第一叶交换器依照新的ECC能力算法,计算出新同位数据,并传送所述新同位数据给池控制器;及The first leaf switch calculates new parity data according to the new ECC capability algorithm, and transmits the new parity data to the pool controller; and 所述池控制器依照所述同位存放判别模块的规则,找到所述第二叶交换器中的目的区域存放提升ECC后的所述新同位数据,及通知所述第一叶交换器释放原先存放所述同位数据的空间区。The pool controller finds the destination area in the second leaf switch to store the new co-location data after the ECC upgrade according to the rules of the co-location storage determination module, and notifies the first leaf switch to release the space area originally storing the co-location data.
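For illustration only, the parity-space sizing of claim 1 and the lower-cost placement of claims 4 and 9 might be sketched as follows. The claims do not name an ECC code or a cost function, so everything concrete here is an assumption: the three ECC levels and their parity bytes per 64-byte block, the `Region` record standing in for a row of the memory status table, and the use of measured response time as the only cost. The function names `extra_parity_bytes` and `choose_destination` are hypothetical.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Optional

# Hypothetical ECC levels: parity bytes per 64-byte data block.
# Illustrative figures only; the claims do not specify a code.
PARITY_BYTES_PER_BLOCK = {1: 8, 2: 16, 3: 32}
BLOCK_BYTES = 64


@dataclass
class Region:
    """One candidate destination area (assumed row of the memory status table)."""
    leaf_switch: str
    free_bytes: int
    response_time_ns: int  # assumed to come from memory training and testing


def extra_parity_bytes(data_bytes: int, old_level: int, new_level: int) -> int:
    """Parity space newly required when an area moves to a stronger ECC level."""
    blocks = ceil(data_bytes / BLOCK_BYTES)
    delta = PARITY_BYTES_PER_BLOCK[new_level] - PARITY_BYTES_PER_BLOCK[old_level]
    return blocks * max(delta, 0)  # no extra space if the level is not raised


def choose_destination(regions: List[Region], needed: int) -> Optional[Region]:
    """Pick the fastest region that can hold the parity, or None if none fits."""
    candidates = [r for r in regions if r.free_bytes >= needed]
    return min(candidates, key=lambda r: r.response_time_ns, default=None)


regions = [
    Region("leaf-1", 4096, 120),
    Region("leaf-2", 1 << 20, 90),
    Region("leaf-3", 1 << 20, 150),
]
needed = extra_parity_bytes(1 << 16, old_level=1, new_level=2)
dest = choose_destination(regions, needed)
print(needed, dest.leaf_switch if dest else None)  # prints: 8192 leaf-2
```

In this sketch a 64 KiB area raised from level 1 to level 2 needs 1024 blocks times 8 additional parity bytes; `leaf-1` is excluded because it lacks the space, and `leaf-2` is chosen over `leaf-3` for its shorter response time. A real cost function could also weigh transfer time to the pool controller, which the claims mention but do not quantify.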
CN202110090549.4A 2021-01-22 2021-01-22 Method for dynamically enhancing memory error correction capability and electronic device Active CN114816807B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110090549.4A CN114816807B (en) 2021-01-22 2021-01-22 Method for dynamically enhancing memory error correction capability and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110090549.4A CN114816807B (en) 2021-01-22 2021-01-22 Method for dynamically enhancing memory error correction capability and electronic device

Publications (2)

Publication Number Publication Date
CN114816807A (en) 2022-07-29
CN114816807B (en) 2025-07-25

Family

ID=82524625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110090549.4A Active CN114816807B (en) 2021-01-22 2021-01-22 Method for dynamically enhancing memory error correction capability and electronic device

Country Status (1)

Country Link
CN (1) CN114816807B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7739576B2 (en) * 2006-08-31 2010-06-15 Micron Technology, Inc. Variable strength ECC
US8171380B2 (en) * 2006-10-10 2012-05-01 Marvell World Trade Ltd. Adaptive systems and methods for storing and retrieving data to and from memory cells
US8095851B2 (en) * 2007-09-06 2012-01-10 Siliconsystems, Inc. Storage subsystem capable of adjusting ECC settings based on monitored conditions
US8566539B2 (en) * 2009-01-14 2013-10-22 International Business Machines Corporation Managing thermal condition of a memory
US8862967B2 (en) * 2012-03-15 2014-10-14 Sandisk Technologies Inc. Statistical distribution based variable-bit error correction coding
US11182322B2 (en) * 2018-09-25 2021-11-23 International Business Machines Corporation Efficient component communication through resource rewiring in disaggregated datacenters

Also Published As

Publication number Publication date
CN114816807A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN110798509B (en) Block data synchronization method, device, medium and electronic equipment
US10552288B2 (en) Health-aware garbage collection in a memory system
WO2019010044A1 (en) Method and system for mitigating write amplification in a phase change memory-based storage device
US11599481B2 (en) Error recovery from submission queue fetching errors
TWI269162B (en) Memory interleaving
US8099632B2 (en) Urgency and time window manipulation to accommodate unpredictable memory operations
US20100115190A1 (en) System and method for processing read request
CN101377748B (en) Method for verifying read-write function of storage device
CN110286853B (en) Data writing method and device and computer readable storage medium
KR20190122057A (en) Memory controller and memory system having the same
US20080215911A1 (en) Storage device capable of meeting the reliability and a building and data writing method thereof
CN111737161B (en) File transmission method, terminal and storage medium of Flash memory
US20070277016A1 (en) Methods and apparatus related to memory modules
US7152129B2 (en) Apparatus having an inter-module data transfer confirming function, storage controlling apparatus, and interface module for the apparatus
CN114816807B (en) Method for dynamically enhancing memory error correction capability and electronic device
US20080195837A1 (en) Data access method, channel adapter, and data access control device
CN113377278A (en) Solid state disk, garbage recycling and controlling method, equipment, system and storage medium
CN115934420A (en) Data recovery method, system, device and medium based on distributed storage
US6904547B2 (en) Method and apparatus for facilitating validation of data retrieved from disk
US20030079101A1 (en) Automatic adjustment of host protected area by BIOS
CN114237983A (en) Data backup method and device
EP4439301A1 (en) Memory device and module life expansion
US20030226090A1 (en) System and method for preventing memory access errors
CN117012267A (en) Verification method, controller and medium for UFS written data
CN112148220B (en) Method, device, computer storage medium and terminal for realizing data processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant