CN101751373A - Configurable multi-core/many core system based on single instruction set microprocessor computing unit - Google Patents
- Publication number
- CN101751373A, CN200810203778A
- Authority
- CN
- China
- Prior art keywords
- core
- processing unit
- microprocessor
- instruction set
- many
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Advance Control (AREA)
Abstract
The invention belongs to the field of integrated circuit design and relates in particular to a configurable multi-core/many-core system based on single-instruction-set microprocessor computing units. The system comprises a microprocessor core, processing units, an arbiter, a DMA controller, an input distributor and an output distributor. The microprocessor core executes the operating system and the non-compute-intensive parts of the application program, and also configures the processing units. Each processing unit executes one or more compute-intensive operations. The processing units are connected to the arbiter through the input distributor and the output distributor, and the arbiter arbitrates the access requests of the microprocessor core and the processing units to external memory. The instruction sets of the microprocessor core and the processing units are subsets of the same instruction set and are compiled by the same compiler. The invention effectively addresses three problems of multi-core systems: difficult workload distribution, low external memory access efficiency, and complicated software development and debugging caused by different compilation systems.
Description
Technical field
The present invention relates to the field of integrated circuit (IC) design, and in particular to a configurable multi-core/many-core system with a single instruction set, based on a microprocessor core and microprocessor arithmetic units.
Background technology
With the spread of information technology and rising living standards, consumers want more functions integrated into a single electronic product, so that many applications need both intelligent control and digital signal processing capability. A digital TV set-top box, for example, must respond correctly and quickly to user operations and must also decode audio and video signals efficiently.
A microprocessor (CPU) can support an operating system and run general-purpose application programs. It can access and process instructions and data at random as the program requires, and it handles irregular operations such as jumps well, but its ability to process data streams such as audio and video is relatively weak. A digital signal processor (DSP) is the opposite: it has very strong stream processing capability. Integrating a CPU core and a DSP core in one chip, and letting them interact through some interconnection strategy, therefore exploits the advantages of both processors at once. Integrating the two kinds of cores accelerates the development of personal communicators, smart phones and wireless network products; it also simplifies design, reduces board-level system complexity, and lowers the power consumption and cost of the whole system.
Fusing a CPU and a DSP meets the demand for both intelligent control and stream processing, but more advanced user interfaces and application programs place higher parallel-processing requirements on embedded systems. Simply adding DSP cores can further improve the performance of an embedded CPU, but this approach has limits. Current digital signal processing algorithms still contain many parts that must be executed serially and cannot be partitioned well, so once the number of DSP cores reaches a certain point, performance no longer improves as cores are added. In addition, as semiconductor manufacturing processes keep advancing, the internal operating frequency of a chip has become much higher than that of its external memory. When multiple DSP cores access external memory at the same time, the resulting long memory wait times become a major bottleneck for system performance.
A search of the prior-art literature shows that existing mature CPU-DSP fusion architectures abroad, such as the DaVinci series processors of Texas Instruments (TI), require two different compilation systems that compile the CPU and DSP instruction sets separately. The Imagine processor of Stanford University uses two-level compilation instead: stream-level compilation and kernel-level compilation, whose outputs run on the host CPU and on Imagine respectively. Whether two compilation systems or two-level compilation is used, the complexity of application software development and the difficulty of software debugging increase greatly.
Among multi-core processors, those of Intel and Advanced Micro Devices (AMD) are the most typical; they simply put two or more complex microprocessor cores together. Every microprocessor core in such a processor contains a cache, and to keep the cached data in different cores coherent the cores must exchange data frequently, which lowers the efficiency of the whole multi-core processor. Although the cores can operate in parallel, taken as a whole they cannot achieve true pipelined operation. On the software side, a multi-core processor can be fully used only with a parallel-programming approach, yet most operating systems have not fundamentally changed how they allocate and manage resources and still distribute work evenly in a symmetric manner. Moreover, most software does not take dual-core or multi-core operation fully into account, so both the time spent distributing threads evenly and the time spent exchanging data between threads grow considerably; the loss of efficiency is especially severe when threads must access memory repeatedly.
Summary of the invention
To address the deficiencies of the prior art, the present invention proposes a configurable multi-core/many-core system with a single instruction set, based on a microprocessor core and microprocessor arithmetic units. It is intended to solve the problems of difficult workload distribution in multi-core systems, low external memory access efficiency, and the complexity of software development and debugging caused by using different compilation systems.
The present invention realizes by the following technical solutions:
The configurable multi-core/many-core system based on single-instruction-set microprocessor arithmetic units of the present invention comprises a microprocessor core, processing units, an arbiter, a DMA controller, an input distributor and an output distributor, wherein: the microprocessor core executes the operating system and the non-compute-intensive parts of the application program, and is responsible for configuring the processing units; the processing units execute the multiplication, addition/subtraction, multiply-add/multiply-subtract, accumulation, shift, extraction and swap operations of the compute-intensive workload, each processing unit completing one or more such operations; the arbiter arbitrates the access requests of the microprocessor core and the processing units to external memory; the DMA controller directly configures the memories of the processing units; the input distributor distributes data obtained from the arbiter to the different processing units; and the output distributor collects data from the different processing units and sends it to the arbiter.
The instruction sets of the microprocessor core and the processing units are both subsets of the same instruction set and can be compiled by the same compiler. After the application program has been compiled, a single post-compiler decomposes the machine code and determines the actions and load of each processing unit.
The microprocessor core of the present invention comprises an arithmetic unit, a register file, a program counter, an interrupt/exception controller, a memory management unit and caches.
The processing unit of the present invention comprises an arithmetic unit, a register file, a local instruction memory, a local data memory and a program counter.
The number of processing units in the system of the present invention is not fixed; it can be scaled according to the needs of the actual application.
In the system of the present invention, the microprocessor core is paired with a number of processing units. During execution, the microprocessor core first runs the operating system while the processing units are switched off. As the application program requires, the microprocessor core can switch on processing units, configure the connections between processing units, and configure the local instruction memory and local data memory in each processing unit. Alternatively, as the application program requires, the microprocessor core can switch on processing units, configure the connections between processing units, and configure the DMA controller, which then configures the local instruction memory and local data memory in each processing unit.
The processing units of the present invention are flexibly interconnected through a network-on-chip, and any two processing units can be connected as preceding and following stages.
When configuring the processing units, the microprocessor core can, as the application program requires, connect all processing units in series into a single serial structure that handles the compute-intensive operations. It can also divide the processing units into groups and connect each group into its own serial structure; the different serial structures can then process different compute-intensive operations in parallel, or the same compute-intensive operation in parallel. It can further divide the processing units into groups connected into several serial structures, some of which process different compute-intensive operations in parallel and are then connected in series to other serial structures that process further compute-intensive operations. The connections between processing units can be adjusted at any time during operation as the application program requires.
In each structure formed by connecting multiple processing units, the first processing unit takes its input data from external memory through the arbiter and the input distributor, and the last processing unit returns its results to external memory through the output distributor and the arbiter. Every other processing unit takes its input data from the preceding-stage processing unit and sends its output data to the following-stage processing unit; a preceding stage and its following stage are connected through a FIFO.
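The data path of such a serial structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented hardware: the function name `run_chain`, the per-stage functions, and the use of `collections.deque` as the FIFO between stages are all hypothetical, and the model moves data stage by stage rather than cycle by cycle.

```python
from collections import deque

def run_chain(stages, inputs):
    """Push each input word through a serial chain of processing units.

    stages: list of one-argument functions, one per processing unit (hypothetical).
    A deque stands in for the FIFO between adjacent units (an assumption).
    """
    # fifos[0] is fed from the input distributor; fifos[-1] drains to the output distributor.
    fifos = [deque() for _ in range(len(stages) + 1)]
    fifos[0].extend(inputs)  # data taken from external memory through the arbiter
    for i, stage in enumerate(stages):
        while fifos[i]:
            # Each unit reads from the FIFO of its preceding stage and
            # writes to the FIFO of its following stage.
            fifos[i + 1].append(stage(fifos[i].popleft()))
    return list(fifos[-1])  # results sent back to external memory

# Three illustrative stages: multiply, add, shift.
result = run_chain([lambda x: x * 2, lambda x: x + 1, lambda x: x << 1], [1, 2, 3])
```

Only the first and last FIFOs in this sketch touch the memory side, which mirrors the point that intermediate stages exchange data with their neighbours rather than with external memory.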
The processing units of the system of the present invention are connected to the arbiter through the input distributor and the output distributor; the microprocessor core, the input distributor and the output distributor are connected to external memory through the arbiter.
The number of processing units in the system of the present invention is not fixed and can be determined according to application requirements.
The beneficial effects of the present invention are as follows. According to the characteristics of a compute-intensive application program, the microprocessor core configures the connections, local instruction memories and local data memories of the processing units to obtain a multi-core structure suited to that application, and during compilation the compute-intensive work is distributed evenly across the processing units, reducing the load on each one. Each processing unit can execute multiple instructions, which alleviates the long waits and low efficiency of external memory access. In addition, the instruction sets of the microprocessor core and the processing units are both subsets of the same instruction set and can be compiled by the same compiler, which reduces the software development workload.
Description of drawings
Fig. 1 shows the architecture of the configurable multi-core/many-core system based on single-instruction-set microprocessor arithmetic units proposed by the present invention.
Fig. 2 is a structural diagram of the microprocessor core of the present invention.
Fig. 3 is a structural diagram of the processing unit of the present invention.
Fig. 4 is an example of the data transmission structure between processing units in the present invention.
Fig. 5 is a first example of a structure formed by connecting processing units in the present invention.
Fig. 6 is a second example of a structure formed by connecting processing units in the present invention.
Fig. 7 is a flow chart of the configuration and execution of the configurable multi-core/many-core system of the present invention.
Fig. 8 shows the relationship between the instruction sets of the microprocessor core and the processing unit in the present invention.
Embodiment
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. The embodiments are implemented according to the technical solution of the present invention, but the scope of protection of the present invention is not limited to the following embodiments.
The technical idea of the present invention is to connect the processing units in series, in parallel, or in a mixed series-parallel arrangement in order to handle compute-intensive operations. The microprocessor core first runs the operating system and the application program. Following the instructions in the application program, it configures the processing units: it switches processing units on or off, determines the connections between them, and then configures the local instruction memory and local data memory of each processing unit, after which it notifies the processing units to begin the compute-intensive operations. Once this configuration is complete, the microprocessor core either enters a waiting state or continues executing program code unrelated to the compute-intensive operations carried out by the processing units. At any time, as the program requires, it can notify the processing units to stop, or reconfigure them to carry out new compute-intensive operations.
The number of processing units in the configurable multi-core/many-core system proposed by the present invention is not fixed and can be scaled according to the needs of the actual application. A system with four processing units is taken as the embodiment below. This four-unit embodiment is implemented according to the technical solution of the present invention, but the scope of protection is not limited to it.
As shown in Fig. 1, the configurable multi-core/many-core system based on single-instruction-set microprocessor arithmetic units of the present invention comprises a microprocessor core (201), processing units (301), an arbiter (101), a DMA controller (104), an input distributor (102) and an output distributor (103), wherein:
The microprocessor core (201) executes the operating system and the non-compute-intensive parts of the application program, and is responsible for configuring the processing units. Configuration covers the connections between processing units, the contents of the local instruction memory and local data memory in each processing unit, and notifying the processing units to start or stop the compute-intensive operations.
The processing unit (301) executes the multiplication, addition/subtraction, multiply-add/multiply-subtract, accumulation, shift, extraction and swap operations of the compute-intensive workload. A processing unit may complete only a single such operation, but in most cases each processing unit completes several.
The arbiter (101) arbitrates the access requests of the microprocessor core and the processing units to external memory.
The DMA controller (104) directly configures the memories of the processing units.
The input distributor (102) distributes data obtained from the arbiter to the different processing units.
The output distributor (103) collects data from the different processing units and sends it to the arbiter.
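The role of the arbiter (101) can be illustrated with a small Python sketch. The source does not state which arbitration policy is used, so the round-robin policy below is purely an assumption, as are the function name `arbitrate` and the requester labels.

```python
def arbitrate(requests, order, last_granted=None):
    """Grant external memory access to one requester per cycle.

    requests: set of requester names asking for access this cycle.
    order: fixed list of all possible requesters.
    Round-robin is an assumed policy; the source does not specify one.
    """
    if last_granted in order:
        start = (order.index(last_granted) + 1) % len(order)
    else:
        start = 0
    # Scan the requesters starting just after the last winner.
    for k in range(len(order)):
        candidate = order[(start + k) % len(order)]
        if candidate in requests:
            return candidate
    return None  # nobody requested access this cycle

# Hypothetical labels for the three blocks that reach external memory through the arbiter.
ORDER = ["microprocessor_core", "input_distributor", "output_distributor"]
pending = {"input_distributor", "output_distributor"}
first = arbitrate(pending, ORDER)
second = arbitrate(pending, ORDER, last_granted=first)
```

With both distributors requesting, the sketch grants them alternately, so neither the input side nor the output side of the processing units is starved.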
In the configurable multi-core/many-core system proposed by the present invention, the processing units are connected only to the input distributor and the output distributor, while the microprocessor core (201), the input distributor (102) and the output distributor (103) are connected to external memory through the arbiter (101), which decides among the cores' access requests to external memory. The structure is simple, avoids a complicated interconnect and reduces the number of external pins. The compute-intensive work is distributed evenly across the processing units, reducing the load on each one. All processing units are divided into two groups. Within each group the processing units are connected head to tail, and only the data input of the first-stage processing unit and the data output of the last-stage processing unit are connected to the input distributor (102) and the output distributor (103) respectively. This reduces the number of processing units that access external memory at the same time and alleviates the performance bottleneck caused by many cores accessing external memory simultaneously in a multi-core/many-core structure. Because each processing unit performs several operations, data takes several cycles to pass from input to output, and the number of cycles between two successive fetches increases accordingly. This alleviates the long external memory wait times that arise because the chip's internal operating frequency is higher than that of the external memory.
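The effect of performing several operations per fetched word can be shown with a toy calculation. The model below is an assumption made for illustration only: one operation per internal cycle, the fetch of the next word issued as soon as the current word arrives, and a fixed external memory latency expressed in internal cycles. The function name `stall_cycles` is hypothetical.

```python
def stall_cycles(ops_per_word, memory_latency_cycles):
    """Idle cycles per input word for the first-stage processing unit.

    Toy model (assumption): the unit spends ops_per_word cycles on each word
    while the next word is being fetched, so it only waits for whatever part
    of the memory latency is not hidden by computation.
    """
    return max(0, memory_latency_cycles - ops_per_word)

# Assumed latency of 8 internal cycles; vary the work done per word.
stalls = [stall_cycles(n, 8) for n in (1, 4, 8, 16)]
```

Under these assumptions, a unit that performs a single operation per word idles for most of each fetch, while a unit that performs eight or more operations per word hides the latency entirely.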
As shown in Fig. 2, the microprocessor core (201) comprises an arithmetic unit, a register file (205), a program counter (206), an interrupt/exception controller (208), a memory management unit (207) and caches. The arithmetic unit consists of a multiplier/divider (203) and an arithmetic logic unit (204). The caches consist of an instruction cache (202) and a data cache (209).
As shown in Fig. 3, the processing unit (301) comprises an arithmetic unit (303), a register file (205), a local instruction memory (302), a local data memory (304) and a program counter (206). The arithmetic unit (303) is an enhanced version of the arithmetic unit of the microprocessor core (201): in addition to all the arithmetic-logic and multiply/divide operations supported by the microprocessor core (201), it adds instructions suited to large-scale, data-intensive computation. The local instruction memory (302), configured by the microprocessor core (201) or the DMA controller (104), stores the instructions that the processing unit (301) will use. The local data memory (304), likewise configured by the microprocessor core (201) or the DMA controller (104), stores the data or parameters that the processing unit (301) will use. Unlike the instruction cache (202) and data cache (209) in the microprocessor core (201), the contents of the local instruction memory (302) and local data memory (304) are not a subset of the external memory contents, so there is no data coherence problem, which reduces the complexity of the integrated circuit design.
Fig. 4 gives an example of the data transmission structure between processing units. The two data channels of a preceding-stage processing unit (301) are sent to the following-stage processing unit (301) through two groups of FIFOs (401). Each processing unit (301) has its own dedicated FIFOs (401). As the figure shows, any number of preceding-stage processing units can be connected to the same following-stage processing unit, and this structure increases the configuration flexibility of the multi-core/many-core system.
In the configurable multi-core/many-core system proposed by the present invention, the processing units are flexibly interconnected through a network-on-chip, and any two processing units can be connected as preceding and following stages. Two typical interconnection structures are given below, using a system with four processing units as the example. Both are implemented according to the technical solution of the present invention, but the scope of protection is not limited to them.
Fig. 5 gives one example of interconnection between processing units. First, the four processing units (301) are divided into a first group (501) and a second group (502), and each group is configured internally as a serial connection. The data output of the first group (501) is then configured to feed the data input of the second group (502), finally forming a fully serial multi-core system.
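The fully serial arrangement can be written down as a list of links between processing units. The sketch below is illustrative only: the helper `serial_links`, the labels `PU0` to `PU3`, and the split of two units per group are assumptions, since the text does not say how many units each group holds.

```python
def serial_links(groups):
    """Return (source, destination) links for groups wired head to tail.

    groups: list of lists of processing unit labels; each inner list is
    already in serial order, and each group feeds the next one.
    """
    order = [pu for group in groups for pu in group]
    # Link every unit to the one that follows it in the flattened order.
    return list(zip(order, order[1:]))

# Assumed split: first group (501) = PU0, PU1; second group (502) = PU2, PU3.
links = serial_links([["PU0", "PU1"], ["PU2", "PU3"]])
```

The link from `PU1` to `PU2` is the one that joins the output of the first group to the input of the second.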
Fig. 6 gives another example of interconnection between processing units. First, the four processing units (301) are divided into a first group (601), a second group (602) and a third group (603). The processing units (301) within the first group (601) are configured in parallel, the first group (601) is then connected in series with the second group (602), and the data output of the second group (602) is configured to feed the data input of the third group (603), finally forming a mixed series-parallel multi-core system.
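For a mixed arrangement of this kind, it is useful to see which processing units sit at the edges of the structure, because only those exchange data with the input and output distributors. The sketch below assumes, hypothetically, that the first group holds two parallel units (`PU0`, `PU1`) and that the second and third groups hold one unit each (`PU2`, `PU3`); the helper name `endpoints` is also an assumption.

```python
def endpoints(links):
    """Split processing units into heads (no predecessor) and tails (no successor).

    links: list of (source, destination) pairs between processing units.
    Heads would read from the input distributor, tails would write to the
    output distributor.
    """
    sources = {src for src, _ in links}
    sinks = {dst for _, dst in links}
    return sorted(sources - sinks), sorted(sinks - sources)

# Assumed wiring: two parallel units feed PU2, which feeds PU3.
mixed_links = [("PU0", "PU2"), ("PU1", "PU2"), ("PU2", "PU3")]
heads, tails = endpoints(mixed_links)
```

In this assumed wiring, two units fetch from external memory in parallel while a single unit writes results back.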
A start-up, detection and configuration flow is given below. The flow is implemented according to the technical solution of the present invention, but the scope of protection is not limited to it.
As shown in Fig. 7, in step 1 (701) the microprocessor core (201) runs the operating system or application program, and in step 2 (702) it checks whether the operating system or application program contains a specific configuration instruction. If there is none, the flow returns to step 1 (701) and continues running the operating system or application program. If there is one, the flow enters step 3 (703) and configures the connections of the processing units (301) according to the configuration instruction, then enters step 4 (704) and configures the local instruction memory and local data memory of each processing unit in turn, and then enters step 5 (705) and checks whether the local instruction memories and local data memories of all processing units have been configured. If not, the flow returns to step 4 (704). If so, it enters step 6 (706), in which the processing units (301) are notified to begin the compute-intensive operations, and then step 7 (707), in which the microprocessor core (201) continues to execute other program code unrelated to the compute-intensive operations carried out by the processing units (301). In steps 4 (704) and 5 (705), the microprocessor core (201) can configure and check the local instruction memory and local data memory of each processing unit directly; alternatively, the microprocessor core (201) can first configure the DMA controller (104), which then configures and checks them directly.
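Steps 3 to 6 of this flow can be sketched as follows. The class `ProcessingUnit`, the function `configure_and_start`, the trace strings and the placeholder program `["mac"]` are all hypothetical; the sketch models only the order of the steps, not the hardware or the DMA path.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessingUnit:
    """Hypothetical stand-in for a processing unit (301)."""
    instr_mem: list = field(default_factory=list)
    data_mem: list = field(default_factory=list)
    running: bool = False

def configure_and_start(units, links, programs):
    """Follow steps 3 to 6 of the flow and return a trace of what was done."""
    trace = [f"step 3: {len(links)} links configured"]
    index = 0
    while index < len(units):                  # step 5: repeat until every unit is loaded
        code, data = programs[index]
        units[index].instr_mem = list(code)    # step 4: load local instruction memory
        units[index].data_mem = list(data)     # step 4: load local data memory
        trace.append(f"step 4: PU{index} loaded")
        index += 1
    for unit in units:                         # step 6: notify the units to start
        unit.running = True
    trace.append("step 6: units started")
    return trace                               # step 7: the core is free for other work

units = [ProcessingUnit() for _ in range(4)]
trace = configure_and_start(units, [(0, 1), (1, 2), (2, 3)], [(["mac"], [0])] * 4)
```

The loop over `index` plays the role of the step 4 and step 5 pair: loading continues until no unconfigured unit remains, and only then are the units started.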
Fig. 8 shows the relationship between the instruction sets of the microprocessor core (201) and the processing unit (301). The complete instruction set (801) has three parts: a first part (802), a second part (803) and a third part (804). The first part (802) contains control-related instructions, such as exception, trap and configuration instructions. The second part (803) contains all the arithmetic instructions supported by the multiplier/divider (203) and the arithmetic logic unit (204) in the microprocessor core (201). The third part (804) contains all the arithmetic instructions supported by the arithmetic unit (303) in the processing unit (301) other than those of the second part (803). The instruction subset supported by the microprocessor core (201) is the union of the first part (802) and the second part (803); the instruction subset supported by the processing unit (301) is the union of the second part (803) and the third part (804).
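The three-part instruction set can be modelled with Python sets. This is a sketch under the assumption that each supported subset combines two of the three parts; the mnemonics are invented placeholders, not instructions named by the source, and `can_run` is a hypothetical helper.

```python
# Placeholder mnemonics (assumptions) for the three parts of the full set (801).
CONTROL = {"exception", "trap", "configure"}              # first part (802)
COMMON_OPS = {"add", "sub", "mul", "div", "shift"}        # second part (803)
EXTENDED_OPS = {"mac", "accumulate", "extract", "swap"}   # third part (804)

FULL_ISA = CONTROL | COMMON_OPS | EXTENDED_OPS
CORE_ISA = CONTROL | COMMON_OPS        # subset supported by the microprocessor core
PU_ISA = COMMON_OPS | EXTENDED_OPS     # subset supported by a processing unit

def can_run(program, isa):
    """True if every instruction in the program belongs to the given subset."""
    return set(program) <= isa

core_ok = can_run(["configure", "add"], CORE_ISA)
pu_ok = can_run(["mul", "mac"], PU_ISA)
core_rejects_mac = not can_run(["mac"], CORE_ISA)
```

Because both subsets are drawn from one instruction set, a single compiler can target either, and a later pass can decide which code segments fit the processing units.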
Claims (12)
1. A configurable multi-core/many-core system based on single-instruction-set microprocessor arithmetic units, characterized in that it comprises a microprocessor core, processing units, an arbiter, an input distributor and an output distributor, wherein:
the microprocessor core executes the operating system and the non-compute-intensive parts of the application program, and is responsible for configuring the processing units;
the processing units execute the multiplication, addition/subtraction, multiply-add/multiply-subtract, accumulation, shift, extraction and swap operations of the compute-intensive workload, each processing unit completing one or more such operations;
the arbiter arbitrates the access requests of the microprocessor core and the processing units to external memory;
the DMA controller directly configures the memories of the processing units;
the input distributor distributes data obtained from the arbiter to the different processing units;
the output distributor collects data from the different processing units and sends it to the arbiter;
the instruction sets of the microprocessor core and the processing units are both subsets of the same instruction set and can be compiled by the same compiler; after the application program has been compiled, a single post-compiler decomposes the machine code and determines the actions and load of each processing unit.
2. The configurable multi-core/many-core system based on single-instruction-set microprocessor arithmetic units according to claim 1, characterized in that the microprocessor core comprises an arithmetic unit, a register file, a program counter, an interrupt/exception controller, a memory management unit and caches.
3. The configurable multi-core/many-core system based on single-instruction-set microprocessor arithmetic units according to claim 1, characterized in that the processing unit comprises an arithmetic unit, a register file, a local instruction memory, a local data memory and a program counter.
4. The configurable multi-core/many-core system based on single-instruction-set microprocessor arithmetic units according to claim 1 or 3, characterized in that the number of processing units is not fixed and can be scaled according to the needs of the actual application.
5. The configurable multi-core/many-core system based on single-instruction-set microprocessor arithmetic units according to claim 1, characterized in that the microprocessor core is paired with a number of processing units; during execution, the microprocessor core first runs the operating system while the processing units are switched off, and the microprocessor core can, as the application program requires, switch on processing units, configure the connections between processing units, and configure the local instruction memory and local data memory in each processing unit.
6. The configurable multi-core/many-core system based on single-instruction-set microprocessor arithmetic units according to claim 1, characterized in that the microprocessor core is paired with a number of processing units; during execution, the microprocessor core first runs the operating system while the processing units are switched off, and the microprocessor core can, as the application program requires, switch on processing units, configure the connections between processing units, and configure the DMA controller, which then configures the local instruction memory and local data memory in each processing unit.
7. according to claim 1,5,6 described configurable multi-core/many-core systems based on the single instruction set microprocessor arithmetic element, it is characterized in that interconnection realizes flexibly by network-on-chip between described processing unit, level connects relation before and after all can constituting between any two processing units.
8. The configurable multi-core/many-core system based on a single instruction set microprocessor arithmetic unit according to claim 1, 5, 6 or 7, characterized in that, by configuring the processing units through the microprocessor core, all processing units can be connected in series into a single serial structure, as the application program requires, to process compute-intensive operations.
9. The configurable multi-core/many-core system based on a single instruction set microprocessor arithmetic unit according to claim 1, 5, 6 or 7, characterized in that, by configuring the processing units through the microprocessor core, the processing units can be divided into groups, as the application program requires, and each group connected into its own serial structure; different serial structures can run different program segments, executing in parallel compute-intensive operations with different instructions and different data, or can run the same program segment, executing compute-intensive operations with the same instructions and different data in single-instruction multiple-data (SIMD) fashion.
10. The configurable multi-core/many-core system based on a single instruction set microprocessor arithmetic unit according to claim 1, 5, 6 or 7, characterized in that, by configuring the processing units through the microprocessor core, the processing units can be divided into groups, as the application program requires, and each group connected into its own serial structure, wherein some of the serial structures process different compute-intensive operations in parallel and are afterwards connected in series with other serial structures to process further compute-intensive operations.
11. The configurable multi-core/many-core system based on a single instruction set microprocessor arithmetic unit according to claim 1, 5, 6, 7, 8, 9 or 10, characterized in that, by configuring the processing units through the microprocessor core, the connection relationships among the processing units can be adjusted at any time during operation according to the requirements of the application program.
12. The configurable multi-core/many-core system based on a single instruction set microprocessor arithmetic unit according to claim 1, 5, 6, 7, 8, 9, 10 or 11, characterized in that, in each structure formed by connecting a plurality of the processing units, the input data of the first processing unit is fetched from external memory through an arbiter and an input distributor; the result of the last processing unit is sent back to external memory through an output distributor and the arbiter; the input data of every other processing unit is taken from the preceding-stage processing unit and its output data is sent to the following-stage processing unit; and each preceding-stage processing unit is connected to its following-stage processing unit through a FIFO.
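The configuration flow described in claims 5, 8, 9, 11 and 12 can be illustrated with a small software model. The Python sketch below is not part of the patent text. The names `ProcessingUnit`, `Core`, `build_chain` and `run_chain` are hypothetical, the local instruction memory is reduced to a single Python function per unit, and the arbiter, distributors, DMA controller and network-on-chip are abstracted away. It only shows the idea of a core switching units on, chaining them through FIFOs, and streaming data from a first stage to a last stage.

```python
from collections import deque


class ProcessingUnit:
    """Hypothetical model of one processing unit (not from the patent)."""

    def __init__(self, uid):
        self.uid = uid
        self.enabled = False   # units start in the off state (claims 5 and 6)
        self.program = None    # stands in for the local instruction memory
        self.fifo = deque()    # input FIFO fed by the preceding stage (claim 12)


class Core:
    """Hypothetical model of the microprocessor core that configures the units."""

    def __init__(self, n_units):
        self.units = [ProcessingUnit(i) for i in range(n_units)]

    def build_chain(self, unit_ids, programs):
        """Turn on the selected units, load their programs, return them in stage order."""
        if len(unit_ids) != len(programs):
            raise ValueError("one program is required per unit")
        chain = []
        for uid, program in zip(unit_ids, programs):
            unit = self.units[uid]
            unit.enabled = True
            unit.program = program
            unit.fifo.clear()  # a reconfigured unit starts with an empty FIFO
            chain.append(unit)
        return chain


def run_chain(chain, inputs):
    """Stream inputs through a serial structure and collect the last stage's results."""
    outputs = []
    chain[0].fifo.extend(inputs)  # first stage reads from "external memory"
    while any(unit.fifo for unit in chain):
        # Visit stages from last to first so each item advances one stage per pass.
        for index in reversed(range(len(chain))):
            unit = chain[index]
            if not unit.fifo:
                continue
            result = unit.program(unit.fifo.popleft())
            if index + 1 < len(chain):
                chain[index + 1].fifo.append(result)  # hand over to the next stage
            else:
                outputs.append(result)  # last stage writes back to "external memory"
    return outputs


if __name__ == "__main__":
    core = Core(4)
    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]

    # Claim 8: all units in one serial structure.
    serial = core.build_chain([0, 1, 2, 3], stages)
    print(run_chain(serial, [1, 2, 3]))

    # Claims 9 and 11: regroup at run time into two chains running the same program
    # on different data (SIMD-style).
    group_a = core.build_chain([0, 1], stages[:2])
    group_b = core.build_chain([2, 3], stages[:2])
    print(run_chain(group_a, [1, 2]), run_chain(group_b, [10, 20]))
```

In this model the two groups run one after the other, whereas the claimed hardware would run them concurrently. The sketch is meant to show the grouping and chaining, not the timing.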
Priority Applications (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN200810203778A CN101751373A (en) | 2008-11-28 | 2008-11-28 | Configurable multi-core/many core system based on single instruction set microprocessor computing unit |
| PCT/CN2009/001346 WO2010060283A1 (en) | 2008-11-28 | 2009-11-30 | Data processing method and device |
| EP09828544A EP2372530A4 (en) | 2008-11-28 | 2009-11-30 | METHOD AND DEVICE FOR DATA PROCESSING |
| KR1020117014902A KR101275698B1 (en) | 2008-11-28 | 2009-11-30 | Data processing method and device |
| US13/118,360 US20110231616A1 (en) | 2008-11-28 | 2011-05-27 | Data processing method and system |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN200810203778A CN101751373A (en) | 2008-11-28 | 2008-11-28 | Configurable multi-core/many core system based on single instruction set microprocessor computing unit |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN101751373A (en) | 2010-06-23 |
Family
ID=42478365
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN200810203778A (Pending), published as CN101751373A (en) | Configurable multi-core/many core system based on single instruction set microprocessor computing unit | 2008-11-28 | 2008-11-28 |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN101751373A (en) |
Cited By (10)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102073481A (en) * | 2011-01-14 | 2011-05-25 | 上海交通大学 | Multi-kernel DSP reconfigurable special integrated circuit system |
| CN102521201A (en) * | 2011-11-16 | 2012-06-27 | 刘大可 | Multi-core DSP (digital signal processor) system-on-chip and data transmission method |
| CN104317770A (en) * | 2014-10-28 | 2015-01-28 | 天津大学 | Data storage structure and data access method for multiple core processing system |
| CN104813306A (en) * | 2012-11-21 | 2015-07-29 | 相干逻辑公司 | Processing system with interspersed processors DMA-FIFO |
| CN110007962A (en) * | 2019-03-08 | 2019-07-12 | 浙江大学 | A kind of instruction-set simulation method based on Code automatic build |
| CN111930668A (en) * | 2020-08-03 | 2020-11-13 | 中国科学院计算技术研究所 | Operation device and method, multi-core intelligent processor and multi-core heterogeneous intelligent processor |
| CN112445696A (en) * | 2019-09-02 | 2021-03-05 | 无锡江南计算技术研究所 | Debugging method for longitudinal consistency of heterogeneous many-core Dcache |
| WO2022199604A1 (en) * | 2021-03-26 | 2022-09-29 | 北京灵汐科技有限公司 | Integrated circuit chip and many-core system |
| CN115280297A (en) * | 2019-12-30 | 2022-11-01 | 星盟国际有限公司 | Processors for Configurable Parallel Computing |
| CN115470174A (en) * | 2021-06-10 | 2022-12-13 | 北京灵汐科技有限公司 | Route generation method and device, many-core system, computer-readable medium |
2008-11-28: CN application CN200810203778A, published as CN101751373A (en); status: active (Pending)
Cited By (15)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN102073481B (en) * | 2011-01-14 | 2013-07-03 | 上海交通大学 | Multi-kernel DSP reconfigurable special integrated circuit system |
| CN102073481A (en) * | 2011-01-14 | 2011-05-25 | 上海交通大学 | Multi-kernel DSP reconfigurable special integrated circuit system |
| CN102521201A (en) * | 2011-11-16 | 2012-06-27 | 刘大可 | Multi-core DSP (digital signal processor) system-on-chip and data transmission method |
| CN104813306B (en) * | 2012-11-21 | 2017-07-04 | 相干逻辑公司 | Processing system with distributed processor DMA‑FIFO |
| CN104813306A (en) * | 2012-11-21 | 2015-07-29 | 相干逻辑公司 | Processing system with interspersed processors DMA-FIFO |
| US12197970B2 (en) | 2012-11-21 | 2025-01-14 | HyperX Logic, Inc. | Processing system with interspersed processors DMA-FIFO |
| CN104317770A (en) * | 2014-10-28 | 2015-01-28 | 天津大学 | Data storage structure and data access method for multiple core processing system |
| CN104317770B (en) * | 2014-10-28 | 2017-03-08 | 天津大学 | Data store organisation for many-core processing system and data access method |
| CN110007962A (en) * | 2019-03-08 | 2019-07-12 | 浙江大学 | A kind of instruction-set simulation method based on Code automatic build |
| CN112445696A (en) * | 2019-09-02 | 2021-03-05 | 无锡江南计算技术研究所 | Debugging method for longitudinal consistency of heterogeneous many-core Dcache |
| CN115280297A (en) * | 2019-12-30 | 2022-11-01 | 星盟国际有限公司 | Processors for Configurable Parallel Computing |
| CN111930668A (en) * | 2020-08-03 | 2020-11-13 | 中国科学院计算技术研究所 | Operation device and method, multi-core intelligent processor and multi-core heterogeneous intelligent processor |
| CN111930668B (en) * | 2020-08-03 | 2023-09-26 | 中国科学院计算技术研究所 | Computing device, method, multi-core intelligent processor and multi-core heterogeneous intelligent processor |
| WO2022199604A1 (en) * | 2021-03-26 | 2022-09-29 | 北京灵汐科技有限公司 | Integrated circuit chip and many-core system |
| CN115470174A (en) * | 2021-06-10 | 2022-12-13 | 北京灵汐科技有限公司 | Route generation method and device, many-core system, computer-readable medium |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN101751373A (en) | | Configurable multi-core/many core system based on single instruction set microprocessor computing unit |
| Balasubramonian et al. | | Near-data processing: Insights from a MICRO-46 workshop |
| Lorenzon et al. | | Parallel computing hits the power wall: principles, challenges, and a survey of solutions |
| Esmaeilzadeh et al. | | Power limitations and dark silicon challenge the future of multicore |
| Wang et al. | | A ubiquitous machine learning accelerator with automatic parallelization on FPGA |
| CN103777923A (en) | | DMA vector buffer |
| Lari et al. | | Hierarchical power management for adaptive tightly-coupled processor arrays |
| Chen et al. | | Characterizing scalar opportunities in GPGPU applications |
| Lee et al. | | Background scrolling in high-level synthesis oriented game programing library |
| Baskaran et al. | | Decentralized offload-based execution on memory-centric compute cores |
| CN105700913B (en) | | A kind of parallel operation method of lightweight bare die code |
| Wolf | | Multiprocessor system-on-chip technology |
| CN102023846B (en) | | Shared front-end assembly line structure based on monolithic multiprocessor system |
| Sandokji et al. | | Task scheduling frameworks for heterogeneous computing toward exascale |
| Wang et al. | | An automatic-addressing architecture with fully serialized access in racetrack memory for energy-efficient CNNs |
| Akram et al. | | C-slow technique vs multiprocessor in designing low area customized instruction set processor for embedded applications |
| Watkins et al. | | Shared reconfigurable architectures for CMPs |
| Javaid et al. | | Multi-mode pipelined MPSoCs for streaming applications |
| Lari et al. | | Massively parallel processor architectures for resource-aware computing |
| Kaouane et al. | | SysCellC: SystemC on Cell |
| Natvig et al. | | Multi‐and Many‐Cores, Architectural Overview for Programmers |
| Chouliaras et al. | | VThreads: A novel VLIW chip multiprocessor with hardware-assisted PThreads |
| Chandy et al. | | Hardware parallelism vs. software parallelism |
| Melikyan | | Design of High-performance Heterogeneous Integrated Circuits |
| Guo et al. | | Evaluation and tradeoffs for out-of-order execution on reconfigurable heterogeneous MPSoC |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | C06 | Publication | |
| | PB01 | Publication | |
| | C10 | Entry into substantive examination | |
| | SE01 | Entry into force of request for substantive examination | |
| | C12 | Rejection of a patent application after its publication | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20100623 |