WO1996036919A1 - Microcomputer - Google Patents
- Publication number
- WO1996036919A1 (application PCT/JP1996/001308)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- access
- cache
- processing unit
- central processing
- memory
Classifications
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0888—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using selective caching, e.g. bypass
Definitions
- the present invention relates to a single-chip microcomputer incorporating a memory and the like together with a central processing unit, and more particularly to a technology for accelerating memory access operations by the central processing unit both inside and outside the microcomputer.
- RISC (reduced instruction set computer)
- a ROM (read-only memory)
- a RAM (random access memory)
- the time required for the CPU to access the memory is short, which is advantageous for speeding up instructions and data fetches.
- the built-in memory can be accessed in one clock cycle of the CPU operation reference clock signal, whereas external memory access usually requires a plurality of clock cycles.
- Commercialized examples of such a single-chip microcomputer include those described in "Nippon Electronics Corp., January 23, 1990, no." on pages 99 to 112.
- Japanese Patent Application Laid-Open No. 2-187879 discloses a semiconductor integrated circuit in which an instruction cache memory and a main storage device, accessed only when the instruction is not in the instruction cache memory, are mounted on the same chip together with a central processing unit.
- the present inventor has studied how to speed up internal and external memory access operations in a single-chip microcomputer equipped with a cache memory and a built-in memory together with a central processing unit. The study focused on two points: the drop in data processing performance caused by external memory access, and the saturation of data processing performance gains as the operating frequency of the microcomputer is raised.
- the hit ratio is low, and improvement in data processing performance cannot be expected much.
- unless there is a large difference between the access speed of the cache memory and the access speed of the internal memory, and if the hit rate is low, it became clear that the process of adding cache-miss data to the cache memory may ultimately negate the benefit of installing the cache memory.
- the CPU could access the built-in EPROM or built-in mask ROM in one clock cycle when the operating frequency of the CPU was 20 MHz.
- the built-in ROM such as large-capacity ROM and flash memory, for which the access time cannot be reduced much because of the large read-line and bit-line capacitance, etc.
- the access requires multiple clock cycles.
- the data processing performance of the system saturates even if the operating frequency is increased, especially in a microcomputer with a RISC architecture, in which the memory access time is directly proportional to the performance.
- the microcomputer (MPU, MPU1) formed on one semiconductor substrate includes a central processing unit (1), a built-in memory (9, 10) accessed by the central processing unit, a cache memory (CACHE) coupled to the central processing unit by an internal bus (6), interface means for accessing an external address space of the microcomputer, and control means (2, 72) for making the external address space cacheable by the cache memory and the built-in memory non-cacheable.
- the microcomputer (MPU, MPU1) mounts the built-in memory (9, 10) together with the cache memory (CACHE) on the same semiconductor substrate, and the built-in memory is excluded from caching by the cache memory.
- the internal memory (9, 10) is not targeted for caching; in other words, it is mapped to a non-cacheable area, so that every internal memory access is as fast as an access that hits in the cache.
- if the built-in memory were also targeted for caching, a cache miss during access to the built-in memory would require processing to add the data related to the cache miss to the cache memory.
- the above-mentioned means does not cause such a situation at all.
- even if the cache memory (CACHE) itself has a small storage capacity and a low cache hit rate, it can be used in combination with the built-in memory (9, 10), which can be accessed at high speed.
- the overall hit rate for accesses inside and outside the microcomputer (MPU, MPU1) can be increased. Therefore, memory access can be made faster over the entire execution of the operation program by the central processing unit, and the data processing performance can be improved as a whole.
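As a rough illustration of why pairing a small cache with fast built-in memory raises the overall hit rate, the sketch below assumes that a fraction r of all accesses goes to the built-in memory (always served at full speed) and that the cache hit ratio for the remaining external accesses is h, giving an overall ratio H = r + (1 - r)h. The formula and the function name `total_hit_ratio` are illustrative assumptions and are not quoted from the text.

```python
def total_hit_ratio(r: float, h: float) -> float:
    """Overall fraction of accesses served at full speed (assumed model).

    r: assumed fraction of accesses that go to the built-in memory
    h: assumed cache hit ratio for accesses to the external memory
    """
    if not (0.0 <= r <= 1.0 and 0.0 <= h <= 1.0):
        raise ValueError("r and h must both lie in [0, 1]")
    # Built-in accesses always count as fast; external ones only on a hit.
    return r + (1.0 - r) * h
```

Under this model, even a modest cache hit ratio combines with a high built-in access ratio to give a high overall ratio, which is the effect the passage describes.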
- if the interface means delays the start of the access cycle to the external address space for the fixed period needed to determine a cache hit or cache miss, it avoids having to stop an already started bus cycle halfway when a cache hit occurs, which prevents the data in the external memory from being destroyed. However, such a wait period delays the start of the external bus cycle in the event of a cache miss. Considering that, once a cache miss has occurred, cache misses tend to continue several times in a row, and in order to speed up external memory access as much as possible, the interface means can start access to the external address space without the wait when the preceding access was a cache miss.
- the central processing unit has a RISC architecture
- the central processing unit, the built-in memory and the cache memory share an internal bus
- the central processing unit accesses the internal bus with one clock cycle of its operation reference clock signal as the maximum access speed.
- the cache memory outputs data relating to a cache hit to the internal bus within one clock cycle of the operation reference clock signal of the central processing unit from the start of access by the central processing unit.
- Another microcomputer employs a high-speed RAM (201), which has a faster access operation than the built-in memory (202, 203), instead of the cache memory. It further employs a transfer control means (205) that, triggered by an access of the central processing unit (200) to a specified address (the value of CTAR), acquires the bus right from the central processing unit, transfers the data from one specific address to another specific address of the internal memory (202, 203) or the external address space (208) to the high-speed RAM (201), and, when a later access by the central processing unit falls on a transfer source address (CSAR ≤ address ≤ CEAR), switches the access target to the high-speed RAM. The target of data transfer to the high-speed RAM can be limited to either the internal memory or the external memory, or can be both.
- Data or programs can be transferred to the high-speed RAM (201) in advance and made available. For example, by transferring a certain portion of a program in an internal memory having a slow access time to the high-speed RAM in advance and executing it from there, the execution of that portion can be sped up. Naturally, even if the program does not fit in the internal memory and spills into the external memory, performance degradation can be prevented by controlling this external memory in the same way as the internal memory.
- routines that require high speed are relatively limited in the address range and execution timing, such as interrupt service parts, resulting in poor processing performance.
- parts requiring high-speed execution can be transferred to high-speed RAM in advance. As a result, the performance of the entire program processing is improved.
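The access switching described above can be sketched as a simple address-range check. This is a minimal model, assuming inclusive bounds CSAR ≤ address ≤ CEAR and a linear copy into the high-speed RAM; the class name `RamTransferWindow`, the `ram_base` field, and the `transferred` flag are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class RamTransferWindow:
    csar: int                  # first transfer-source address (CSAR in the text)
    cear: int                  # last transfer-source address, assumed inclusive (CEAR)
    ram_base: int              # hypothetical base address inside the high-speed RAM
    transferred: bool = False  # set once the copy to the high-speed RAM is complete

    def route(self, address: int) -> Tuple[str, int]:
        """Return which memory serves the access and the address used there."""
        if self.transferred and self.csar <= address <= self.cear:
            # After the transfer, source-range accesses are switched to the RAM.
            return ("high_speed_ram", self.ram_base + (address - self.csar))
        # Before the transfer, or outside the range, the original memory is used.
        return ("original_memory", address)
```

For example, with `RamTransferWindow(csar=0x1000, cear=0x10FF, ram_base=0)`, an access to `0x1010` is routed to the original memory until `transferred` is set, and to offset `0x10` of the high-speed RAM afterwards.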
- Still another microcomputer employs a high-speed RAM (401), which has a faster access operation than the built-in memory (402, 403), instead of the cache memory.
- it adopts a transfer control means (405) that, triggered by an access of the central processing unit to the built-in memory (402, 403) or the external address space (408), stores the accessed data in the high-speed RAM (401) in parallel with that access, does so for a predetermined address range (CSAR ≤ address ≤ CEAR), and, once the data transfer has completed for all addresses in the predetermined address range, switches the access target to the high-speed RAM whenever a later access by the central processing unit falls on an address of the transferred source data.
- Data transfer to the high-speed RAM can be limited to either the internal memory or the external memory, or both.
- Still another microcomputer (MPU4) employs a high-speed RAM (601), which has a faster access operation than the built-in memory (602, 603), instead of the cache memory, together with a data transfer control means (612) for transferring data from the built-in memory or the external address space (608) to the high-speed RAM, and a switching control means (604) for switching the access target to the high-speed RAM when the central processing unit accesses a transfer source address (CSAR ≤ address ≤ CEAR) of the data transferred to the high-speed RAM.
- the target of data transfer to the high-speed RAM can be either internal memory or external memory, or both.
- if the central processing unit is unconditionally directed to the high-speed RAM whenever it tries to access a specified range of the internal memory, the user can exploit this when initializing the program: if the programs and data to be accelerated in the built-in memory are transferred to the high-speed RAM beforehand, the data processing performance can be improved. If the portion to be accelerated is known among the programs produced by a C compiler or assembler, the data processing performance can easily be improved by the above configuration. According to this method, the penalty due to a cache miss and the transfer of a high-speed routine during background processing are eliminated. In addition, the execution time is exactly the same on the first pass and on later passes, which makes timing design easy. Naturally, even if the program does not fit in the internal memory and spills into the external memory, performance degradation can be prevented by controlling this external memory in the same way as the internal memory.
BRIEF DESCRIPTION OF THE FIGURES
- FIG. 1 is a block diagram of a single-chip microcomputer according to an embodiment of the present invention.
- FIG. 2 is a block diagram showing an example of a bus controller and an external bus interface included in the microcomputer shown in FIG.
- FIG. 3 is an address mapping diagram in the built-in ROM valid mode in the microcomputer shown in FIG.
- FIG. 4 is an address mapping diagram in the built-in ROM invalid mode in the microcomputer shown in FIG.
- FIG. 5 is an explanatory diagram of an address signal output by the CPU.
- FIG. 6 is an example block diagram of the cache memory.
- FIG. 7 is a timing chart when the CPU performs a read access to the RAM or ROM as an example of access to the non-cacheable area.
- FIG. 8 is a timing chart of a read access by the CPU to the cacheable area during a cache hit.
- FIG. 9 is a timing chart of a read access by the CPU to the cacheable area during a cache miss.
- FIG. 10 is a timing chart when the control signal BECNOP is negated during a cache miss.
- FIG. 11 is a timing chart when the control signal BECNOP is negated at the time of a cache hit.
- FIG. 12 is an explanatory diagram showing a state during a cache hit and a state during a cache miss in a plurality of bus access cycles.
- FIG. 13 is an explanatory diagram of the configuration when the cache memory is also used as the built-in RAM.
- FIG. 14 is an explanatory diagram of an address signal and a cache address of an address array when the cache memory is also used as the built-in RAM.
- FIG. 15 is a block diagram of a single-chip microcomputer according to another embodiment of the present invention.
- FIG. 16 is a block diagram showing an example of a bus controller and an external bus interface in the microcomputer shown in FIG.
- FIG. 17 is an explanatory diagram showing the relationship between the dynamic access ratio r to the internal memory, the cache memory hit ratio h when accessing the external memory, and the total hit ratio H.
- FIG. 18 is another explanatory diagram showing the relationship between the dynamic access ratio r to the internal memory, the hit ratio h of the cache memory at the time of accessing the external memory, and the total hit ratio H.
- FIG. 23 is an explanatory diagram showing the miss rate (1 - h) for the cache configuration (number of ways, line length, capacity).
- FIG. 27 is an explanatory diagram showing the relationship between the cache capacity C and the built-in ROM capacity R based on the results shown in FIG. for a fully associative (full) cache memory.
- FIG. 28 is an explanatory diagram showing the relationship between the cache capacity C and the built-in ROM capacity R based on the results shown in FIG. 25 for a 4-way set-associative (4 way) cache memory.
- FIG. 29 is an explanatory diagram showing the relationship between the cache capacity C and the built-in ROM capacity R based on the results shown in FIG. 25 for a two-way set-associative (2 way) cache memory.
- FIG. 30 is an explanatory diagram showing the relationship between the cache capacity C and the built-in ROM capacity R based on the results shown in FIG. 25 for a direct map (direct) cache memory.
- FIG. 31 is a block diagram of an embodiment of a microcomputer in which data is transferred to a high-speed RAM in advance to speed up memory access as a whole.
- FIG. 32 is a block diagram of a RAM transfer controller included in the microcomputer shown in FIG.
- FIG. 33 is an explanatory diagram of the address conversion.
- FIG. 34 is a block diagram of the high-speed RAM and the address converter.
- FIG. 35 is a block diagram of an embodiment of a microcomputer in which data is simultaneously transferred to a high-speed RAM to speed up access.
- FIG. 36 is a block diagram of the high-speed RAM included in the microcomputer shown in FIG.
- FIG. 37 is a block diagram of a RAM transfer controller.
- FIG. 38 is an illustration of parallel writing to a high-speed RAM.
- FIG. 39 is a block diagram of an embodiment of a microcomputer in which information in a low-speed memory is transferred to a high-speed RAM in advance by using a DMAC, and the access address for the low-speed memory is replaced with the address of the high-speed RAM to speed up memory access.
- FIG. 40 is an example block diagram of a bus controller included in the microcomputer shown in FIG. 39.
- FIG. 41 is a block diagram of the high-speed RAM and the address arithmetic unit.
BEST MODE FOR CARRYING OUT THE INVENTION
- FIG. 1 shows a single-chip microcomputer according to an embodiment of the present invention.
- the single-chip microcomputer (also simply referred to as a microcomputer) MPU has a cache memory CACHE in addition to a ROM (read-only memory) 9 and a RAM (random access memory) 10.
- ROM 9 and RAM 10 are not cached by the cache memory CACHE.
- when the CPU (central processing unit) 1 accesses the built-in ROM 9 or RAM 10, the built-in ROM 9 or RAM 10 is read or written directly, without involving the access operation of the cache memory CACHE.
- the microcomputer MPU includes a CPU 1, a cache memory CACHE, a ROM 9 in which an operation program or data of the CPU 1 is stored, a RAM 10 serving as a work area of the CPU 1 or a temporary storage area of data, a multiplier (MULT) 5, a bus controller 7, a peripheral circuit 11, an external bus interface 12, and the like, and is formed on a single semiconductor substrate such as monocrystalline silicon. An external memory 13 and the like are connected to the external bus interface 12 via an external bus 14.
- the microcomputer MPU of this embodiment has a RISC architecture, although it is not particularly limited to one, and the CPU 1 executes one instruction per clock cycle of its operation reference clock signal.
- the pipeline stages such as instruction fetch, instruction decode, instruction execution, and memory access are executed so that the basic number of cycles of a bus access is one clock cycle.
- the internal bus (IBUS) 6 to which the CPU 1 is connected is a high-speed internal bus having a minimum operation cycle of one clock cycle, and includes an internal address bus IAB, an internal data bus IDB, and an internal control bus ICB.
- the cache memory CACHE, ROM 9, RAM 10, a multiplier (MULT) 5 for executing a multiplication instruction, and a bus controller 7 are connected to the internal bus 6.
- the bus controller 7 controls the internal bus 6 and controls access to the peripheral circuits 11.
- FIG. 2 shows an example block diagram of the bus controller 7 and the external bus interface 12.
- the bus controller 7 includes a circuit block roughly divided into a buffer 71 and a control logic circuit 72.
- the external bus interface 12 has a circuit block roughly divided into a buffer 120 and a timing generation circuit 121.
- the buffer 71 supplies predetermined signals of the internal data bus and address bus constituting the internal bus 6 to the external bus interface 12 and the peripheral circuit 11.
- the control logic circuit 72 determines the access target based on the upper bits of the address signal supplied from the internal bus 6, and receives a bus command or the like supplied from the internal bus 6 to determine the type of access, such as read or write and the data width.
- based on these results, a selection signal MSROM of the built-in ROM 9, a selection signal MSRAM of the RAM 10, a read/write signal MRW for the built-in circuits, and the like are generated.
- the control logic circuit 72 also outputs control signals that cause the timing generation circuit 121 to output the signals CS0 to CS3, the read signal RD, the write signal WR, and the row address strobe signal RAS and column address strobe signal CAS for accessing DRAM (dynamic RAM), etc.
- the timing of data input/output and address signal output for external bus access is set by the timing generation circuit 121, which controls the buffer 120 based on the control signals from the control logic circuit 72.
- the control logic circuit 72 is notified of a cache hit or miss in the cache memory CACHE by a hit signal 109, and is notified by the control signal BECNOP of the start timing of the external bus access via the external bus interface 12, or of a wait before the external bus access starts. The control logic circuit 72 then notifies the CPU 1 of the bus-accessible state by the control signal BUSRDY.
- the control signal BUSRDY marks the boundary between bus accesses by the CPU 1.
- FIG. 3 shows an example of address mapping in the built-in ROM valid mode in the microcomputer MPU according to the present embodiment.
- FIG. 4 shows an example of address mapping in the built-in ROM invalid mode.
- the microcomputer MPU of this embodiment has a built-in ROM valid mode, which enables the built-in ROM 9, and a built-in ROM invalid mode, which disables it; the operation mode is determined by the setting state of a mode pin (not shown).
- the space cached by the cache memory CACHE is the CS0 to CS3 space and the DRAM space in FIGS. 3 and 4; the ROM 9 and the RAM 10 are not cached.
- the address signal managed by the CPU 1 is 32 bits (A31 to A0) as shown in FIG. 5; A31 to A24 are used for selecting the space shown in the address map, and A23 and A22 are used for chip selection in the CS space.
- the address signal output via the external bus interface 12 is 22 bits, A21 to A0.
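The field layout of the 32-bit CPU address described above can be sketched as follows. The function name `split_cpu_address` and the dictionary keys are hypothetical; only the bit positions (A31 to A24, A23 and A22, A21 to A0) come from the text.

```python
def split_cpu_address(addr: int) -> dict:
    """Split a 32-bit CPU address into the fields named in the text."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    return {
        "space_select": (addr >> 24) & 0xFF,   # A31..A24: selects the address-map space
        "chip_select": (addr >> 22) & 0x3,     # A23..A22: chip selection in the CS space
        "external_address": addr & 0x3FFFFF,   # A21..A0: bits driven on the external bus
    }
```

For instance, `split_cpu_address(0x01C00004)` yields a space-select field of `0x01`, a chip-select field of `3`, and an external address of `0x000004`.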
- the control logic circuit 72 includes address decoding logic 721 for space selection and CS space selection according to the operation mode, and, according to the address decoding logic 721, generates the selection signals MSROM and MSRAM and the control signals for outputting CS0 to CS3, RAS, and CAS from the timing generation circuit 121.
- the number of access cycles for external bus access is determined by the control logic circuit 72 according to the number of wait states set in the control register 720 for each of the CS0 to CS3 spaces and the DRAM space, and is notified to the timing generation circuit 121. Note that, for the purpose of space selection and CS space selection, address bits lower than A22 are also actually decoded.
- FIG. 6 shows an example block diagram of the cache memory CACHE.
- the cache memory CACHE includes an address array (AA) 3, a data array (DA) 4, a comparator 20, a cache controller (C CONT) 2, and the like.
- the cache target of the cache memory CACHE may be instructions only, data only, or both instructions and data.
- the cache memory CACHE is, for example, a cache that stores, for the information of one cache line constituting one entry, an effective address (virtual address or physical address) indicating which position (address) of the external storage device that information corresponds to.
- the address array 3 has a cache tag address 104 and a valid bit 105
- the data array 4 has a data line 106. If necessary, a dirty bit or the like may be provided, indicating that the contents of the cache memory have been updated and differ from the contents of the external mass storage device. In some cases, a field indicating the type of the memory space is also added.
- the address signal 100 supplied from the CPU 1 to the internal address bus IAB is treated as a tag address 101, an entry address 102, and an in-line byte address 103.
- the in-line byte address 103 is information for selecting the byte data included in the data line 106.
- the entry address 102 is decoded by the decoder 110 and used to select (index) a cache line from the address array 3 and the data array 4. For example, if there are 128 cache lines, the entry address is 7 bits.
- the cache tag address 104 of the indexed cache line is compared with the tag address 101 at the comparator 20. The comparison result and the valid bit 105 are supplied to the cache controller 2.
- the cache controller 2 refers to the comparison result from the comparator 20 and to the valid bit 105. If the indexed cache line is valid and its cache tag address 104 matches the tag address 101, a cache hit is notified to the bus controller by the cache hit signal 109. At the same time, the required data is selected from the data line 106 of the indexed cache line by the in-line byte address 103 and read out to the internal data bus IDB, or the data on the internal data bus IDB is written to the position of the data line 106 selected by the in-line byte address 103.
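The index, compare, and hit/miss steps above can be sketched as a small direct-mapped model. It uses the 128-line example from the text (a 7-bit entry address); the 16-byte line length, the class name `DirectMappedCache`, and the method names are hypothetical choices made only for this illustration.

```python
LINE_BYTES = 16    # hypothetical line length (not stated in the text)
NUM_LINES = 128    # example from the text: 128 lines, 7-bit entry address
OFFSET_BITS = 4    # log2(LINE_BYTES)
ENTRY_BITS = 7     # log2(NUM_LINES)


class DirectMappedCache:
    def __init__(self):
        self.tags = [0] * NUM_LINES           # cache tag address per line
        self.valid = [False] * NUM_LINES      # valid bit per line
        self.lines = [bytearray(LINE_BYTES) for _ in range(NUM_LINES)]

    @staticmethod
    def split(addr):
        """Return (tag address, entry address, in-line byte address)."""
        offset = addr & (LINE_BYTES - 1)
        entry = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
        tag = addr >> (OFFSET_BITS + ENTRY_BITS)
        return tag, entry, offset

    def read(self, addr):
        """Return (hit, byte); byte is None on a miss."""
        tag, entry, offset = self.split(addr)
        if self.valid[entry] and self.tags[entry] == tag:
            return True, self.lines[entry][offset]
        return False, None

    def line_fill(self, addr, data):
        """Replace the indexed line with data read from external memory."""
        if len(data) != LINE_BYTES:
            raise ValueError("line fill needs exactly one line of data")
        tag, entry, _ = self.split(addr)
        self.lines[entry][:] = data
        self.tags[entry] = tag
        self.valid[entry] = True
```

A read of an address whose line has not been filled misses; after `line_fill` for that address, reads anywhere in the same line hit, while an address that maps to the same entry with a different tag still misses.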
- 40 is a data output circuit that selects data from the indexed data line by the in-line byte address 103 and outputs it to the internal data bus IDB.
- 41 is a data input circuit that selects a position in the indexed cache line by the in-line byte address 103 and supplies write data from the internal data bus IDB to it.
- the bus controller 7 performs control to read from the external memory 13 the information for one data line including the data corresponding to the access address at that time, and the cache controller 2 replaces the contents of the data line 106 with the read data (line fill).
- the cache line to be replaced can be determined according to a well-known logic such as LRU (Least Recently Used).
- the corresponding cache tag address 104 is written to the cache line, and the valid bit 105 is set.
- the cache controller 2 controls the operations of the data output circuit 40, the data input circuit 41, the output gates 31 and 1051, and the input gates 30 and 1050 according to the index operation, the line fill operation, and the write access by the CPU 1.
- the address array 3 and the data array 4 in FIG. 6 may be organized as a direct map with one way, as two-way (2 way) set associative, as four-way (4 way) set associative, and so on. Increasing the number of ways increases the number of cache lines indexed by the entry address, thereby increasing the cache hit rate. When the number of ways is increased, the address array 3 and the data array 4 are provided for each way, and the entry address 102 is supplied in common to every way. The cache tag addresses indexed in each way are compared with the tag address by a comparator 20 provided for each way. If the comparison by any one of the comparators 20 matches and a cache hit results, the data array of the matching way is read or written. A line fill is likewise performed on one of the ways.
- the determination as to whether or not the area accessed by the CPU 1 is an area to be cached by the cache memory CACHE is not particularly limited, but is performed by the cacheable control circuit 21 of the cache controller 2.
- the cache target space is the CS0 to CS3 space and the DRAM space, and the other space is the cache non-target space (non-cacheable area).
- the cacheable control circuit 21 has a circuit that detects access to H'00200000 to H'01FFFFFF, to determine the cache target space in the built-in ROM valid mode, and a circuit that detects access to H'00000000 to H'01FFFFFF, to determine the cache target space in the built-in ROM invalid mode, and thereby determines whether the accessed area is a cacheable area or a non-cacheable area.
- the circuit for detecting access to H'00200000 to H'01FFFFFF is not particularly limited, but the determination can be made from the logical sum signal of the output of a circuit that detects that the 11 bits from the most significant bit of the access address are not all 0 and the output of a circuit that detects that the 7 bits from the most significant bit of the access address are all 0. Similarly, the circuit that detects access to H'00000000 to H'01FFFFFF can make the determination from the output of the circuit that detects that the 7 bits from the most significant bit of the access address are all 0. The determination result signal is supplied from the cacheable control circuit 21 to the control signal generation circuit 23.
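The range checks above reduce to tests on the upper address bits, which can be sketched as follows. The sketch assumes that the H'00200000 to H'01FFFFFF range applies in the built-in ROM valid mode, and it combines the two conditions with a plain Boolean AND because that is what the stated range requires for active-high values; the polarity of the hardware signals described in the text may differ. The names `upper_bits` and `is_cacheable` are hypothetical.

```python
def upper_bits(addr: int, n: int) -> int:
    """Return the n most significant bits of a 32-bit address."""
    return (addr >> (32 - n)) & ((1 << n) - 1)


def is_cacheable(addr: int, rom_valid_mode: bool) -> bool:
    """Decide cacheability from the upper address bits (assumed model)."""
    top7_all_zero = upper_bits(addr, 7) == 0        # addr <= H'01FFFFFF
    if not rom_valid_mode:
        # ROM invalid mode: H'00000000 to H'01FFFFFF is the cache target space.
        return top7_all_zero
    top11_not_zero = upper_bits(addr, 11) != 0      # addr >= H'00200000
    # ROM valid mode (assumed): H'00200000 to H'01FFFFFF is the cache target space.
    return top7_all_zero and top11_not_zero
```

With this model the boundary addresses behave as the ranges state: `0x001FFFFF` is non-cacheable and `0x00200000` is cacheable in the ROM valid mode, and `0x02000000` is non-cacheable in both modes.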
- when a determination result signal indicating that the area accessed by the CPU 1 is not an area to be cached by the cache memory CACHE is supplied from the cacheable control circuit 21, the control signal generation circuit 23 supplies an operation stop signal to the comparator 20, the input gates 30 and 1050, the output gates 31 and 1051, the data output circuit 40, the data input circuit 41, and the decoder 110. At this time, the cache memory CACHE becomes inactive.
- when the control signal generation circuit 23 receives from the cacheable control circuit 21 a determination result signal indicating that the area accessed by the CPU 1 is an area to be cached by the cache memory CACHE, it supplies an operation start signal to the comparator 20, the input gates 30 and 1050, the output gates 31 and 1051, the data output circuit 40, the data input circuit 41, and the decoder 110. At this time, the cache memory CACHE is in an operating state.
- in response to an access to the non-cacheable area, the cache controller 2 does not perform at least the read and write cache operations on the internal bus 6, and maintains the cache hit signal 109 in the cache miss state.
- for an access to the cacheable area, the cache operation of reading from or writing to the internal bus 6 is performed, and the hit signal 109 is changed according to the result of the cache hit/miss determination.
- the cache controller 2 sets the control signal BECNOP to the enable level for one clock cycle from the bus access of the CPU 1 and supplies it to the bus controller 7, so that the start of the external bus access is controlled.
- in response to an access to the cacheable area, the cache memory CACHE indexes a cache line, compares the cache tag address 104 with the tag address 101, and determines a cache miss or hit. This takes one clock cycle. If the bus controller 7 started the external access during this time, the data in the external memory 13 accessed by the bus controller 7 might be destroyed.
- the bus controller 7 therefore holds off the start of the external bus access operation while the control signal BECNOP is at the enable level, for one clock cycle after the bus access by the CPU 1. If, in the meantime, a cache hit is notified to the bus controller by the cache hit signal 109, the bus controller 7 does not perform the external bus access and asserts the bus ready signal BUSRDY to notify the CPU 1 that the next bus access is possible.
- FIG. 7 shows a timing chart when the CPU 1 performs a read access to the RAM 10 or the ROM 9 as an example of an access to the non-cacheable area.
- φ1 and φ2 denote the operation reference clock signals of the CPU 1, which are non-overlapping two-phase clock signals.
- the bus command BCMD includes information indicating the bus access width and read/write, and is output by the CPU 1 to the internal control bus ICB.
- the output timing is synchronized with the address output to the internal address bus IAB.
- the CPU 1 supplies an address signal to the internal address bus IAB in synchronization with the bus access boundary notified by the control signal BUSRDY.
- when the cacheable control circuit 21 of the cache controller 2 determines from the address signal that the access is to the RAM 10 or ROM 9, that is, to the non-cacheable area, the operation of the address array 3 and the data array 4 is suppressed, the cache hit signal 109 is maintained in the cache miss state, and the control signal BECNOP is negated.
- the bus controller 7 decodes the access address signal at that time and asserts the ROM selection signal MSROM or the RAM selection signal MSRAM, whereby read data from the ROM or RAM is supplied to the internal data bus IDB within one cycle of the clock signal φ1.
- the bus controller 7 asserts the bus ready signal BUSRDY in synchronization with the next rising edge of the clock signal φ1, and notifies the CPU 1 that the next bus access is possible.
- FIG. 8 shows a timing chart of a read access by the CPU 1 to the cacheable area during a cache hit. Since the access is to the cacheable area, the cacheable control circuit 21 of the cache controller 2 asserts the control signal BECNOP for one clock cycle after the bus access by the CPU 1, holding off the start of the external access by the bus controller 7 during that period. In the meantime, in the cache memory CACHE, the data array 4 and the address array 3 are read by the index operation, and a cache miss/hit determination is made on the read data. In this example a cache hit is determined, the read data related to the hit is supplied to the internal data bus IDB, and the cache hit is notified to the bus controller 7 by the cache hit signal 109. The bus controller 7 asserts the bus ready signal BUSRDY without performing an external bus access for that access, and notifies the CPU 1 that the next bus access is possible.
- FIG. 9 shows a timing chart of a read access by the CPU 1 to the cacheable area at the time of a cache miss.
- The cache control circuit 21 of the cache controller 2 asserts the control signal BECNOP during one cycle of the bus access from the CPU 1, and during that period the start of the external access by the bus controller 7 is suppressed.
- In the cache memory CACHE, the data array 4 and the address array 3 are read by the index operation, and a cache miss/hit determination is performed on the read data.
- The bus controller 7 activates an external bus access according to the access address at that time.
- /CSn shown in the figure means any of CS0 to CS3, and /RD corresponds to RD in FIG. 2.
- External bus access starts from the T2 state, and in the T3 state, read data from the external memory 13 is read onto the internal data bus IDB.
- The tag address related to the cache miss is written to the corresponding cache line as a cache tag address in the T2 state, and, in parallel with the CPU 1 taking in the read data in the T3 state, the read data is written to the corresponding cache line.
- The period during which the external memory 13 is actually accessed is the two clock cycles T2 and T3, but the start of the external bus access is delayed by one clock cycle by the control signal BECNOP. Therefore, the read access of the CPU 1 requires a total of three clock cycles, T1 to T3.
- Considering that a cache-miss state tends to continue for a relatively long time, in an access to the cacheable area the wait by the control signal BECNOP is performed only when the previous access to the cacheable area was a cache hit. When the previous access to the cacheable area was a cache miss, the wait by the control signal BECNOP is not performed. At a cache hit that follows a cache miss in the cacheable area, since the wait by the control signal BECNOP is not performed, the cache-hit state is not given to the bus controller 7, and the same operation as for a cache miss is performed.
- the hit signal generation circuit 22 has a flag FLG that holds the previous determination result of cache hit / miss.
- The hit signal generation circuit 22 compares the previous cache miss/hit determination result obtained from the flag FLG with the current cache miss/hit determination result, and if the previous result was a cache miss and the current result is a cache hit, the cache hit signal 109 is negated.
- The cache control circuit 21 learns the result of the previous cache miss/hit determination from the flag FLG, and if it indicates a cache miss, keeps the control signal BECNOP negated even if it detects an access to the cacheable area.
- FIG. 10 shows a timing chart for the case where the control signal BECNOP is negated at the time of a cache miss (a state in which cache misses continue).
- In this case, since the control signal BECNOP is negated, the start of the external bus access is advanced by one clock cycle.
- FIG. 11 shows a timing chart when the control signal BECNOP is negated at the time of a cache hit (when a cache hit occurs after a cache miss). In this case the access is actually a cache hit, but since the bus cycle has already been started, the cache hit signal 109 is negated, and the supply of the data read from the data array 4 to the internal data bus IDB is suppressed.
- FIG. 12 shows states during a cache hit and a cache miss in a plurality of bus access cycles.
- A corresponds to the state of FIG. 8
- B corresponds to the state of FIG. 9
- C corresponds to the state of FIG. 10
- D corresponds to the state of FIG.
- The number of clock cycles for external bus access is shortened by one clock cycle from the second consecutive cache miss onward, and when the state switches from cache miss to cache hit, the number of clock cycles for the first bus access at the switch is extended by one clock cycle.
- Therefore, if cache misses to the cacheable area continue three times or more, the number of clock cycles of bus access as a whole is shortened by controlling the cache hit signal 109 and the control signal BECNOP as in this embodiment.
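The cycle counts described for FIGS. 8 to 11 can be checked with a small model. The following is a minimal sketch, not the circuit itself: the function name `bus_cycles`, the string encoding of hits and misses, and the initial state of the flag FLG are assumptions made for illustration. The cycle counts (1 for a hit, 2 for the external access T2 and T3, 1 for the BECNOP wait) are taken from the description above.

```python
# Illustrative model only. Cycle counts follow the timing description:
# cache hit = 1 cycle, external access = T2 + T3 = 2 cycles, BECNOP wait = 1 cycle.
HIT_CYCLES = 1
EXTERNAL_CYCLES = 2
WAIT_CYCLES = 1


def bus_cycles(outcomes, use_flag=True):
    """Total clock cycles for a run of read accesses to the cacheable area.

    outcomes: string of 'H' (tag compare hits) and 'M' (tag compare misses).
    use_flag: True  -> skip the BECNOP wait when the previous access missed
                       (control based on flag FLG);
              False -> always wait one cycle for the hit/miss determination.
    """
    total = 0
    prev_miss = False  # assumed initial state of flag FLG
    for outcome in outcomes:
        wait = not (use_flag and prev_miss)
        if wait:
            if outcome == "H":
                total += HIT_CYCLES                     # case of FIG. 8
            else:
                total += WAIT_CYCLES + EXTERNAL_CYCLES  # case of FIG. 9
        else:
            # External access has already started, so even a real hit
            # is handled as a miss (cases of FIG. 10 and FIG. 11).
            total += EXTERNAL_CYCLES
        prev_miss = outcome == "M"
    return total
```

Under these assumptions, three misses followed by a hit cost 3 + 2 + 2 + 2 = 9 cycles with the flag control and 3 + 3 + 3 + 1 = 10 cycles without it, while two misses followed by a hit cost 7 cycles either way, which agrees with the statement that the benefit appears once misses continue three times or more.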
- FIG. 13 shows a configuration example in which the cache memory is also used as the built-in RAM.
- The internal RAM 10 has a data bus width of 32 bits and a capacity of 1 KB per module, and a total of 4 modules constitute a 4 KB RAM.
- If the cache memory CACHE has a 1 KB storage capacity for the data array, then in a microcomputer MPU with such a built-in RAM 10 the cache memory CACHE can be configured by using the built-in RAM. If one 1 KB RAM module is used as the data array 4 of the cache memory CACHE, 256 lines can be stored with a data line length of 4 bytes.
- the cache tag address 104 of the address array 3 can be assigned to D24 to D10 as shown in FIG. 14 (B). Since there is an empty bit in the address array 3, in order to utilize that area, the valid bit (V) 105 is assigned to the least significant bit D0 of the address array in this example.
- The valid bit 105 must be reset for each cache line, so invalidating all the cache lines to initialize the cache memory CACHE requires 256 accesses.
- The valid bit 105 can instead be stored in a logic circuit other than the RAM, such as the cache controller 2. For example, a 32-byte (32 x 8 bits) register is provided in the cache controller 2 and associated with the 256 cache lines, and the valid bits are stored in that register.
- Predetermined 8 bits (A9 to A2) of the address signal are used as an index for the address array 3, but when the address array 3 is used as a normal RAM, the 10 bits A9 to A0 are used for addressing.
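The effect of moving the valid bits into a register can be illustrated with a short model. This is a sketch under stated assumptions: the class name `ValidBitRegister`, the helper `line_index`, and the idea that the register is cleared one 32-bit bus word at a time are hypothetical and are not taken from the description, which only gives the 256 lines, the 32-byte size, and the index bits A9 to A2.

```python
LINES = 256  # 1 KB data array with 4-byte lines


def line_index(address):
    """Cache line index taken from address bits A9..A2 (8 bits)."""
    return (address >> 2) & 0xFF


class ValidBitRegister:
    """Hypothetical 32-byte register holding one valid bit per cache line."""

    def __init__(self):
        self.bits = bytearray(LINES // 8)  # 32 bytes = 256 bits
        self.writes = 0                    # number of clearing writes issued

    def set_valid(self, index):
        self.bits[index // 8] |= 1 << (index % 8)

    def is_valid(self, index):
        return bool(self.bits[index // 8] & (1 << (index % 8)))

    def clear_all(self, bus_bytes=4):
        # Assumption: the register is writable one bus word (32 bits) at a time.
        for offset in range(0, len(self.bits), bus_bytes):
            self.bits[offset:offset + bus_bytes] = bytes(bus_bytes)
            self.writes += 1
```

With these assumptions, invalidating every line takes 8 word writes instead of the 256 accesses needed when each valid bit sits in its own line of the address array.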
- the latch 23 holds the tag address, and the held address tag is supplied to one input of the comparator 20 for cache hit / miss determination, and is supplied to the address array 3 at the time of line filling.
- the other input of the comparator 20 is supplied with the cache address tag of the line indexed by the address array 3.
- the aligner 26 is a circuit that determines the correspondence between the internal data bus IDB and the data input / output terminal of the data array 4, and is controlled by the cache controller 2.
- The input circuit 24 and the output circuit 25 are circuits that selectively connect the data input/output terminals of the address array 3 and the data bus, and constitute the data input/output circuit when the address array 3 is used as a normal RAM.
- Whether or not to use the address array 3 and the data array 4 as ordinary RAM (as a part of the RAM 10) can be linked to enabling or disabling the cache memory CACHE. Such valid/invalid control can be determined by the operation mode of the microcomputer.
- the microcomputer MPU 1 shown in FIGS. 15 and 16 differs from the embodiment shown in FIGS. 1 and 2 in the arrangement of the bus controller.
- the bus controller 7 and the peripheral circuit 11 share the internal bus 6.
- the other points are the same as those of the embodiment shown in FIGS. 1 and 2, and the detailed description is omitted.
- the overall hit ratio in a micro computer with built-in memory that is accessed at high speed together with the cache memory will be described.
- The dynamic internal memory access ratio (the ratio of internal memory accesses among all memory accesses) is r
- the cache memory hit ratio during external memory access is h
- The overall hit rate H is the ratio, among all memory accesses, of accesses that are either internal memory accesses or cache hits; that is, external memory accesses that hit in the cache are counted together with internal memory accesses. An internal memory access and an external memory access that becomes a cache hit do not occur at the same time.
- The average number of access clock cycles Sa for the internal memory (9, 10) and the external memory (13) of the microcomputer MPU (MPU 1) is:
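The equations for H and Sa are not reproduced in this text. As a hedged sketch, the definitions above imply H = r + (1 - r)h, because internal accesses and cache hits on external accesses are mutually exclusive. The expression used below for Sa, and the default cycle counts (1 cycle for an internal access or cache hit, 3 cycles for an external access on a miss, taken from the FIG. 9 timing), are assumptions for illustration and may differ from the original formula. The function names are hypothetical.

```python
def overall_hit_ratio(r, h):
    """H = r + (1 - r) * h, from the definitions of r and h."""
    return r + (1.0 - r) * h


def average_access_cycles(r, h, fast_cycles=1, external_cycles=3):
    """Assumed form of Sa: fast accesses weighted by H, the rest external."""
    big_h = overall_hit_ratio(r, h)
    return big_h * fast_cycles + (1.0 - big_h) * external_cycles


def required_internal_ratio(h, target=0.9):
    """Smallest r giving an overall hit ratio of `target` for a cache hit ratio h < 1."""
    return (target - h) / (1.0 - h)
```

For example, with h = 0.8 an internal access ratio of r = 0.5 gives H = 0.9, and when h already exceeds 0.9 the required r becomes negative, meaning no internal memory is needed to reach that target.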
- FIG. 19 is drawn based on the contents of FIG. 20, and
- FIG. 21 is drawn based on the contents of FIG.
- The microcomputer MPU (MPU 1) of this embodiment has the cache memory CACHE and the ROM 9 and RAM 10 mounted on the same semiconductor substrate, and the RAM 10 and ROM 9 are high-speed built-in memories that are accessed in one cycle, the same as the cache memory CACHE. In any case, the fact that an internal memory access can be completed in one clock cycle, like a cache hit, is guaranteed by not making the internal memories 9 and 10 targets of caching, in other words, by making the internal memories 9 and 10 non-cacheable areas.
- In the microcomputer MPU (MPU 1), which has built-in memory such as ROM/RAM together with the cache memory CACHE, the interrupt vector and the interrupt service routine are stored in the built-in ROM/RAM. By making that ROM/RAM not subject to caching, the transition to the interrupt processing program can be made as fast as in the case of a cache hit, even for a program with frequent interrupts. In this regard, the data processing performance or data processing speed can be improved.
- Using the cache memory as the built-in RAM is also advantageous in terms of cost. Furthermore, when the valid bits 105 are physically separated from the address array 3 and stored in a register, for example in the cache controller 2, clearing the valid bits 105 by software becomes easier and can be done efficiently with a small number of accesses. Also in this regard, the hardware configuration of the cache memory CACHE is reduced, contributing to the cost reduction of the microcomputer MPU (MPU 1).
- FIG. 23 shows the miss rate (1 − h) for each cache configuration (number of ways, line length, capacity). These values are based on the instruction cache miss rates (Fig. 2) given in Alan J. Smith, "Line (Block) Size Choice for CPU Cache Memories", IEEE Trans. Comput., vol. C-36, no. 9, pp. 1063-1075, Sept. 1987.
- The vertical column marked CACHE SIZE indicates the storage capacity (bytes) of the data array of the cache memory CACHE.
- The horizontal row marked Line indicates the number of bytes of a data line in the cache memory CACHE.
- The numerical value written at the intersection of a value in the vertical column and a value in the horizontal row is the miss rate (1 − h) corresponding to that cache size and line length.
- FIGS. 23(A) to 23(D) show the cases where the configuration of the cache memory CACHE is fully associative (full), 4-way set associative (4way), 2-way set associative (2way), and 1-way set associative, that is, direct mapped (direct), respectively.
- the description method is the same as in Fig. 23.
- r ≤ 0 indicates a cache configuration in which the hit rate exceeds 0.9 even without the internal ROM.
- When the relationship between the cache capacity C [B] and the internal ROM capacity R [KB] is plotted based on the results in FIG. 25, the results in FIGS. 27 to 30 are obtained.
- The configuration of the cache memory is fully associative (full) in FIG. 27, 4-way set associative (4 way) in FIG. 28, 2-way set associative (2 way) in FIG. 29, and direct mapped (direct) in FIG. 30, with the data line length of the cache memory as a parameter in each figure.
- the ROM capacity is 85.33 kilobytes. This means that the overall hit rate will be 0.9. In practice, using the above results, the capacities of C and R that are sufficient to obtain the required performance can be determined.
- C + aR b.
- In this way, the cache memory capacity C and the ROM capacity R are not increased unnecessarily (wastefully), and the storage capacity of the ROM 9 and the storage capacity of the cache memory CACHE built into the microcomputer MPU (MPU 1) can be optimized in terms of cost and overall hit rate.
- the values of the constants a and b are as shown below from FIG.
- FIG. 31 shows a block diagram of an embodiment of a microcomputer in which data is transferred to a high-speed RAM in advance to speed up memory access as a whole (a microcomputer with a built-in pre-transfer type RAM cache).
- The single-chip microcomputer MPU2 of this embodiment (also simply referred to as a microcomputer) has a CPU 200, a ROM 202 in which an operation program or data of the CPU 200 is stored, a RAM 203 used as a work area of the CPU 200 or a temporary storage area for data, a multiplier (MULT) 209, a bus controller 204, a peripheral circuit 210, an external bus interface 207, a high-speed RAM 201, an address converter 2010, a RAM transfer controller 205, and the like, and is formed on one semiconductor substrate such as single-crystal silicon.
- The external bus interface 207 is connected to an external memory 208 via an external bus 211.
- The microcomputer MPU2 of this embodiment has, although not particularly limited, a RISC architecture. The CPU 200 executes one instruction per clock cycle of its operation reference clock signal and, so that the basic number of bus access cycles is one clock cycle, executes pipeline stages such as instruction fetch, instruction decode, instruction execution, and memory access by pipeline operation.
- The internal bus 206 to which the CPU 200 is connected is a high-speed internal bus having a minimum operation cycle of one clock cycle, and includes an internal address bus IAB, an internal data bus IDB, and an internal control bus ICB.
- the respective circuit modules are connected to the internal bus 206.
- The bus controller 204 performs control of the internal bus 206, access control of the peripheral circuit 210, and the like.
- The bus controller 204 determines the area to be accessed based on the higher-order bits of the address signal supplied from the internal bus 206, and receives a bus command or the like supplied from the internal bus 206 to determine the type of access, such as read/write and access data width.
- The operation of the built-in ROM 202, RAM 203, high-speed RAM 201, etc. is selected according to the results of these determinations, and a chip selection for external access and a read/write instruction are issued to the outside.
- The RAM 203 and the ROM 202 are low-speed internal memories that require two or more clock cycles for access.
- the high-speed RAM 201 can be accessed in one clock cycle, and is used as a pre-transfer-type RAM cache.
- the RAM transfer controller 205 controls to transfer necessary information of the built-in RAM 203 and the ROM 202 to the high-speed RAM 201.
- FIG. 32 is a block diagram of the RAM transfer controller 205.
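- The RAM transfer controller 205 includes a register (CTAR) 301 for storing an address used as a trigger for data transfer to the high-speed RAM 201, that is, a caching trigger address, a caching block start address register (CSAR) 302, a caching block end address register (CEAR) 303, a RAM cache top address register (RCAR) 304, a control/status register (CSR) 305, and a control circuit 306.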
- the CSR305 has a valid flag (V) 307 in addition to the operation setting bit of the RAM transfer controller 205.
- In each of the registers CTAR 301, CSAR 302, CEAR 303, and RCAR 304, a value to be compared with the address output from the CPU 200 is set.
- the CSR 305 stores setting data for control of the RAM transfer controller 205 and a value including a valid flag 307.
- The registers 301 to 305 can be read/write-accessed by the CPU 200: the control circuit 306 decodes the value of the internal address bus IAB and selects each register, and those registers are read or written via the internal data bus IDB.
- address information that specifies a part or the entire range of the area of the ROM 202 and the RAM 203 is set.
- the storage capacity of the specifiable range is determined by the storage capacity of the high-speed RAM 201.
- The comparison circuit 311 compares the value of the internal address bus IAB with the value of the CTAR 301, and gives the comparison result to the control circuit 306.
- the comparison circuit 312 determines whether or not the value of the internal address bus IAB is equal to or greater than the value of the CSAR 302, and gives the determination result to the control circuit 306.
- The comparison circuit 313 determines whether or not the value of the internal address bus IAB is equal to or less than the value of the CEAR 303, and gives the determination result to the control circuit 306.
- When the control circuit 306 detects that the address signal supplied from the CPU 200 to the internal address bus IAB matches the value of the CTAR 301, it acquires the bus right from the CPU 200 and performs control to transfer the contents of the range from the address determined by the value of the CSAR 302 to the address determined by the value of the CEAR 303 to a predetermined area of the high-speed RAM 201 that starts at the value of the RCAR 304.
- the conversion control signal 3140 is inactive, and the high-speed RAM 201 is accessed according to the address signal output from the RAM transfer controller 205 to the internal address bus IAB.
- When the transfer is completed, the valid flag 307 is set to the valid state and the bus right is relinquished.
- The control circuit 306 refers, in the determination circuit 314, to the determination results of the comparison circuits 312 and 313, and determines whether the value of the internal address bus IAB lies in the range from the value of the CSAR 302 to the value of the CEAR 303.
- The address calculator 315 subtracts the value of the CSAR 302 from the value of the RCAR 304 (RCAR - CSAR); the result is supplied to the high-speed RAM 201 as the RAM address conversion information 316, and the conversion control signal 310 is activated.
- FIG. 34 shows a block diagram of the high-speed RAM 201 and the address converter 2010.
- The address converter 2010 adds the RAM address conversion information 316 to the value of the internal address bus IAB at that time, and supplies the sum to the high-speed RAM 201 as the access address.
- When the conversion control signal 3140 is inactive, the value of the internal address bus IAB is supplied to the high-speed RAM 201 as it is. It is sufficient for the high-speed RAM 201 to have the configuration of a normal RAM, and thus a detailed description thereof is omitted.
- The data is transferred to the high-speed RAM 201 with the value of the RCAR 304 as the start address.
- the conversion control signal 3140 is activated.
- The bus controller 204 selects the operation of the high-speed RAM 201 instead of the operation of the ROM 202, the RAM 203, and the external memory 208.
- The CPU 200 outputs the address signal 250 via the internal bus 206.
- The address signal 250 is supplied to the RAM transfer controller 205, and the supplied address signal 250 is compared with the value of the CTAR 301. If they match, the RAM transfer controller 205 asserts a bus right request signal (RTCREQ) 251 to the bus controller 204.
- The bus controller 204 arbitrates the bus right with the bus arbiter 220, and gives the bus right to the RAM transfer controller 205 at a break between bus cycles of the CPU 200 or the like.
- The CPU bus right acknowledge signal (CPUACK) 253 is negated, and the RAM transfer controller bus right acknowledge signal (RTCACK) 252 is asserted.
- When the RAM transfer controller 205 acquires the bus right in response to the assertion of the RTCACK 252, it performs control to transfer the instructions or data in, for example, the low-speed internal ROM 202, the internal RAM 203, or the external memory 208, in the range from the address indicated by the CSAR 302 up to and including the address indicated by the CEAR 303, to the address in the high-speed RAM 201 indicated by the RCAR 304.
- After the transfer, the control circuit 306 inverts the valid flag 307 in the CSR 305, which indicates that an instruction or data is stored in the high-speed RAM, to the set state. Then, the RAM transfer controller 205 relinquishes the bus right.
- The address signal 250 output by the CPU 200 is compared with the values of the CSAR 302 and CEAR 303 by the comparators 312 and 313. If CSAR ≤ address signal ≤ CEAR and the valid flag 307 is in the set state, the address signal 250 is converted for access to the high-speed RAM 201, and the high-speed RAM 201 is accessed instead of the built-in ROM 202, built-in RAM 203, or external memory 208. If the above conditions are not satisfied, the low-speed built-in ROM 202, built-in RAM 203, or external memory 208 is accessed by the address signal 250 as usual.
- At this time, the RAM address conversion information 316 and the conversion control signal 3141 are supplied from the RAM transfer controller 205 to the high-speed RAM 201.
- The high-speed RAM 201 can be accessed with the access address calculated as CPU access address - CSAR + RCAR.
- The bus controller 204 selects the operation of the high-speed RAM 201 based on the conversion information 316 and the conversion control signal 3141. Further, the bus controller 204 uses the conversion control signal 310 to inhibit the operation of the ROM 202, the RAM 203, or the external memory 208 corresponding to the address signal 250 on the internal address bus IAB at that time.
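The pre-transfer sequence (trigger match on CTAR, block copy from CSAR..CEAR to RCAR, valid flag, then address conversion) can be summarized in a behavioural model. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, RCAR is treated as an offset into a list that stands in for the high-speed RAM, bus arbitration is reduced to "copy after the triggering access completes", and the choice not to re-trigger the copy once the valid flag is set is an assumption, since the text does not say.

```python
class PreTransferRamCache:
    """Behavioural model of the RAM transfer controller (names are hypothetical)."""

    def __init__(self, ctar, csar, cear, rcar, slow_memory, fast_size):
        self.ctar = ctar          # caching trigger address
        self.csar = csar          # caching block start address
        self.cear = cear          # caching block end address (inclusive)
        self.rcar = rcar          # start position inside the high-speed RAM
        self.valid = False        # valid flag (V) in the CSR
        self.slow = slow_memory   # dict: address -> value (ROM, RAM, or external)
        self.fast = [0] * fast_size

    def _transfer_block(self):
        # Performed while the controller holds the bus right.
        for addr in range(self.csar, self.cear + 1):
            self.fast[addr - self.csar + self.rcar] = self.slow[addr]
        self.valid = True

    def read(self, addr):
        """Return (value, source), where source is 'fast' or 'slow'."""
        if self.valid and self.csar <= addr <= self.cear:
            # Converted address: CPU access address - CSAR + RCAR
            return self.fast[addr - self.csar + self.rcar], "fast"
        value = self.slow[addr]
        if addr == self.ctar and not self.valid:
            # Assumption: the copy runs once, right after the triggering access.
            self._transfer_block()
        return value, "slow"
```

In this model an access inside the block is served from the slow memory until the trigger address has been seen, and from the high-speed RAM afterwards; addresses outside CSAR..CEAR are never redirected.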
- The access address of the high-speed RAM 201 may also be formed by concatenating the upper part of the RCAR and the lower part of the address 250. In this case, the values that can be set for the CSAR 302 and CEAR 303 are naturally limited. Alternatively, the CEAR 303 may be omitted, and a match between the upper part of the address 250 and the upper part of the CSAR 302 may be detected.
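The two ways of forming the high-speed RAM address can be compared directly. The sketch below assumes, for illustration only, that the block size is a power of two (`low_bits` address bits) and that CSAR and RCAR are both aligned to that size; the function names and the parameter `low_bits` are hypothetical. Under that alignment the concatenation gives the same result as the subtraction and addition, without needing an adder in the access path.

```python
def translate_by_add(addr, csar, rcar):
    """CPU access address - CSAR + RCAR."""
    return addr - csar + rcar


def translate_by_concat(addr, rcar, low_bits):
    """Upper part of RCAR joined with the lower `low_bits` bits of the address."""
    mask = (1 << low_bits) - 1
    return (rcar & ~mask) | (addr & mask)


def in_block_by_upper_bits(addr, csar, low_bits):
    """Range check without CEAR: compare only the upper address bits with CSAR."""
    return (addr >> low_bits) == (csar >> low_bits)
```

For example, with a 1 KB block (`low_bits = 10`), CSAR = 0x2400 and RCAR = 0x0800, the address 0x2434 maps to 0x0834 by either method. This also shows why the settable values of CSAR and CEAR become limited: the block has to start and end on boundaries of the chosen size.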
- If the access speed is sufficient for all or any of the ROM 202 and the RAM 203, or for a part of each of them, a configuration that allows one-cycle access to them can be adopted.
- The control of the RAM cache is unnecessary for a device that is connected to the internal bus 206 and accessible in one cycle, and one-clock-cycle access by the CPU 200 is performed as usual.
- The data to be pre-transferred to the high-speed RAM 201 in the above embodiment is not limited to the data in the internal ROM and RAM; only the data in the external memory, or both the data in the internal ROM and RAM and the data in the external memory, can also be targeted. In these cases, the transfer can be realized by the same control as in the above embodiment.
- FIG. 35 shows a block diagram of an embodiment of a microcomputer in which data is transferred to the high-speed RAM simultaneously with access in order to speed up access (a microcomputer with a built-in simultaneous-transfer type RAM cache).
- The single-chip microcomputer MPU3 of this embodiment (also simply referred to as a microcomputer) includes a CPU 400, a ROM 402 in which an operation program or data of the CPU 400 is stored, a RAM 403 used as a work area of the CPU 400 or a temporary storage area for data, and the like.
- An external memory 408 and the like are connected to the external bus interface 407 via an external bus 411.
- The high-speed RAM 401 is shown in FIG. The difference from the one described in FIG. 34 is that the operation of the address converter 4010 is not controlled by the conversion control signal 5140, and the RAM address conversion information 516 is always added to the value of the internal address bus IAB and supplied to the high-speed RAM 401.
- The microcomputer MPU3 of the present embodiment has, although not particularly limited, a RISC architecture, and the CPU 400 executes one instruction per clock cycle of the operation reference clock signal of the CPU 400.
- Pipeline stages such as instruction fetch, instruction decode, instruction execution, and memory access are executed by pipeline operation so that the basic number of bus access cycles is one clock cycle.
- the internal bus 406 to which the CPU 400 is connected is a high-speed internal bus having a minimum operation cycle of one clock cycle, and includes an internal address bus IAB, an internal data bus IDB, and an internal control bus ICB.
- The respective circuit modules are connected to the internal bus 406.
- the bus controller 404 controls the internal bus 406 and controls the access to the peripheral circuit 410.
- The bus controller 404 determines an access target area based on the upper bits of the address signal supplied from the internal bus 406, and receives a bus command or the like supplied from the internal bus 406 to determine the type of access, such as read/write and access data width.
- The internal ROM 402 and the RAM 403 are selected in accordance with the determination results, and control for selecting a chip for external access or giving a read/write instruction to the outside through the external bus interface is performed.
- the RAM 403 and the ROM 402 are low-speed internal memories that require two or more clock cycles for access.
- the high-speed RAM 401 is accessible in one clock cycle, and is used as a simultaneous transfer type RAM cache.
- the RAM transfer controller 405 controls to transfer required information of the built-in RAM 403 and the ROM 402 to the high-speed RAM 401.
- FIG. 37 shows a block diagram of the RAM transfer controller 405.
- The RAM transfer controller 405 has a caching block start address register (CSAR) 502, a caching block end address register (CEAR) 503, a RAM cache top address register (RCAR) 504, a control/status register (CSR) 505, and a control circuit 506 for controlling data transfer.
- The CSR 505 has a valid flag (V) 508.
- a value to be compared with the address output by the CPU 400 is set in CSAR502, CEAR503, and RCAR504.
- the CSR 505 stores the setting data for controlling the RAM transfer controller 405 and the value including the valid flag 508.
- The registers 502 to 505 are made read/write accessible by the CPU 400: the control circuit 506 decodes the value of the internal address bus IAB to select each register, and those registers are read or written via the internal data bus IDB.
- Address information for designating a part or all of the area of the ROM 402 and the RAM 403 is set.
- The storage capacity of the specifiable range is determined by the storage capacity of the high-speed RAM 401.
- the comparison circuit 512 determines whether the value of the internal address bus IAB is equal to or greater than the value of the CSAR 502 or is equal to the value of the CSAR 502, and gives the determination result to the control circuit 506.
- The comparison circuit 513 determines whether the value of the internal address bus IAB is less than or equal to the value of the CEAR 503, and gives the determination result to the control circuit 506.
- The address signal 450 is supplied to the RAM transfer controller 405, and the supplied address signal is compared with the values of the CSAR 502 and CEAR 503 by the comparators 512 and 513. The comparison result is given to the determination circuit 514.
- When the determination circuit 514 detects that the access address matches the value of the CSAR 502 while the valid flag 508 is in the invalid state, the following operation continues until the address signal 450 matches the value of the CEAR 503: while the address signal 450 is in the range from the value of the CSAR 502 to the value of the CEAR 503, whenever the CPU 400 accesses an instruction or data in the built-in ROM 402, the built-in RAM 403, or the external memory 408, the accessed information is written to the high-speed RAM 401 at the same time as that access (read or write).
- The access address of the high-speed RAM 401 is calculated as CPU access address - CSAR + RCAR, as in the above embodiment.
- The arithmetic unit 515 generates the RAM address conversion information (RCAR - CSAR) 516 as in the above-described embodiment, and the address converter 4010 of the high-speed RAM 401 receives the RAM address conversion information 516 and generates (CPU access address - CSAR + RCAR).
- the writing is performed in parallel with the access to the internal ROM 402, the internal RAM 403, or the external memory 408, as shown in FIG.
- The operation selection of the high-speed RAM 401 for such parallel access to the high-speed RAM 401 is controlled by the bus controller 404 receiving the conversion control information 5140.
- the access address of the high-speed RAM 401 may be a connection between the upper part of the RCAR and the lower part of the address signal 450.
- In this case, the values that can be set in the CSAR 502 and CEAR 503 are limited.
- the CEAR 503 may be abolished, and the higher order of the address signal 450 may be compared with the upper order of the CSAR 502.
- The control circuit 506 then inverts the valid flag 508 to the set state.
- The control circuit 506 is not particularly provided with a means for detecting the end of the write operation to the high-speed RAM 401 over the address range from the value of the CSAR 502 to the value of the CEAR 503. This is because the values of the CSAR 502 and CEAR 503 may be set on an address range in which the access address from the CPU 400 changes sequentially. Although not particularly limited, it is also possible to provide a circuit that detects, address by address, whether each address in the range from the value of the CSAR 502 to the value of the CEAR 503 has been accessed and thereby detects completion of access to all addresses in the range.
- When the control circuit 506 detects, in the determination circuit 514, that the access address of the CPU 400 is in the range CSAR ≤ address ≤ CEAR, the selection of the operation of the internal ROM 402, the internal RAM 403, or the external memory 408 is suppressed in the bus controller 404 by the judgment control signal 5140.
- the bus controller 404 selects the operation of the high-speed RAM 401, supplies the high-speed RAM 401 with the RAM address conversion information 516, and operates the high-speed RAM 401 in one clock cycle. As a result, address access in that range can be sped up.
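The simultaneous-transfer behaviour can be sketched in the same style. This is an illustrative model, not the circuit: the class name, the `copying` state variable, and the fixed cost of 2 cycles for the low-speed memories (the text says "two or more") are assumptions. As described above, the model does not check that every address in the range was actually accessed before the valid flag is set.

```python
class SimultaneousTransferRamCache:
    """Behavioural model of copy-on-first-access (names are hypothetical)."""

    SLOW_CYCLES = 2  # assumed; the text only says "two or more clock cycles"
    FAST_CYCLES = 1

    def __init__(self, csar, cear, rcar, slow_memory, fast_size):
        self.csar, self.cear, self.rcar = csar, cear, rcar
        self.valid = False     # valid flag (V) in the CSR
        self.copying = False   # True between the CSAR match and the CEAR match
        self.slow = slow_memory
        self.fast = [0] * fast_size

    def read(self, addr):
        """Return (value, cycles) for one CPU read access."""
        in_range = self.csar <= addr <= self.cear
        if self.valid and in_range:
            return self.fast[addr - self.csar + self.rcar], self.FAST_CYCLES
        value = self.slow[addr]
        if not self.valid:
            if addr == self.csar:
                self.copying = True  # start of the block detected
            if self.copying and in_range:
                # Written in parallel with the slow access, at addr - CSAR + RCAR
                self.fast[addr - self.csar + self.rcar] = value
                if addr == self.cear:
                    self.copying = False
                    self.valid = True
        return value, self.SLOW_CYCLES
```

Running the same four-address block twice in this model costs 8 cycles on the first pass and 4 on the second, which matches the statement that only the second and subsequent executions are sped up.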
- the address signal used for accessing the high-speed RAM 401 is CPU access address-CSAR + RCAR, as in the above embodiment. If this address operation requires a time that cannot be ignored, the access address of the high-speed RAM 401 can be made by connecting the upper part of the RCAR and the lower part of the address 450 as described above.
- The data to be simultaneously transferred to the high-speed RAM 401 in the above embodiment is not limited to the data in the built-in ROM and RAM; only the data in the external memory, or both the data in the built-in ROM and RAM and the data in the external memory, can also be targeted. In those cases, the transfer can be realized by the same control as in the above embodiment.
- Also in this embodiment, the processing performance can be improved in the same manner as in the microcomputer with a built-in pre-transfer type RAM cache.
- Since the range of the internal ROM/RAM to be sped up is copied to the high-speed RAM 401 at the same time as the CPU 400 accesses that range, the performance the first time the range is executed depends only on the access time of the built-in ROM/RAM, but the second and subsequent executions are performed by accessing the high-speed RAM 401, so the speed is increased.
- Controlling this external memory in the same way as the built-in ROM/RAM can prevent performance degradation.
- FIG. 1 is a block diagram showing an embodiment of a microcomputer (microcomputer with built-in RAM cache) for speeding up memory access.
- The microcomputer MPU4 has a CPU 600, a ROM 602 in which an operation program or data of the CPU 600 is stored, a RAM 603 used as a work area of the CPU 600 or a temporary storage area for data, a multiplier (MULT) 609, a bus controller 604, a peripheral circuit 610, an external bus interface 607, a high-speed RAM 601, an address converter 613, and the like.
- An external memory 608 and the like are connected to the external path interface 607 via an external path 611.
- the micro-view MPU4 of the present embodiment is not particularly limited, It has an ISC architecture, and the CPU 600 executes the instruction in one clock cycle of the operation reference clock signal of the CPU 600 per instruction, and sets the basic cycle number of the path access to one clock cycle.
- the pipeline stages such as instruction fetch, instruction decode, instruction execution, and memory access are executed by pipeline operation.
- the internal bus 606 to which the CPU 600 is connected is a high-speed internal bus having a minimum operation cycle of one clock cycle, and includes an internal address bus IAB, an internal data bus IDB, and an internal control bus ICB.
- the respective circuit modules are connected to the internal bus 606.
- a bus controller 604 controls the internal bus 606 and controls access to peripheral circuits 610.
- the RAM 603 and the ROM 602 are low-speed internal memories that require two or more clock cycles for access.
- the high-speed RAM 601 can be accessed in one clock cycle, and is used as a RAM cache of an address replacement type.
- the DMA controller 612 controls, in place of the CPU 600, the transfer of required information from the built-in RAM 603 and the ROM 602 to the high-speed RAM 601.
- the CPU 600 initializes the operation of the DMA controller 612.
- although not particularly limited, the bus controller 604 performs the control that enables the CPU 600 to access the data transferred to the high-speed RAM 601 by the DMA controller 612 in place of the data in the built-in RAM 603 or the ROM 602.
- FIG. 40 shows an example block diagram of the bus controller 604.
- the bus controller 604 determines the area to be accessed based on the upper bits of the address signal supplied from the internal bus 606, and determines the type of access (read/write, access time, and so on) from a bus command or the like supplied from the internal bus 606.
- the internal ROM 602 and the RAM 603 are selected in accordance with the determination results, and a chip select for external access and a read/write instruction are issued to the outside via the external bus interface 607.
- the control is performed by the control circuit 706.
- to control the high-speed RAM 601, the bus controller 604 is provided with a caching block start address register (CSAR) 702, a caching block end address register (CEAR) 703, a control/status register (CSR) 705, a RAM start address register (RCAR) 707, comparators 712 and 713, a decision circuit 715, and an address calculator 714.
- CSR 705 provides a parity flag 7051. The flag 7051 is set to a reset state after the DMA controller transfers necessary information of the built-in RAM 603 and the ROM 602 to the high-speed RAM 601 instead of the CPU.
- in the CSAR 702 and the CEAR 703, values to be compared with an address output from the CPU 600 are set.
- the start address of the address area where the high-speed RAM 601 is mapped is set in the RAM start address register 707.
- the registers 702, 703, 705, and 707 are made read/write accessible by the CPU 600. The control circuit 706 decodes the value of the internal address bus IAB to select each register, and those registers are read or written via the internal data bus IDB.
- address information that specifies the range of data addresses of the ROM 602 and the RAM 603 transferred to the high-speed RAM 601 by the DMA controller 612 is set.
- the comparator circuit 712 determines whether the value of the internal address bus IAB is equal to or greater than the value of the CSAR 702 and provides the determination result to the control circuit 706.
- the comparison circuit 713 determines whether the value of the internal address bus IAB is equal to or smaller than the value of the CEAR 703, and gives the determination result to the control circuit 706. Accordingly, when the determination circuit 715 detects that the address signal supplied from the CPU 600 to the internal address bus IAB is within the range CSAR ≤ address ≤ CEAR, the parity flag 7051 is set to the set state.
- the address calculator 714 subtracts the value of the CSAR 702 from the value of the RAM start address register (RCAR) 707 (RCAR - CSAR), and the resulting value is supplied to the high-speed RAM 601 as the RAM address conversion information 716. The conversion control signal 710 is also activated and supplied to the high-speed RAM 601.
- the address calculator 714 is not operated, and the conversion control signal 710 is made inactive, regardless of the address value on the internal address bus IAB.
- the control circuit 706 suppresses the operation selection of the ROM 602 or the RAM 603 specified by the value of the address bus IAB, and instead selects the high-speed RAM 601 for operation by the selection signal (memory enable signal).
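The decision made inside the bus controller 604 can be summarized in a few lines of code. The following is a minimal sketch, not the actual logic of the embodiment: the function name `bus_controller_decode` and the `Decode` tuple are hypothetical, and it assumes that address conversion is enabled only when the address lies in the CSAR..CEAR range and the flag 7051 is in the set state, which is one reading of the description.

```python
from typing import NamedTuple


class Decode(NamedTuple):
    conversion_active: bool  # models the conversion control signal 710
    conversion_info: int     # models the RAM address conversion information 716
    select_fast_ram: bool    # models the memory enable for the high-speed RAM 601


def bus_controller_decode(iab, csar, cear, rcar, flag_set):
    """Decide whether an access on the internal address bus is redirected."""
    ge_start = iab >= csar          # comparator 712: IAB >= CSAR
    le_end = iab <= cear            # comparator 713: IAB <= CEAR
    in_range = ge_start and le_end  # decision circuit 715
    if in_range and flag_set:       # assumption: the flag gates the conversion
        # Address calculator 714 produces RCAR - CSAR.
        return Decode(True, rcar - csar, True)
    # Otherwise the ROM/RAM selected by the address is accessed as usual.
    return Decode(False, 0, False)
```

For example, with CSAR = 0x1000, CEAR = 0x13FF, and RCAR = 0xF000 (illustrative values only), an access to 0x1234 yields conversion information 0xE000, while an access to 0x1400 leaves the conversion inactive.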
- FIG. 41 is a block diagram of the high-speed RAM 601 and the address calculator 613.
- the address arithmetic unit 613 adds the RAM address conversion information 716 to the value of the internal address bus IAB, and supplies the result to the high-speed RAM 601 as the access address.
- the address calculator 613 supplies the value of the internal address bus IAB directly to the high-speed RAM 601.
- the parity flag 7051 is reset, so that the high-speed RAM 601 is accessed according to the address signal output by the DMA controller 612.
- CPU 600 outputs address signal 650 via internal bus 606.
- the address signal 650 is supplied to the bus controller 604, and the supplied address signal is compared with the values of the CSAR 702 and the CEAR 703, respectively. If CSAR ≤ address ≤ CEAR, the conversion control signal 710 is activated, and the address calculator 714 converts the access address of the CPU 600 into the address of the high-speed RAM 601.
- the CPU 600 can access, at high speed, the data of the low-speed memory 602, 603, or 608 by accessing the high-speed RAM 601 instead.
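The sequence described above, a DMA preload followed by redirected CPU accesses, can be walked through with a short simulation. This is a hedged sketch: the function names and all address values are hypothetical, memories are modelled as dictionaries keyed by bus address, and it assumes the flag is reset during the DMA transfer (no conversion) and set afterwards (conversion enabled).

```python
def fast_ram_address(iab, conversion_info, conversion_active):
    """Models adder 613: add the conversion information, or pass IAB through."""
    return iab + conversion_info if conversion_active else iab


def simulate_preload_and_read():
    CSAR, CEAR, RCAR = 0x1000, 0x100F, 0xF000  # illustrative values only
    rom = {a: a & 0xFF for a in range(CSAR, CEAR + 1)}  # slow memory contents
    fast_ram = {}                                       # keyed by bus address

    # Phase 1: flag reset. The DMA controller writes the high-speed RAM at
    # its own (untranslated) addresses, starting at RCAR.
    flag_set = False
    for offset, src in enumerate(range(CSAR, CEAR + 1)):
        dst = fast_ram_address(RCAR + offset, 0, flag_set)
        fast_ram[dst] = rom[src]

    # Phase 2: flag set. A CPU address inside CSAR..CEAR is redirected.
    flag_set = True
    cpu_addr = 0x1005
    active = (CSAR <= cpu_addr <= CEAR) and flag_set
    ram_addr = fast_ram_address(cpu_addr, RCAR - CSAR, active)
    return ram_addr, fast_ram[ram_addr], rom[cpu_addr]
```

The function returns the translated address together with the value read from the high-speed RAM and the value held in the ROM at the original address; under the assumptions above the two values agree, which is the property the preload is meant to guarantee.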
- the access address of the high-speed RAM 601 may be formed by concatenating the upper address of the high-speed RAM 601 with the lower part of the address signal 650.
- the CEAR 703 may be omitted, and the upper bits of the address signal 650 may be compared with the upper bits of the CSAR 702.
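The simplified variant, comparing only upper bits and concatenating instead of adding, can be sketched as follows. The block size `BLOCK_BITS` and the function names are assumptions for illustration; the sketch also assumes that both the cached block and the high-speed RAM start on a boundary aligned to the block size, which is what makes concatenation equivalent to the subtraction and addition.

```python
BLOCK_BITS = 10                    # assumed: the cached block is 2**10 bytes and aligned
LOW_MASK = (1 << BLOCK_BITS) - 1   # mask selecting the lower address bits


def in_block(addr, csar):
    """Compare only the upper bits of the address with the upper bits of CSAR."""
    return (addr >> BLOCK_BITS) == (csar >> BLOCK_BITS)


def concat_address(addr, rcar):
    """Upper bits from the RAM start address, lower bits from the CPU address."""
    return (rcar & ~LOW_MASK) | (addr & LOW_MASK)
```

With CSAR = 0x1000 and RCAR = 0xF000 (illustrative values), the address 0x1234 falls in the block and maps to 0xF234, the same result as 0x1234 - CSAR + RCAR, but no adder or end-address comparator is needed.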
- the CPU 600 can access it as usual without controlling the RAM cache.
- the data transfer by address replacement to the high-speed RAM 210 in the above embodiment is not limited to the data in the built-in ROM and RAM; it may also target the data of both the built-in ROM and RAM and the external memory. In those cases, it can be realized by the same control as in the above embodiment.
- at the time of initial setting of a user program or the like, the processing speed can be increased by transferring in advance, by the DMA controller 612 or the like, the programs and data in the built-in ROM/RAM whose operation is to be accelerated to the high-speed RAM 601. If it is known which parts of a program completed with a C compiler or assembler should be sped up, the performance can easily be improved with this function. This eliminates the penalty for cache misses and the need to transfer high-speed routines during background processing.
- the execution time is exactly the same on the first pass and on every subsequent pass, which makes timing design easy.
- if the program does not fit in the built-in ROM/RAM and extends into an external memory with a slow access time, controlling this external memory in the same way as the built-in ROM/RAM can prevent performance degradation.
- the peripheral circuit 11 need not be directly connected to the internal bus 6; it may be connected to a bus dedicated to some peripheral function, and the bus dedicated to the peripheral function may be connected to the internal bus 6 via another interface circuit. Coherency at the time of a data write between the cache memory CACHE and the external memory 13 can be maintained by a write-back or write-through method. In addition, a purge mechanism that clears all parity bits at once to initialize the cache memory may be separately provided. Further, as the cache memory CACHE, it is possible to adopt any one of an instruction-only type, a data-only type, and a mixed instruction/data type, or a combination thereof.
- the control signal BE CNOP may be generated by the bus controller using the access area determination mechanism.
- the bus controller must be provided with a flag FLG for latching the notification of a cache hit/miss from the cache memory, and must control the assertion/negation of the control signal BE CNOP. Further, the generation of such a control signal BE CNOP and the control of its assertion/negation may be performed by a dedicated circuit. Further, the determination of the cacheable area/non-cacheable area is not limited to the configuration in which the cache controller performs it based on the access address as in the above embodiment. For example, it may be performed by rewriting a control bit for cache enable/disable, provided in the bus controller or in the cache controller, according to whether the access area determined by the bus controller is a non-cacheable area or a cacheable area.
Industrial applicability
- the present invention can be widely applied to various types of data processing systems including a central processing unit, to a microcomputer incorporating a memory such as a ROM, a RAM, and a cache memory, and to a microcomputer equipped with an external memory.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Microcomputers (AREA)
- Advance Control (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
A microcomputer in which the memory and the cache memory are optimally controlled so as to improve the processing performance of the computer. A microcomputer (MPU) includes built-in memories (9, 10) mounted on the same semiconductor substrate together with a cache memory (CACHE). The built-in memories are ones that can be accessed as fast as a cache hit in the cache memory. Fast access to the built-in memories (9, 10) can be obtained in all cases, in the same way as for a cache hit, by making them non-cacheable, that is, by mapping the built-in memories (9, 10) into a non-cacheable area. If, on the other hand, they were cacheable, the data relating to a cache miss would have to be placed in the cache memory whenever a cache miss occurred during access to the built-in memories.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP53470696A JP3735373B2 (ja) | 1995-05-19 | 1996-05-17 | マイクロコンピュータ |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP7/145552 | 1995-05-19 | ||
JP14555295 | 1995-05-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO1996036919A1 true WO1996036919A1 (fr) | 1996-11-21 |
Family
ID=15387817
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP1996/001308 WO1996036919A1 (fr) | 1995-05-19 | 1996-05-17 | Micro-ordinateur |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP3735373B2 (fr) |
WO (1) | WO1996036919A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7136965B2 (en) | 2000-08-07 | 2006-11-14 | Nec Corporation | Microcomputer |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS61136145A (ja) * | 1984-12-07 | 1986-06-24 | Hitachi Ltd | キヤツシユメモリ制御回路 |
JPS6227825A (ja) * | 1985-07-29 | 1987-02-05 | Fujitsu Ten Ltd | 汎用高速プロセツサ |
JPH0195343A (ja) * | 1987-10-07 | 1989-04-13 | Matsushita Electric Ind Co Ltd | 記憶装置 |
JPH02187881A (ja) * | 1989-01-13 | 1990-07-24 | Mitsubishi Electric Corp | 半導体集積回路 |
JPH0528040A (ja) * | 1991-07-18 | 1993-02-05 | Oki Electric Ind Co Ltd | 高速メモリアクセス方式 |
JPH0535467A (ja) * | 1991-07-31 | 1993-02-12 | Nec Corp | マイクロプロセツサ |
JPH05210974A (ja) * | 1991-10-03 | 1993-08-20 | Smc Standard Microsyst Corp | 同一チップ上でのスタティックキャッシュメモリとダイナミックメインメモリとの結合システム |
-
1996
- 1996-05-17 JP JP53470696A patent/JP3735373B2/ja not_active Expired - Fee Related
- 1996-05-17 WO PCT/JP1996/001308 patent/WO1996036919A1/fr active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JP3735373B2 (ja) | 2006-01-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230418759A1 (en) | Slot/sub-slot prefetch architecture for multiple memory requestors | |
US8725987B2 (en) | Cache memory system including selectively accessible pre-fetch memory for pre-fetch of variable size data | |
US12332790B2 (en) | Multi-level cache security | |
US5745732A (en) | Computer system including system controller with a write buffer and plural read buffers for decoupled busses | |
US5157774A (en) | System for fast selection of non-cacheable address ranges using programmed array logic | |
KR100262906B1 (ko) | 데이터 선인출 방법 및 시스템 | |
US20240264955A1 (en) | Multiple-requestor memory access pipeline and arbiter | |
US6321321B1 (en) | Set-associative cache-management method with parallel and single-set sequential reads | |
US7877537B2 (en) | Configurable cache for a microprocessor | |
JP2000242558A (ja) | キャッシュシステム及びその操作方法 | |
JPH07129471A (ja) | 主メモリ・プリフェッチ・キャッシュを含むコンピュータ装置とその作動方法 | |
JPH0628256A (ja) | データ処理システム | |
JPH0962572A (ja) | ストリーム・フィルタ装置及び方法 | |
JP2013529816A (ja) | メモリデバイスの消費電力を減らす方法およびシステム | |
EP2095243A2 (fr) | Mémoire cache configurable destinée à un microprocesseur | |
KR101462220B1 (ko) | 마이크로프로세서용 구성가능한 캐시 | |
US5717894A (en) | Method and apparatus for reducing write cycle wait states in a non-zero wait state cache system | |
EP0309995B1 (fr) | Système pour la sélection rapide de champs d'adresses non antémémorisables utilisant un réseau logique programmable | |
US6484237B1 (en) | Unified multilevel memory system architecture which supports both cache and addressable SRAM | |
JP3515333B2 (ja) | 情報処理装置 | |
WO1996036919A1 (fr) | Micro-ordinateur | |
JPH11184752A (ja) | データ処理装置及びデータ処理システム | |
WO2004031963A1 (fr) | Machine de traitement de donnees a semi-conducteurs | |
JP3378270B2 (ja) | マルチプロセッサシステム | |
JPS5927994B2 (ja) | コンピユ−タシステム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): CN JP KR SG US |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
|
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
122 | Ep: pct application non-entry in european phase |