KR102835585B1

KR102835585B1 - Method and device for processing data

Info

Publication number: KR102835585B1
Application number: KR1020170003050A
Authority: KR
Inventors: 유동훈; 김남형; 안준환; 최기영
Original assignee: 삼성전자주식회사; 서울대학교산학협력단
Priority date: 2016-11-07
Filing date: 2017-01-09
Publication date: 2025-07-18
Anticipated expiration: 2037-01-09
Also published as: KR20180051326A

Abstract

Disclosed are a data processing method and a data processing device that perform intra-bank optimization or inter-bank optimization in a computing environment in which a plurality of cores operate.

Description

Method and device for processing data

The present disclosure relates to a data processing method and device.

In the past, raising the clock frequency of a system's cores was the usual way to improve the performance of computers and other computing systems. However, a higher core clock frequency increases power consumption and heat generation. Because of this, performance gains from raising the core clock frequency have recently slowed. To avoid this problem while still improving system performance, the number of cores is being increased instead.

In data processing using a plurality of cores, how the cache is used becomes an issue. In general, a cache is a small memory mounted inside or near the CPU chip to speed up access to a large-capacity main memory. Because CPU processing speed is increasing faster than memory access speed, a cache that is small in capacity but fast has a direct effect on processor performance.

One embodiment discloses a method and device for processing data using a plurality of cores.

A method of processing data using a plurality of cores according to a first aspect may include: receiving data to be stored in a subset of cache banks among a plurality of cache banks corresponding to the plurality of cores; partitioning the received data and transmitting it to the subset of cache banks according to a write intensity of the received data; and storing, in a first cache bank that is one of the subset of cache banks, the data transmitted to the first cache bank.

In addition, the storing of the data transmitted to the first cache bank may include storing that data in one of a plurality of memories that are included in the first cache bank and have different write characteristics, according to the write intensity of the data transmitted to the first cache bank.

In addition, the method may further include: receiving a cache access request; and outputting the data stored in the first cache bank according to the received cache access request.

In addition, the partitioning and transmitting may include partitioning the received data and transmitting it to the subset of cache banks such that writes are uniformly distributed among the subset of cache banks.

In addition, the plurality of memories may include a first memory for which the latency or energy required to write unit data is less than a preset value, and a second memory for which it is greater than or equal to the preset value.

In addition, the storing of the data transmitted to the first cache bank may include storing that data in the second memory when the write intensity of the data transmitted to the first cache bank is lower than a preset value.

In addition, the storing of the data transmitted to the first cache bank may include storing that data in the first memory when the write intensity of the data transmitted to the first cache bank is higher than a preset value.

In addition, the second memory may include at least one of STT-RAM (spin transfer torque random access memory) and PCRAM (phase change random access memory), and the first memory may include SRAM (static random access memory).

In addition, the method may further include determining the write intensity of the data transmitted to the first cache bank by using a write intensity history of data accessed by the program counter corresponding to the data transmitted to the first cache bank.

In addition, the method may further include determining the write intensity of the data transmitted to the first cache bank by using a write intensity history of the application corresponding to the data transmitted to the first cache bank.

In addition, a device for processing data using a plurality of cores according to a second aspect may include: a receiving unit that receives data to be stored in a subset of cache banks among a plurality of cache banks corresponding to the plurality of cores; a processor that partitions the received data and transmits it to the subset of cache banks according to a write intensity of the received data; and a first cache bank that is one of the subset of cache banks and stores the data transmitted to it.

In addition, the first cache bank may store the data transmitted to it in one of a plurality of memories that are included in the first cache bank and have different write characteristics, according to the write intensity of that data.

In addition, a computer-readable recording medium may be provided on which a program for implementing the data processing method according to the first aspect on a computer is recorded.

FIG. 1 is a block diagram illustrating a device according to one embodiment.
FIG. 2 is a diagram illustrating an example in which a device according to one embodiment operates in a computing environment in which a plurality of cores operate.
FIG. 3 is a diagram illustrating an example in which a device according to one embodiment stores data in a cache bank including a plurality of memories.
FIG. 4 is a diagram illustrating an example in which a device according to one embodiment processes data using a pointer.
FIG. 5 is a diagram illustrating an example in which a device according to one embodiment partitions data and stores it in a plurality of cache banks.
FIG. 6 is a flowchart illustrating an example in which a device according to one embodiment transmits and stores data in a computing environment in which a plurality of cores operate.
FIG. 7 is a flowchart illustrating an example in which a device outputs data in response to a cache access request.
FIG. 8 is a flowchart illustrating an example in which a device according to one embodiment stores acquired data in a memory selected from among a plurality of memories according to the write intensity of the acquired data.
FIG. 9 is a block diagram illustrating an example in which a device according to one embodiment operates in a computing environment in which a plurality of tiles operate.
FIG. 10 is a block diagram illustrating an example in which a device according to one embodiment processes data by predicting or monitoring write intensity.

Hereinafter, embodiments given purely by way of example are described in detail with reference to the accompanying drawings. The following embodiments serve only to make the technical content concrete and do not restrict or limit the scope of rights. Anything that a person skilled in the relevant technical field could readily infer from the detailed description and the embodiments is to be interpreted as falling within the scope of rights.

Terms such as "consists of" or "includes" used in this specification should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be absent, and additional components or steps may be included.

In addition, terms including ordinal numbers, such as "first" or "second", may be used in this specification to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

The present embodiments relate to a data processing method and device; matters widely known to those of ordinary skill in the art to which the embodiments pertain are not described in detail.

FIG. 1 is a block diagram illustrating a device (100) according to one embodiment. Those of ordinary skill in the art will understand that general-purpose components other than those illustrated in FIG. 1 may also be included.

Referring to FIG. 1, the device (100) may include a receiving unit (110), a processor (120), and a first cache bank (131).

When data is processed using a plurality of cores, the device (100) may correspond to one of the cores. For example, one of the cores and the device (100) may be included in a single tile. A tile, in one example, is a data processing unit that includes a core and a cache bank. A cache bank, in one example, is each portion of a cache when the cache is distributed across multiple locations. A distributed cache, in one example, is a scheme in which the cache is distributed across multiple locations.

The receiving unit (110) according to one embodiment may receive data from outside the device (100). For example, the receiving unit (110) may receive data to be stored in a subset of cache banks among a plurality of cache banks corresponding to a plurality of cores.

The first cache bank (131) according to one embodiment may be one of a plurality of cache banks corresponding to a plurality of cores. The first cache bank (131) may obtain data from the receiving unit (110) and store it; that is, it may store the data transmitted to it from the receiving unit (110). The first cache bank (131) according to one embodiment includes a plurality of memories having different write characteristics, and may receive and store part of the data received by the receiving unit (110).

The processor (120) according to one embodiment may control the receiving unit (110) and the first cache bank (131). The processor (120) may partition the data received by the receiving unit (110) and transmit it to each of the subset of cache banks according to the write intensity of the received data. In addition, the processor (120) may control the first cache bank (131) so that the data transmitted to the first cache bank (131) is stored in one of the plurality of memories included in the first cache bank (131), according to the write intensity of that data.

FIG. 2 is a diagram illustrating an example in which the device (100) according to one embodiment operates in a computing environment in which a plurality of cores (141 to 145) operate.

The receiving unit (110) according to one embodiment may receive data to be stored in a subset of cache banks (131 to 134) among the plurality of cache banks (131 to 135) corresponding to the plurality of cores (141 to 145).

Under the distributed cache scheme, the cache (130) may include a plurality of cache banks (131 to 135). In addition, the device (100) may operate in a computing environment that includes a plurality of cores (141 to 145) and a plurality of cache banks (131 to 135).

The plurality of cache banks (131 to 135) may correspond to the plurality of cores (141 to 145), respectively. For example, the first cache bank (131) may correspond to the first core (141), the second cache bank (132) to the second core (142), and the Nth cache bank (135) to the Nth core (145).

The receiving unit (110) may receive, from outside the device (100), data to be stored in a subset of cache banks (131 to 134) among the plurality of cache banks (131 to 135). Which cache banks make up the subset (131 to 134) may be determined before or after the receiving unit (110) receives the data. For example, the device (100) may obtain information about the remaining memory capacity of the second cache bank before the receiving unit (110) receives the data.

In processing data, the device (100) may use some of the plurality of cache banks. For example, data processed in the first core (141) may be stored in the second cache bank (132).

The first cache bank (131) according to one embodiment may include a plurality of memories having different write characteristics. In addition, the first cache bank (131) may receive and store part of the data received by the receiving unit (110).

The first cache bank (131) may receive data from the receiving unit (110) or the processor (120) and store it. The first cache bank (131) may store the received data in one of the plurality of memories it includes, and those memories may have different write characteristics. For example, the first cache bank (131) may have the characteristics of a hybrid cache, that is, a cache that includes a plurality of memories with different properties.

When the write characteristics of the plurality of memories differ, the latency or energy required to write unit data to each memory may differ. The plurality of memories according to one embodiment may include volatile memory and nonvolatile memory. For example, the plurality of memories may include SRAM (static random access memory), STT-RAM (spin transfer torque random access memory), PCRAM (phase change random access memory), and the like.

The processor (120) according to one embodiment may partition the data received by the receiving unit (110) and transmit it to each of the subset of cache banks (131 to 134) according to the write intensity of the received data. The processor (120) may partition the data and transmit it to a subset of tiles (171 to 174) among a plurality of tiles (171 to 175). Each tile, which includes a cache bank and a core, may operate as a unit of data processing. According to one embodiment, the first cache bank (131) or the first core (141) included in the first tile (171) may be used to process data of an application running on the second tile (172).

Which of the plurality of cache banks (131 to 135) stores the data received by the receiving unit (110) may be determined according to the amount or attributes of that data. For example, the processor (120) may determine the number of cache banks required according to the amount or attributes of the received data, and may distribute the received data among the available cache banks of the plurality of cache banks (131 to 135). For example, the processor (120) may control the receiving unit (110) so that the received data is distributed among the subset of cache banks (131 to 134). The processor (120) may monitor the plurality of cache banks (131 to 135), determine the subset of cache banks (131 to 134) suitable for receiving data, and distribute the data received by the receiving unit (110) among the determined subset (131 to 134).
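The bank-selection step described above can be sketched roughly as follows. This is a minimal illustration and not the patented procedure: the function name `choose_banks`, the `per_bank_quota` parameter, and the policy of preferring banks with the most free capacity are all assumptions introduced here.

```python
from math import ceil


def choose_banks(data_size, free_capacity, per_bank_quota):
    """Pick the cache banks that will receive a share of the incoming data.

    data_size      -- amount of incoming data (arbitrary units)
    free_capacity  -- dict: bank id -> free capacity observed by monitoring
    per_bank_quota -- hypothetical cap on how much one bank should take
    """
    # Number of banks needed for this amount of data (assumed policy).
    needed = ceil(data_size / per_bank_quota)
    # Prefer the banks with the most free capacity (assumed policy).
    ranked = sorted(free_capacity, key=lambda b: free_capacity[b], reverse=True)
    chosen = ranked[:needed]
    if sum(free_capacity[b] for b in chosen) < data_size:
        raise ValueError("not enough free capacity in the selected banks")
    return sorted(chosen)


# Example with made-up capacities for banks 131 to 135.
free = {131: 40, 132: 10, 133: 30, 134: 35, 135: 5}
subset = choose_banks(100, free, per_bank_quota=25)
```

With these made-up numbers, four banks are needed and the nearly full bank 135 is left out of the subset.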

When the data received by the receiving unit (110) is distributed among the subset of cache banks (131 to 134), the attributes of that data may be used. For example, the processor (120) may distribute the data based on the write intensity of the data received by the receiving unit (110).

The processor (120) according to one embodiment may partition and transmit the received data such that writes are uniformly distributed among the subset of cache banks (131 to 134). For example, the processor (120) may partition and transmit the received data such that write-intensive partitions, which contain data requiring many writes, and non-write-intensive partitions, which contain data requiring few writes, are uniformly distributed among the subset of cache banks (131 to 134).

Here, "uniform" does not mean exactly equal in a physical sense; it may mean uniform within a preset error range. For example, when 100 units of data are uniformly distributed among three cache banks, the data may be distributed as 35, 32, and 33 units, depending on the preset error range.
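One way to obtain a distribution that is uniform within a tolerance is a greedy assignment, sketched below. The patent does not specify a balancing algorithm; the greedy "heaviest block to the least-loaded bank" rule, the function names, and the per-block write counts are assumptions made for illustration.

```python
import heapq


def partition_by_writes(blocks, bank_ids):
    """Assign blocks to banks so that expected writes are roughly balanced.

    blocks   -- dict: block id -> expected number of writes (hypothetical input)
    bank_ids -- list of bank ids in the chosen subset
    Returns (assignment, loads): bank -> list of blocks, bank -> total writes.
    """
    heap = [(0, bank) for bank in bank_ids]  # (current write load, bank)
    heapq.heapify(heap)
    assignment = {bank: [] for bank in bank_ids}
    # Place the most write-intensive blocks first, each on the least-loaded bank.
    for block, writes in sorted(blocks.items(), key=lambda kv: kv[1], reverse=True):
        load, bank = heapq.heappop(heap)
        assignment[bank].append(block)
        heapq.heappush(heap, (load + writes, bank))
    loads = {bank: load for load, bank in heap}
    return assignment, loads


def is_uniform(loads, tolerance):
    """True if the spread between the busiest and idlest bank is within tolerance."""
    return max(loads.values()) - min(loads.values()) <= tolerance


blocks = {"b0": 20, "b1": 15, "b2": 15, "b3": 17, "b4": 18, "b5": 15}
assignment, loads = partition_by_writes(blocks, [131, 132, 133])
```

For this made-up input of 100 writes in total, the three banks end up with 35, 33, and 32 writes, which counts as uniform for a tolerance of 3 but not for a tolerance of 2.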

In addition, the processor (120) according to one embodiment may control the first cache bank (131) so that the data transmitted to the first cache bank (131) is stored in one of the plurality of memories included in the first cache bank (131), according to the write intensity of that data.

The plurality of memories included in the first cache bank (131) may have different write characteristics. The following describes, as an example, a case in which the first cache bank (131) includes a first memory and a second memory with different write characteristics, and the write characteristics of the first memory are better than those of the second memory.

The first memory and the second memory included in the first cache bank (131) may have different write characteristics. For example, the latency or energy required to write unit data may differ between the first memory and the second memory. In one example, the latency or energy required to write unit data to the first memory is less than a preset value, and the latency or energy required to write unit data to the second memory is greater than or equal to the preset value. The processor (120) according to one embodiment may store data with high write intensity in the first memory and data with low write intensity in the second memory. By sorting data into the first memory or the second memory according to write intensity, the first cache bank (131) according to one embodiment may reduce the resources consumed. For example, if writing unit data to the first memory takes less time than writing unit data to the second memory, storing data with high write intensity in the first memory may reduce the time spent on writes. As another example, if writing unit data to the first memory takes less energy than writing unit data to the second memory, storing data with high write intensity in the first memory may reduce the energy spent on writes.
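The saving described above can be checked with a small cost calculation. The per-write costs and write counts below are invented purely for illustration (the source gives no figures), and `total_write_cost` is a hypothetical helper; the same arithmetic applies whether the cost is latency or energy.

```python
def total_write_cost(write_counts, placement, cost_per_write):
    """Sum, over all data items, (number of writes) x (cost of one write
    in the memory where the item is placed)."""
    return sum(
        write_counts[item] * cost_per_write[placement[item]]
        for item in write_counts
    )


# Hypothetical unit costs: the first memory is cheaper to write than the second.
cost_per_write = {"first": 1, "second": 5}
# Hypothetical workload: one write-intensive item and one rarely written item.
write_counts = {"hot": 90, "cold": 10}

by_intensity = {"hot": "first", "cold": "second"}  # placement described in the text
reversed_placement = {"hot": "second", "cold": "first"}

cost_good = total_write_cost(write_counts, by_intensity, cost_per_write)
cost_bad = total_write_cost(write_counts, reversed_placement, cost_per_write)
```

Under these assumed numbers, placing the write-intensive item in the first memory costs 90 x 1 + 10 x 5 = 140 units, against 90 x 5 + 10 x 1 = 460 units for the reverse placement.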

A more specific embodiment in which data is stored, according to its write intensity, in one of a plurality of memories having different write characteristics is described later with reference to FIG. 3.

The processor (120) according to one embodiment may output data processing results to an external device (150). Alternatively, the processor (120) may receive data needed during processing through the external device (150) as well as through the receiving unit (110).

According to one embodiment, when the data output from the processor (120) is image data, the processor (120) may output the image data to the external device (150). The external device (150) may include a display. When the external device (150) is a display and the data output from the processor (120) is image data, the external device (150) may receive the image data from the processor (120) and display the image.

In FIG. 2, the device (100) is shown as not being included in the first tile (171) according to one embodiment, but in another embodiment the device (100) may be included in the first tile (171). In yet another embodiment, the receiving unit (110) and/or the processor (120) may be included in the first tile (171).

In addition, in FIG. 2 the processor (120) is shown as a component separate from the first core (141) according to one embodiment, but in another embodiment the first core (141) may perform some or all of the functions of the processor (120).

The device (100) according to one embodiment may operate scalably under the distributed cache scheme. For example, performance may increase in proportion to the number of cores used.

FIG. 3 is a diagram illustrating an example in which the device (100) according to one embodiment stores data in a cache bank including a plurality of memories.

The device (100) according to one embodiment may perform intra-bank optimization by sorting the data to be stored in a cache bank into the plurality of memories included in that cache bank, according to the attributes of the data.

Referring to FIG. 3, a case is described, as an example, in which the first cache bank (131) includes a first memory (310) with good write characteristics and a second memory (320) with poor write characteristics.

The latency or energy required to write unit data to the first memory (310), which has good write characteristics, may be at or below a preset value. The latency or energy required to write unit data to the second memory (320), which has poor write characteristics, may exceed the preset value. For example, the first memory (310) may be SRAM, and the second memory (320) may be STT-RAM or PCRAM. As another example, the first memory (310) may be volatile memory and the second memory (320) may be nonvolatile memory. The nonvolatile memory may be STT-RAM, FeRAM (ferroelectric random access memory), MRAM (magnetic random access memory), PCM (phase change memory), or the like. STT-RAM, which can operate at very high speed and low power, dissipates less energy than other nonvolatile memories (for example, FeRAM, MRAM, and PCM), and in particular may have lower write energy. STT-RAM may also offer higher endurance, density, and scalability than other nonvolatile memories. For example, the device (100) may store write-intensive data in SRAM and non-write-intensive data in STT-RAM so that most write operations occur in SRAM.

The device (100) may predict or determine the write intensity of data to be stored in the first cache bank (131) and, according to the result, store the data in the first memory (310) or the second memory (320). For example, the device (100) may store data whose predicted write intensity is equal to or greater than a preset value in the first memory (310), and store data whose predicted write intensity is less than the preset value in the second memory (320). By storing data with high write intensity in the memory with good write characteristics, the device (100) can reduce the latency or energy required for writing.
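The threshold rule in the preceding paragraph can be sketched as follows. This is a minimal illustration, not the claimed implementation: the class name `HybridBank`, the dictionaries standing in for the two memories, and the threshold value of 4 are all assumptions introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical preset value; the description does not give a number.
WRITE_INTENSITY_THRESHOLD = 4


@dataclass
class HybridBank:
    """Toy model of a cache bank holding two memories with different write characteristics."""
    first_memory: dict = field(default_factory=dict)   # good write characteristics, e.g. SRAM (310)
    second_memory: dict = field(default_factory=dict)  # poor write characteristics, e.g. STT-RAM (320)

    def store(self, address, data, predicted_write_intensity):
        # Intensity at or above the preset value goes to the first memory,
        # anything below it goes to the second memory.
        if predicted_write_intensity >= WRITE_INTENSITY_THRESHOLD:
            self.first_memory[address] = data
            return "first"
        self.second_memory[address] = data
        return "second"
```

A call such as `HybridBank().store(0x10, b"x", 7)` places the block in the first memory, while an intensity of 1 would place it in the second.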

The first cache bank (131) according to an embodiment may be a cache bank of a hybrid cache organized as a distributed cache. Since a hybrid cache includes a plurality of memories with different properties, a cache bank of the hybrid cache may also include a plurality of memories with different properties.

When the device (100) according to an embodiment predicts the write intensity of data to be stored and stores the data in the first memory (310) or the second memory (320), it may use a method used in a prediction hybrid cache (PHC). For example, the device (100) may predict the write intensity of data to be stored based on a program counter. The device (100) according to an embodiment may use the write intensity history of data accessed by a first program counter to predict the write intensity of the data that the first program counter accesses next. When data accessed by the same instruction has similar characteristics, the accuracy of the write intensity prediction may increase. As another example, the device (100) may predict the write intensity of data to be stored based on the application that is the target of the prediction. The device (100) according to an embodiment may determine the write intensity of data transferred to the first cache bank by using the write intensity history of the application corresponding to that data.
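A program-counter-indexed history of this kind can be sketched as follows. The description does not say how the history is encoded; the saturating counter per program counter, the class name `PcWritePredictor`, and the default parameter values are assumptions made for illustration only.

```python
class PcWritePredictor:
    """Predicts write intensity from the history of data accessed by the same program counter."""

    def __init__(self, threshold=2, max_count=3):
        self.counters = {}          # program counter -> saturating history counter
        self.threshold = threshold  # hypothetical preset value
        self.max_count = max_count

    def predict(self, pc):
        # Data is predicted write-intensive once the history for this PC is strong enough.
        return self.counters.get(pc, 0) >= self.threshold

    def update(self, pc, was_write_intensive):
        # Called when the actual behaviour of a block brought in by this PC is known.
        count = self.counters.get(pc, 0)
        if was_write_intensive:
            count = min(count + 1, self.max_count)
        else:
            count = max(count - 1, 0)
        self.counters[pc] = count
```

The same structure could be keyed by an application identifier instead of a program counter to model the application-based variant.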

FIG. 4 is a diagram showing an example in which the device (100) according to an embodiment processes data using pointers.

A pointer may refer to a tag-to-data pointer. A cache or cache bank may include a tag area (410) and a data area (420). The tag area (410) may include information about data stored in the data area (420). For example, the tag area (410) may store information such as whether data stored in the data area (420) is valid, its access time, and its address. As an example, a first tag (411) may include information related to first data (421), a second tag (412) may include information related to second data (422), and a third tag (413) may include information related to third data (423). In addition, a pointer may include information for associating a tag with data. For example, a first pointer (431) may associate the first tag (411) with the first data (421), a second pointer (432) may associate the second tag (412) with the second data (422), and a third pointer (433) may associate the third tag (413) with the third data (423).

The device (100) according to an embodiment may reduce the number of data writes by updating one or more tags stored in the tag area (410) or by updating pointers. For example, when the positions of two pieces of data need to be exchanged, the device (100) may obtain the same result as exchanging them by updating only the pointers, without moving the data stored in the data area (420). As another example, when data needs to be written to a specific position, the device (100) may write the new data over the data to be deleted and update the tag or pointer value, without sequentially shifting the other data. In this case, the positions of the data are updated through the tag or pointer update.
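The pointer-only exchange described above can be sketched as follows. The class `PointerCache`, its list-based tag and data areas, and the write counter are illustrative assumptions; the point is only that a swap touches the tag area and leaves the data area unwritten.

```python
class PointerCache:
    """Tag area entries hold (tag, pointer); the pointer indexes into the data area."""

    def __init__(self, num_slots):
        self.tag_area = [None] * num_slots   # logical position -> (tag, data slot)
        self.data_area = [None] * num_slots
        self.data_writes = 0                 # counts writes to the data area only

    def _free_slot(self):
        used = {entry[1] for entry in self.tag_area if entry is not None}
        return next(i for i in range(len(self.data_area)) if i not in used)

    def write(self, position, tag, value):
        # Reuse the slot of the entry being replaced; no other data is shifted.
        entry = self.tag_area[position]
        slot = entry[1] if entry is not None else self._free_slot()
        self.data_area[slot] = value
        self.data_writes += 1
        self.tag_area[position] = (tag, slot)

    def swap(self, pos_a, pos_b):
        # Exchange logical positions by swapping tag entries; the data area is untouched.
        self.tag_area[pos_a], self.tag_area[pos_b] = self.tag_area[pos_b], self.tag_area[pos_a]

    def read(self, position):
        tag, slot = self.tag_area[position]
        return tag, self.data_area[slot]
```

After two writes and one swap, `data_writes` is still 2, which is the saving the paragraph refers to.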

FIG. 5 is a diagram showing an example in which the device (100) according to an embodiment partitions data and stores it in a plurality of cache banks.

The device (100) according to an embodiment may partition data and transmit it to some cache banks (131 to 134) among a plurality of cache banks. When transmitting data, the device (100) may perform inter-bank optimization by partitioning the data according to its write intensity and transmitting it to those cache banks (131 to 134). For example, the device (100) may increase the SRAM utilization of the entire cache by swapping data of a bank with low SRAM utilization with data of a bank with high SRAM utilization. Accordingly, the device (100) may move data between cache banks. Alternatively, the device (100) may partition the data and transmit it to the cache banks in a preset manner.

Referring to FIG. 5, a first case (510), in which data is partitioned and transmitted regardless of write intensity, and a second case (520), in which data is partitioned and transmitted in consideration of write intensity, are described.

When data is partitioned and transmitted regardless of write intensity, as in the first case, the write intensity of the data stored in each of the cache banks (131 to 134) may differ. For example, the first cache bank (131) and the third cache bank (133) may each include two partitions with high write intensity, while the second cache bank (132) and the fourth cache bank (134) may include no partition with high write intensity. A partition according to an embodiment may refer to a logical group of cache data. As an example, a partition may include the data accessed by one application.

However, when data is partitioned and transmitted according to write intensity, as in the second case, the write intensity of the data stored in each of the cache banks (131 to 134) may be even. For example, each of the first cache bank (131) to the fourth cache bank (134) may include one partition with high write intensity. As described above, "even" may mean even within a preset error range rather than exactly (physically) equal.
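The second case can be illustrated with a small balancing sketch. The description does not specify a balancing algorithm; the greedy "heaviest partition to the least-loaded bank" rule, the function name `distribute_partitions`, and the numeric intensities are assumptions, and bank capacity is ignored for brevity.

```python
def distribute_partitions(partitions, num_banks):
    """Assign (name, write_intensity) partitions to banks so that bank totals stay close.

    Greedy rule (an assumption, not from the source): take partitions from most to
    least write-intensive and give each one to the bank with the lowest total so far.
    """
    banks = [[] for _ in range(num_banks)]
    loads = [0] * num_banks
    for name, intensity in sorted(partitions, key=lambda p: p[1], reverse=True):
        target = min(range(num_banks), key=lambda b: loads[b])
        banks[target].append(name)
        loads[target] += intensity
    return banks, loads


# Four write-intensive partitions and twelve others, spread over four banks.
example = [(f"hot{i}", 10) for i in range(4)] + [(f"cold{i}", 1) for i in range(12)]
example_banks, example_loads = distribute_partitions(example, 4)
```

With this input each bank ends up with exactly one write-intensive partition, matching the situation shown for the second case (520).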

FIG. 6 is a flowchart showing an example in which the device (100) according to an embodiment transmits and stores data in a computing environment in which a plurality of cores operate.

In step S610, the device (100) according to an embodiment receives data to be stored in some cache banks among a plurality of cache banks corresponding to the plurality of cores.

The device (100) may operate in a computing environment including a plurality of cores and a plurality of cache banks. The plurality of cache banks may correspond to the plurality of cores.

The device (100) according to an embodiment may receive data to be processed from outside the device (100), analyze the received data to determine which of the plurality of cache banks to use, and transmit the received data to some of the plurality of cache banks according to that determination.

In step S620, the device (100) according to an embodiment partitions the data received in step S610 according to its write intensity and transmits it to the some cache banks.

When transmitting data, the device (100) according to an embodiment may perform inter-bank optimization by partitioning the data according to its write intensity and transmitting it to the some cache banks.

Because the device (100) partitions and transmits data according to write intensity, the write intensity of the data stored in those cache banks may be even. As described above, "even" may mean even within a preset error range rather than exactly (physically) equal.

Alternatively, the device (100) according to an embodiment may partition and transmit data based on both the write intensity and the state of the cache banks that will receive the data. For example, for a cache bank built from memory with poor write characteristics, the device (100) may set the amount of write-intensive data transmitted to that bank below the average. As another example, for a cache bank built from memory with good write characteristics, the device (100) may set the amount of write-intensive data transmitted to that bank above the average.
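One way to picture the bank-aware alternative is a weighted split. The proportional rule, the function name `share_write_intensive_data`, and the numeric "write quality" scores below are assumptions for illustration; the description only states that banks with poor write characteristics receive less than the average and banks with good write characteristics receive more.

```python
def share_write_intensive_data(total_amount, write_quality):
    """Split write-intensive data across banks in proportion to a write-quality score.

    write_quality maps a bank name to a hypothetical score; a higher score means
    the bank's memory has better write characteristics.
    """
    total_quality = sum(write_quality.values())
    return {bank: total_amount * q / total_quality for bank, q in write_quality.items()}


example_shares = share_write_intensive_data(8, {"bank_poor": 1, "bank_good": 3})
example_average = 8 / 2
```

Here the bank with poor write characteristics receives less than the per-bank average and the bank with good write characteristics receives more.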

In step S630, the device (100) according to an embodiment stores, in a first cache bank that is one of the some cache banks, the data transmitted to the first cache bank. The first cache bank may be one of the plurality of cache banks constituting the cache.

In addition, the processor (120) according to an embodiment may control the first cache bank so that the data transmitted to the first cache bank is stored in one of the plurality of memories included in the first cache bank, according to the write intensity of that data.

The device (100) according to an embodiment may perform intra-bank optimization by classifying the data to be stored in the first cache bank according to its properties and storing it in the plurality of memories included in the first cache bank.

For example, the device (100) may predict the write intensity of data received by the first cache bank and, according to the prediction, store the data in one of the plurality of memories included in the first cache bank. If the predicted write intensity of the data is equal to or greater than a preset value, the device (100) may store the data in a memory with good write characteristics among the plurality of memories included in the first cache bank; if the predicted write intensity is less than the preset value, the device (100) may store the data in a memory with poor write characteristics among those memories. The latency or energy required to write a unit of data to the memory with good write characteristics may be lower than that required to write a unit of data to the memory with poor write characteristics.

As another example, the device (100) may use a method used in a prediction hybrid cache (PHC) to store the data received by the first cache bank in one of the plurality of memories included in the first cache bank.

FIG. 7 is a flowchart showing an example in which the device (100) outputs data in response to a cache access request.

In step S710, the device (100) according to an embodiment receives a cache access request. For example, the device (100) may receive a request to obtain data stored in the cache.

In step S720, the device (100) according to an embodiment determines whether the data stored in the first cache bank includes data corresponding to the cache access request received in step S710. For example, the device (100) may determine whether the first cache bank needs to be accessed for the cache access request received in step S710. In FIG. 7, the first cache bank may refer to one of the plurality of cache banks of a distributed cache.

In step S730, if the data corresponding to the cache access request received in step S710 is in the first cache bank, the device (100) according to an embodiment outputs the data stored in the first cache bank in response to that request. For example, the device (100) may output the data stored in the first cache bank to an external device in response to the cache access request received in step S710.
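Steps S710 to S730 can be sketched as a simple lookup. The modulo address-to-bank mapping, the dictionaries standing in for cache banks, and the function name `handle_cache_access` are assumptions; the description does not state how a request is mapped to a bank or what happens on a miss.

```python
def handle_cache_access(banks, address):
    """Return the cached data for an access request, or None if the bank does not hold it."""
    # S710: the request (here, just an address) has been received.
    bank = banks[address % len(banks)]   # hypothetical address-to-bank mapping
    # S720: check whether the selected bank holds data for this request.
    if address in bank:
        # S730: output the stored data.
        return bank[address]
    return None                          # miss handling is outside this sketch


example_banks = [{}, {}, {}, {}]
example_banks[5 % 4][5] = "payload"
```

Calling `handle_cache_access(example_banks, 5)` returns the stored payload, while an address that was never stored yields `None`.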

According to an embodiment, when the output data is image data, the device (100) may output the data to a display (not shown). The display (not shown) may receive the image data from the device (100) and display an image.

FIG. 8 is a flowchart showing an example in which the device (100) according to an embodiment stores acquired data in a memory selected from a plurality of memories according to the write intensity of the acquired data.

In step S810, the device (100) according to an embodiment acquires data to be stored in the first cache bank. For example, the device (100) may determine, among data received from outside the device (100), the data to be stored in the first cache bank.

In step S820, the device (100) according to an embodiment determines whether the write intensity of the data acquired in step S810 is equal to or greater than a preset value.

The device (100) according to an embodiment may predict the write intensity of the data acquired in step S810 through a prediction algorithm or the like, and determine whether the predicted write intensity is equal to or greater than the preset value.

For example, the device (100) may determine the write intensity of data transferred to the first cache bank by using the write intensity history of data accessed by the program counter corresponding to that data. The device (100) may then determine whether the write intensity determined from this program counter history is equal to or greater than the preset value.

As another example, the device (100) may determine the write intensity of data transferred to the first cache bank by using the write intensity history of the application corresponding to that data. The device (100) may then determine whether the write intensity determined from this application history is equal to or greater than the preset value.

The following steps are described for the case where the write characteristics of the first memory are better than those of the second memory.

In step S830, when the write intensity is equal to or greater than the preset value, the device (100) according to an embodiment stores the data acquired in step S810 in the first memory.

Because the write characteristics of the first memory are better than those of the second memory, the latency or energy required to write a unit of data to the first memory may be lower than that required to write a unit of data to the second memory. For example, the first memory (310) may be SRAM, and the second memory (320) may be STT-RAM or PCRAM. As another example, the first memory (310) may be a volatile memory, and the second memory (320) may be a nonvolatile memory.

By storing data whose write intensity is equal to or greater than the preset value in the first memory, the device (100) according to an embodiment may reduce the overall latency or energy required to store data.

In step S840, when the write intensity is less than the preset value, the device (100) according to an embodiment stores the data acquired in step S810 in the second memory.

By storing data whose write intensity is less than the preset value in the second memory, the device (100) according to an embodiment may reduce the overall latency or energy required to store data.
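A short worked example shows why the placement matters when the first memory cannot hold everything. The per-write costs below are made-up arbitrary units (only the ordering, first memory cheaper than second memory, reflects the description), and the function name `placement_cost` is hypothetical.

```python
FIRST_MEMORY_WRITE_COST = 1    # hypothetical cost per write, arbitrary units
SECOND_MEMORY_WRITE_COST = 5   # hypothetical; only "first < second" comes from the text


def placement_cost(write_counts, goes_to_first_memory):
    """Total write cost when each block is placed by the given predicate."""
    cost = 0
    for writes in write_counts:
        if goes_to_first_memory(writes):
            cost += writes * FIRST_MEMORY_WRITE_COST
        else:
            cost += writes * SECOND_MEMORY_WRITE_COST
    return cost


example_writes = [10, 1]       # one write-intensive block, one rarely written block
threshold = 4                  # hypothetical preset value

# Placement of steps S830/S840: intensity >= threshold goes to the first memory.
intensity_aware = placement_cost(example_writes, lambda w: w >= threshold)
# The opposite placement, for comparison.
inverted = placement_cost(example_writes, lambda w: w < threshold)
```

Under these assumed costs the intensity-aware placement costs 10·1 + 1·5 = 15 units, against 10·5 + 1·1 = 51 units for the inverted placement.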

FIG. 9 is a block diagram showing an example in which the device (100) according to an embodiment operates in a computing environment in which a plurality of tiles operate.

The device (100) according to an embodiment may operate in a computing environment in which a plurality of tiles operate. For example, the device (100) may operate in a computing environment in which a plurality of tiles, including a first tile (171) and a second tile (180), operate.

The device (100) according to an embodiment may be included in the first tile (171). Referring to FIG. 9, the device (100) may include bank partitioning hardware (910), monitoring hardware (920), bulk invalidation hardware (930), a write intensity prediction unit (940), a write intensity monitor (950), and the first cache bank (131).

The device (100) shown in FIG. 9 is illustrated with only the components related to the present embodiment. Those of ordinary skill in the art will therefore understand that the device (100) may further include other general-purpose components in addition to those shown in FIG. 9, and that, in other embodiments, some of the components shown in FIG. 9 may be omitted.

The device (100) according to an embodiment may use a hybrid cache efficiently within a distributed cache. For example, the device (100) may use the hybrid cache in a distributed cache scheme.

The bank partitioning hardware (910) according to an embodiment may partition data. As an example, the bank partitioning hardware (910) may allocate a cache size to each application to be executed. Through partitioning, the bank partitioning hardware (910) may maintain the cache size allocated to each application and may prevent interference in a cache structure used by a plurality of cores.

The monitoring hardware (920) according to an embodiment may monitor the characteristics of data or of the cache, including the operating characteristics of the cache. Using the monitoring result, the device (100) according to an embodiment may improve the performance of running applications through partitioning decisions based on each application's operating characteristics, namely how its miss count changes with cache size. The monitoring hardware (920) according to an embodiment may also monitor the cache size required by each application.
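A partitioning decision driven by miss-versus-size behaviour can be sketched as follows. The description does not name an allocation policy; the greedy marginal-gain rule, the function name `allocate_ways`, the unit of allocation (cache ways), and the example miss curves are all assumptions.

```python
def allocate_ways(miss_curves, total_ways):
    """Greedily hand out cache ways to the application whose misses drop the most.

    miss_curves maps an application name to a list where index i is the number of
    misses observed (or estimated) when the application is given i ways.
    """
    allocation = {app: 0 for app in miss_curves}

    def gain(app):
        curve, ways = miss_curves[app], allocation[app]
        if ways + 1 >= len(curve):
            return float("-inf")           # no data for a larger allocation
        return curve[ways] - curve[ways + 1]

    for _ in range(total_ways):
        best = max(allocation, key=gain)
        if gain(best) == float("-inf"):
            break                          # every curve is exhausted
        allocation[best] += 1
    return allocation


example_curves = {
    "A": [100, 60, 40, 28, 25],   # benefits strongly from more cache
    "B": [80, 70, 65, 62, 60],    # benefits only slightly
}
example_allocation = allocate_ways(example_curves, 4)
```

With these curves, application A receives three ways and application B receives one, since A's misses fall faster until its curve flattens.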

The bulk invalidation hardware (930) according to an embodiment may invalidate data determined to be unnecessary. For example, the bulk invalidation hardware (930) may invalidate data determined to be unnecessary while the cache size is periodically updated. In some cases, the bulk invalidation hardware (930) may be omitted from the device (100).

The write intensity prediction unit (940) according to an embodiment may be used in intra-bank optimization. For example, the write intensity prediction unit (940) may predict or determine the write intensity of data to be stored in the first cache bank (131).

When predicting the write intensity of data to be stored, the write intensity prediction unit (940) according to an embodiment may use a method used in a prediction hybrid cache (PHC). For example, the write intensity prediction unit (940) may base the prediction on a program counter. The write intensity prediction unit (940) according to an embodiment may use the write intensity history of data accessed by a second program counter to predict the write intensity of the data that the second program counter accesses next. When data accessed by the same instruction has similar characteristics, the accuracy of the write intensity prediction may increase. As another example, the write intensity prediction unit (940) may base the prediction on the application that is the target of the prediction. The write intensity prediction unit (940) according to an embodiment may determine the write intensity of data transferred to the first cache bank by using the write intensity history of the application corresponding to that data.

The write intensity monitor (950) according to an embodiment monitors the write intensity of data and may thereby be used in inter-bank optimization. For example, data may be distributed and transmitted according to the monitoring result of the write intensity monitor (950). The write intensity monitor (950) according to an embodiment may be used to identify the write characteristics of the different applications running concurrently and to distribute data so that the write characteristics of the data stored in the cache banks become similar.

The write intensity monitor (950) according to an embodiment may monitor which applications make heavy use of frequently written data. The write intensity monitor (950) may track and store the number of writes made to a piece of data while that data resides in the cache. The information stored by the write intensity monitor (950) may be used when data locations are periodically updated. As an example, the write intensity monitor (950) may monitor the write intensity of each application in each cache bank. The monitoring result may also be used to allocate data to the cache banks so that writes occur evenly across the cache banks.
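The bookkeeping described for the write intensity monitor can be sketched as a pair of counters. The class name `WriteIntensityMonitor`, its method names, and the use of software dictionaries in place of hardware counters are assumptions for illustration.

```python
from collections import defaultdict


class WriteIntensityMonitor:
    """Counts writes per (cache bank, application) while data resides in the cache."""

    def __init__(self):
        self.counts = defaultdict(int)   # (bank, application) -> writes observed

    def record_write(self, bank, application):
        self.counts[(bank, application)] += 1

    def writes_per_application(self):
        # Shows which applications use write-heavy data the most.
        totals = defaultdict(int)
        for (_, application), n in self.counts.items():
            totals[application] += n
        return dict(totals)

    def writes_per_bank(self):
        # Shows how evenly writes are spread across banks.
        totals = defaultdict(int)
        for (bank, _), n in self.counts.items():
            totals[bank] += n
        return dict(totals)
```

The per-bank totals could feed a balancing step at each periodic update, while the per-application totals identify which partitions are write-intensive.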

일 실시 예에 따른 제1 캐시 뱅크(131)는 복수개의 메모리를 포함할 수 있다. 예를 들면, 제1 캐시 뱅크(131)는 쓰기 특성이 좋은 제1 메모리(310)와 쓰기 특성이 나쁜 제2 메모리(320)를 포함할 수 있다.A first cache bank (131) according to one embodiment may include a plurality of memories. For example, the first cache bank (131) may include a first memory (310) having good write characteristics and a second memory (320) having bad write characteristics.

제1 메모리(960) 및 제2 메모리(970)는 디바이스(100)와 연결되어 동작할 수 있다. 제1 메모리(960) 및 제2 메모리(970)는 레벨1 메모리일 수 있다. 또한, 제1 메모리(960) 및 제2 메모리(970)는 제1 코어(141)와 연결되어 데이터 처리에 이용될 수 있다. 제1 메모리(960) 및 제2 메모리(970)는 디바이스(100)의 외부에 위치할 수 있다.The first memory (960) and the second memory (970) can be connected to the device (100) and operate. The first memory (960) and the second memory (970) can be level 1 memories. In addition, the first memory (960) and the second memory (970) can be connected to the first core (141) and used for data processing. The first memory (960) and the second memory (970) can be located outside the device (100).

도 9에서는 일 실시 예에 따라 디바이스(100)가 제1 타일(171)에 포함되도록 도시되었지만, 다른 실시 예에 따를 때, 디바이스(100)가 제1 타일(171)에 포함되지 않을 수 있다. In FIG. 9, the device (100) is illustrated as being included in the first tile (171) according to one embodiment, but in another embodiment, the device (100) may not be included in the first tile (171).

In addition, although FIG. 9 illustrates the first core (141) as being located outside the device (100) according to an embodiment, the first core (141) may be located inside the device (100) according to another embodiment.

The device (100) according to an embodiment may operate scalably by using a distributed cache scheme. For example, unlike what is illustrated in FIG. 9, the device (100) may operate in a computing environment in which 128, 256, 512, 1024, or more tiles are used.

FIG. 10 is a block diagram illustrating an example in which the device (100) according to an embodiment processes data by predicting or monitoring write concentration.

Referring to FIG. 10, the device (100) may include bank partitioning hardware (910), monitoring hardware (920), a write concentration prediction unit (940), a write concentration monitor (950), a first cache bank (131), and a partitioning placement algorithm (1010).

The bank partitioning hardware (910), the monitoring hardware (920), the write concentration prediction unit (940), the write concentration monitor (950), and the first cache bank (131) have been described above with reference to FIG. 9, so their detailed description is omitted here for brevity.

The partitioning placement algorithm (1010) may operate as a separate component, or may be implemented as an algorithm within the processor (120).

The partitioning placement algorithm (1010) according to an embodiment may receive the monitoring result from the write concentration monitor (950) and transmit, to the bank partitioning hardware (910), data representing a partitioning result that indicates how partitioning is to be performed.
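
The flow from monitoring result to partitioning result can be illustrated with a small sketch. The disclosure does not specify the placement policy, so the code below substitutes a common greedy balancing heuristic (largest write count first, always into the least-loaded bank) purely as an assumption. The function name place_partitions, the dictionary input format, and the simplification that each application maps to a single bank are likewise hypothetical.

```python
import heapq


def place_partitions(app_writes, num_banks):
    """Assign each application to a bank so per-bank write totals stay close.

    app_writes: dict mapping an application id to its monitored write count.
    Returns (placement, bank_load), where placement maps application id to
    bank index and bank_load lists the total writes assigned to each bank.
    """
    if num_banks <= 0:
        raise ValueError("num_banks must be positive")
    # Min-heap of (accumulated writes, bank index).
    heap = [(0, bank) for bank in range(num_banks)]
    heapq.heapify(heap)
    placement = {}
    # Place the most write-heavy applications first.
    ordered = sorted(app_writes.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for app, writes in ordered:
        load, bank = heapq.heappop(heap)  # currently least-loaded bank
        placement[app] = bank
        heapq.heappush(heap, (load + writes, bank))
    bank_load = [0] * num_banks
    for app, bank in placement.items():
        bank_load[bank] += app_writes[app]
    return placement, bank_load
```

For example, with monitored counts {"A": 900, "B": 500, "C": 400, "D": 100} and two banks, this sketch places A and D in bank 0 and B and C in bank 1, giving per-bank write totals of 1000 and 900.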

According to an embodiment, the write concentration prediction unit (940) and the first cache bank (131) may operate as a prediction hybrid cache (PHC). For example, data may be stored in one of the plurality of memories included in the first cache bank (131) according to the prediction result of the write concentration prediction unit (940).
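
A minimal sketch of this prediction-driven choice of memory is given below, assuming a single preset threshold and a simple history table. The class name HybridCacheBank, its methods, and the history key (which could be, for instance, a program counter or an application identifier) are hypothetical. Sending data whose predicted write concentration equals the threshold to the second memory is also an assumption of the example.

```python
class HybridCacheBank:
    """Hypothetical model of a cache bank holding two memories with different write cost."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed preset value for write concentration
        self.first_memory = {}       # good write characteristics
        self.second_memory = {}      # poor write characteristics
        self.history = {}            # assumed predictor table: key -> past write count

    def record_history(self, key, write_count):
        """Remember the write count observed for data tagged with this key."""
        self.history[key] = write_count

    def predict(self, key):
        """Predict write concentration from history; unseen keys default to 0."""
        return self.history.get(key, 0)

    def store(self, addr, data, key):
        """Store data in the memory chosen by the prediction and report which one."""
        if self.predict(key) > self.threshold:
            target, other, name = self.first_memory, self.second_memory, "first"
        else:
            target, other, name = self.second_memory, self.first_memory, "second"
        other.pop(addr, None)  # keep at most one copy of a line in the bank
        target[addr] = data
        return name
```

Data predicted to be written often lands in the memory with good write characteristics, while rarely written data goes to the other memory, so the costly writes are kept to a minimum.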

The device according to the embodiments described above may include a processor, a memory that stores and executes program data, permanent storage such as a disk drive, a communication port for communicating with an external device, and user interface devices such as a touch panel, keys, and buttons. Methods implemented as software modules or algorithms may be stored on a computer-readable recording medium as computer-readable code or program instructions executable on the processor. Here, the computer-readable recording medium includes magnetic storage media (e.g., read-only memory (ROM), random-access memory (RAM), floppy disks, hard disks, etc.) and optically readable media (e.g., CD-ROMs and Digital Versatile Discs (DVDs)). The computer-readable recording medium may be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner. The medium may be read by a computer, stored in a memory, and executed by a processor.

The present embodiments may be described in terms of functional block configurations and various processing steps. These functional blocks may be implemented by any number of hardware and/or software configurations that perform specific functions. For example, an embodiment may employ integrated circuit configurations, such as memory, processing, logic, and look-up tables, that can perform various functions under the control of one or more microprocessors or other control devices. Just as the components may be implemented by software programming or software elements, the present embodiments may be implemented in a programming or scripting language such as C, C++, Java, or assembler, with the various algorithms implemented as a combination of data structures, processes, routines, or other programming constructs. Functional aspects may be implemented as algorithms that execute on one or more processors. In addition, the present embodiments may employ conventional techniques for electronic configuration, signal processing, and/or data processing. Terms such as "mechanism", "element", "means", and "configuration" may be used broadly and are not limited to mechanical and physical configurations. These terms may include the meaning of a series of software routines in connection with a processor or the like.

The specific implementations described in the present embodiments are examples and do not limit the technical scope in any way. For brevity of the specification, descriptions of conventional electronic components, control systems, software, and other functional aspects of the systems may be omitted. In addition, the connecting lines or connecting members between components shown in the drawings illustrate functional connections and/or physical or circuit connections by way of example; in an actual device, they may be replaced by, or supplemented with, various other functional, physical, or circuit connections.

In this specification (especially in the claims), the use of the term "the" and similar referential terms may refer to both the singular and the plural. In addition, when a range is stated, it includes each individual value falling within the range (unless stated otherwise), as if each individual value constituting the range were stated in the detailed description. Finally, unless an order is explicitly stated for the steps constituting a method, or stated to the contrary, the steps may be performed in any suitable order. The steps are not necessarily limited to the order in which they are described. The use of any examples or exemplary terms (e.g., "etc.") is intended merely to describe the technical idea in detail, and the scope is not limited by those examples or exemplary terms unless limited by the claims. Furthermore, those skilled in the art will recognize that various modifications, combinations, and changes may be made according to design conditions and factors within the scope of the appended claims or their equivalents.

Claims

A method of processing data using a plurality of cores, the method comprising:
receiving data to be stored in some of the cache banks among a plurality of cache banks corresponding to the plurality of cores;
partitioning the received data according to a write concentration of the received data and transmitting the partitioned data to those cache banks; and
storing data transmitted to a first cache bank, which is one of those cache banks, in the first cache bank.

The method of claim 1,
wherein the storing of the data transmitted to the first cache bank in the first cache bank comprises
storing the data transmitted to the first cache bank in one of a plurality of memories that are included in the first cache bank and have different write characteristics, according to the write concentration of the data transmitted to the first cache bank.

The method of claim 1, further comprising:
receiving a cache access request; and
outputting the data stored in the first cache bank according to the received cache access request.

The method of claim 1,
wherein the partitioning and transmitting comprises
partitioning the received data and transmitting it to the cache banks that are to store it so that writes are evenly distributed across those cache banks.

The method of claim 2,
wherein the plurality of memories include a first memory for which the latency or energy required to write unit data is less than a preset value, and a second memory for which the latency or energy is greater than or equal to the preset value.

The method of claim 5,
wherein the storing of the data transmitted to the first cache bank in the first cache bank comprises
storing the data transmitted to the first cache bank in the second memory when the write concentration of the data transmitted to the first cache bank is lower than a preset value.

The method of claim 5,
wherein the storing of the data transmitted to the first cache bank in the first cache bank comprises
storing the data transmitted to the first cache bank in the first memory when the write concentration of the data transmitted to the first cache bank is higher than a preset value.

(Deleted)

The method of claim 2,
further comprising determining the write concentration of the data transmitted to the first cache bank by using a write concentration history of data accessed by a program counter corresponding to the data transmitted to the first cache bank.

The method of claim 2,
further comprising determining the write concentration of the data transmitted to the first cache bank by using a write concentration history of an application corresponding to the data transmitted to the first cache bank.

A device for processing data using a plurality of cores, the device comprising:
a receiving unit configured to receive data to be stored in some of the cache banks among a plurality of cache banks corresponding to the plurality of cores;
a processor configured to partition the received data according to a write concentration of the received data and transmit the partitioned data to those cache banks; and
a first cache bank, which is one of those cache banks, configured to store the data transmitted to the first cache bank.

(Deleted)