
CN115344826A - Computing device, method of operation, and machine-readable storage medium - Google Patents

Computing device, method of operation, and machine-readable storage medium

Info

Publication number
CN115344826A
Authority
CN
China
Prior art keywords
data type
operand
bit
metadata
target operand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210979891.4A
Other languages
Chinese (zh)
Inventor
Inventor not announced (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Biren Intelligent Technology Co Ltd
Original Assignee
Shanghai Biren Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biren Intelligent Technology Co Ltd
Priority to CN202210979891.4A
Publication of CN115344826A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F 17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/52 Multiplying; Dividing
    • G06F 7/523 Multiplying only

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

The invention provides a computing device, an operating method and a machine-readable storage medium. In an embodiment according to the invention, the operating method comprises: checking data type information carried by a current instruction, wherein the data type information represents the data type of a target operand corresponding to the current instruction; when the data type information indicates that the data type of the target operand is an adaptive data type, reading metadata corresponding to the target operand to acquire the actual data type of the target operand from the metadata; and executing the current instruction to process the target operand based on the actual data type of the target operand recorded by the metadata.

Description

Computing device, method of operation, and machine-readable storage medium

Technical Field

The present invention relates to an instruction set, and in particular to a computing device, an operating method and a machine-readable storage medium.

Background

Generally speaking, when writing a computation program, the programmer knows the data type of each operand, so the programmer can write instructions carrying "specified (fixed) data type" information into the program and then compile the program with a compiler. For example, the program may include a load instruction carrying the information "32-bit floating point" (a fixed data type), used to load an operand whose data type is "32-bit floating point" from memory into a computing core (for example, a tensor core). Alternatively, the program may include a matrix multiply and accumulation (MMA) instruction carrying the information "32-bit floating point" (a fixed data type), used to make the computing core perform a matrix multiplication on two already-loaded "32-bit floating point" operands. In some cases, however, the data type of an operand may be undetermined at compile time (the data type is unknown). For example, the data type of the computation result of a hidden layer of a Convolutional Neural Network (CNN) program may only be dynamically determined after the computation is actually performed. Current instruction sets do not support such an "undetermined data type".

Summary of the Invention

The present invention provides a computing device, an operating method thereof, and a machine-readable storage medium that support an adaptive data type. An adaptive data type means that the data type is unknown at compile time.

In an embodiment of the present invention, the operating method includes: checking data type information carried by a current instruction, where the data type information indicates the data type of a target operand corresponding to the current instruction; when the data type information indicates that the data type of the target operand is an adaptive data type, reading metadata corresponding to the target operand to obtain the actual data type of the target operand from the metadata, or directly obtaining the actual data type of the target operand; and executing the current instruction to process the target operand based on the actual data type of the target operand recorded in the metadata.

In an embodiment of the present invention, the machine-readable storage medium stores non-transitory machine-readable instructions. When the non-transitory machine-readable instructions are executed by a computer, the operating method of the computing device is implemented.

In an embodiment of the present invention, the computing device includes a memory and a computing core. The memory stores the target operand. The computing core is coupled to the memory. The computing core checks the data type information carried by the current instruction, where the data type information indicates the data type of the target operand corresponding to the current instruction. When the data type information indicates that the data type of the target operand is an adaptive data type, the computing core reads the metadata corresponding to the target operand to obtain the actual data type of the target operand from the metadata, or directly obtains the actual data type of the target operand. Based on the actual data type of the target operand recorded in the metadata, the computing core executes the current instruction to process the target operand.

Based on the above, the computing core can check the data type information carried by the current instruction to determine whether the data type of the target operand corresponding to the current instruction is a fixed (specified) data type or an adaptive data type. A fixed data type means that the data type of the target operand is known at compile time. An adaptive data type means that the data type of the target operand is unknown at compile time and is dynamically determined when the program is executed. When the program is executed, the actual data type of the target operand is recorded in the metadata corresponding to the target operand. Before the computing core executes the current instruction, it checks the data type of the target operand of the current instruction. When the data type of the target operand is an adaptive data type, the computing core can obtain the actual data type of the target operand from the metadata corresponding to the target operand, or directly obtain the actual data type of the target operand. Based on the actual data type of the target operand recorded in the metadata, the computing core can correctly execute the current instruction to process the target operand.

Brief Description of the Drawings

FIG. 1 is a schematic circuit block diagram of a computing device according to an embodiment of the present invention.

FIG. 2 is a schematic flowchart of an operating method of a computing device according to an embodiment of the present invention.

FIG. 3 is a schematic circuit block diagram of a computing core according to an embodiment of the present invention.

FIG. 4 is a schematic circuit block diagram of a computing core according to another embodiment of the present invention.

Description of Reference Numerals

100: computing device

110: memory

120: computing core

121: arithmetic circuit

122, 126: operand buffers

123: conversion unit

124: load unit

125: status register

Dconv: operand

Dm: metadata

Dorig: calculation result

S210~S250: steps

ST: statistical result

Detailed Description of Embodiments

Reference will now be made in detail to exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numerals are used in the drawings and the description to refer to the same or similar parts.

The term "coupled (or connected)" used throughout this specification (including the claims) may refer to any direct or indirect means of connection. For example, if a first device is described as being coupled (or connected) to a second device, it should be interpreted that the first device may be directly connected to the second device, or that the first device may be indirectly connected to the second device through another device or some connection means. Terms such as "first" and "second" mentioned throughout this specification (including the claims) are used to name elements and are not intended to limit the upper or lower bound of the number of elements, nor to limit the order of the elements. In addition, wherever possible, elements/components/steps with the same reference numerals in the drawings and the embodiments represent the same or similar parts. Elements/components/steps with the same reference numerals or the same terms in different embodiments may refer to each other's related descriptions.

FIG. 1 is a schematic circuit block diagram of a computing device 100 according to an embodiment of the present invention. The computing device 100 shown in FIG. 1 includes a memory 110 and a computing core 120. The memory 110 is used to store operands. This embodiment does not limit the specific data structure of an operand. For example, in neural network applications, an operand may be a vector, a tensor or other data. Depending on the actual design, an operand may be a matrix, or any one of a plurality of blocks into which a matrix is partitioned. The size of the matrix and the size of the blocks may be determined according to the actual design. For example, in some applications, the size of a block (operand) may be 32*32, 64*64 or another size.

The computing core 120 is coupled to the memory 110. In different application examples, the computing core 120 includes a tensor core, a general matrix multiply (GEMM) core, an arithmetic logic unit (ALU) and/or other computing units. Depending on design requirements, in some embodiments the computing core 120 may be implemented as a hardware circuit. In other embodiments, the computing core 120 may be implemented as firmware, software (that is, a program), or a combination of the two. In still other embodiments, the computing core 120 may be implemented as a combination of hardware, firmware and software.

In hardware form, the computing core 120 may be implemented as logic circuits on an integrated circuit. For example, the related functions of the computing core 120 may be implemented as various logic blocks, modules and circuits in one or more controllers, microcontrollers, microprocessors, application-specific integrated circuits (ASICs), digital signal processors (DSPs), field programmable gate arrays (FPGAs) and/or other processing units. The related functions of the computing core 120 may be implemented as hardware circuits, such as various logic blocks, modules and circuits in an integrated circuit, by using hardware description languages (for example, Verilog HDL or VHDL) or other suitable programming languages.

In software form and/or firmware form, the related functions of the computing core 120 may be implemented as programming codes. For example, the computing core 120 may be implemented by using a general programming language (for example, C, C++ or assembly language) or another suitable programming language. The programming codes may be recorded/stored in a non-transitory machine-readable storage medium. In some embodiments, the machine-readable storage medium includes, for example, a semiconductor memory and/or a storage device. The semiconductor memory includes a memory card, a read only memory (ROM), a flash memory, a programmable logic circuit or other semiconductor memory. The storage device includes a tape, a disk, a hard disk drive (HDD), a solid-state drive (SSD) or another storage device. An electronic device (for example, a computer, a central processing unit (CPU), a controller, a microcontroller or a microprocessor) may read and execute the programming codes from the machine-readable storage medium, thereby implementing the related functions of the computing core 120. Alternatively, the programming codes may be provided to the electronic device via any transmission medium (for example, a communication network or broadcast waves). The communication network is, for example, the Internet, a wired communication network, a wireless communication network or another communication medium.

FIG. 2 is a schematic flowchart of an operating method of a computing device according to an embodiment of the present invention. In some embodiments, the operating method shown in FIG. 2 may be implemented in firmware or software (that is, a program). For example, the related operations of the operating method shown in FIG. 2 may be implemented as non-transitory machine-readable instructions (programming codes or a program), and the non-transitory machine-readable instructions may be stored in a machine-readable storage medium. The operating method shown in FIG. 2 is implemented when the non-transitory machine-readable instructions are executed by a computer. In other embodiments, the operating method shown in FIG. 2 may be implemented in hardware, for example in the computing device 100 shown in FIG. 1.

The computing core 120 may fetch an instruction (hereinafter, the current instruction) from the memory 110. For example, the computing core 120 may fetch a load instruction, a matrix multiply and accumulation (MMA) instruction or another instruction from the memory 110. Generally, if the current instruction is used to process one or more operands, the current instruction carries data type information for the one or more operands. The data type information indicates the data type of the target operand corresponding to the current instruction. If the data type is known at compile time, the data type information indicates a "fixed (specified) data type". For example, depending on the application, the fixed data type may be a 4-bit signed integer s4, an 8-bit signed integer s8, an 8-bit unsigned integer u8, an 8-bit floating point number f8, an 8-bit brain floating point number bf8, a 16-bit signed integer s16, a standard 16-bit floating point number f16, a 16-bit brain floating point number bf16, a standard 32-bit floating point number f32, a 32-bit fast floating point number ff32, or a floating point number of more than 32 bits f32+. In addition, depending on the application, the operand may be a scalar, a vector, a matrix, a tensor or another operand. For example, in neural network applications, depending on the actual design, the operand may be any one of a plurality of blocks into which a matrix is partitioned. The size of the matrix and the size of the blocks may be determined according to the actual design. For example, in some applications, the size of a block (operand) may be 32*32, 64*64 or another size.
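
As a purely illustrative sketch, the data type information carried by an instruction can be pictured as a small bit field extracted from the instruction word. The bit position used below (the low 4 bits) is an assumption for illustration only; the embodiment does not fix a concrete instruction encoding.

#include <stdio.h>

/* Hypothetical: treat the low 4 bits of an instruction word as the
 * data type information described above. */
static unsigned dtype_info(unsigned insn_word)
{
    return insn_word & 0xFu;
}

int main(void)
{
    printf("%u\n", dtype_info(0x12345u)); /* prints 5 */
    return 0;
}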

In some cases, the data type of the target operand corresponding to the current instruction may be undetermined at compile time (the data type is temporarily unknown). For example, the data type of the computation result of a hidden layer of a Convolutional Neural Network (CNN) program may only be dynamically determined after the computation is actually performed. The instruction set of this embodiment supports such an "undetermined data type", that is, an adaptive data type. The adaptive data type means that the data type of the target operand is unknown (undetermined) at compile time and is dynamically determined at execution time.

Referring to FIG. 1 and FIG. 2, in step S210 the computing core 120 checks the data type information carried by the current instruction. When the data type information indicates that the data types of all target operands of the current instruction are fixed data types (the determination result of step S220 is "no"), the computing core 120 executes the current instruction to process the target operands based on the fixed data types (step S230). Depending on the actual design, in some embodiments step S230 may follow known practice, so it is not described in detail here.

The specific content of the data type information may be determined according to the actual design. For example (but not limited thereto), the data type information may include four encoding bits. When the 4-bit field (data type information) has a first value (for example 0), the fixed data type is a 4-bit signed integer s4. When the data type information has a second value (for example 1), the fixed data type is an 8-bit signed integer s8. When the data type information has a third value (for example 2), the fixed data type is an 8-bit unsigned integer u8. When the data type information has a fourth value (for example 3), the fixed data type is a standard 16-bit floating point number f16. When the data type information has a fifth value (for example 4), the fixed data type is a standard 32-bit floating point number f32. When the data type information has a sixth value (for example 5), the fixed data type is a 16-bit brain floating point number bf16. When the data type information has a seventh value (for example 9), the fixed data type is an 8-bit floating point number f8 with a 4-bit exponent. When the data type information has an eighth value (for example 10), the fixed data type is an 8-bit brain floating point number bf8 with a 5-bit exponent. When the data type information has a ninth value (for example 15), the data type of the target operand is the adaptive data type.
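
The example coding above can be summarized in a small decoder. The following sketch is illustrative only: the enum names and the is_adaptive() helper are assumptions, and the numeric values simply follow the example values (0 to 5, 9, 10 and 15) given in the preceding paragraph.

#include <stdio.h>

/* Illustrative data type codes following the example values in the text. */
typedef enum {
    DT_S4       = 0,   /* 4-bit signed integer                   */
    DT_S8       = 1,   /* 8-bit signed integer                   */
    DT_U8       = 2,   /* 8-bit unsigned integer                 */
    DT_F16      = 3,   /* standard 16-bit floating point         */
    DT_F32      = 4,   /* standard 32-bit floating point         */
    DT_BF16     = 5,   /* 16-bit brain floating point            */
    DT_F8       = 9,   /* 8-bit floating point, 4-bit exponent   */
    DT_BF8      = 10,  /* 8-bit brain floating point, 5-bit exp. */
    DT_ADAPTIVE = 15   /* data type resolved at run time from the metadata */
} dtype_code_t;

/* Returns 1 when the 4-bit data type field selects the adaptive data type. */
static int is_adaptive(unsigned dtype_field)
{
    return (dtype_field & 0xFu) == (unsigned)DT_ADAPTIVE;
}

int main(void)
{
    printf("code 15 adaptive? %d\n", is_adaptive(15u)); /* prints 1 */
    printf("code 4  adaptive? %d\n", is_adaptive(4u));  /* prints 0 */
    return 0;
}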

When the data type information indicates that the data type of any target operand of the current instruction is the adaptive data type (the determination result of step S220 is "yes"), the computing core 120 reads the metadata corresponding to the target operand to obtain the actual data type of the target operand from the metadata, or directly obtains the actual data type of the target operand (step S240). The specific content of the metadata may be determined according to the actual design. For example (but not limited thereto), the metadata may include an actual data type field for recording the actual data type of the target operand corresponding to the metadata. As one of many examples, the actual data type recorded in the actual data type field may be an 8-bit floating point number with a first structure, an 8-bit floating point number with a second structure, or a 16-bit floating point number with a third structure. For example, suppose the actual data type field includes a 2-bit code. A code of 0 indicates that the actual data type of the target operand is an 8-bit floating point number with "1 sign bit, 5 exponent bits and 2 mantissa bits" (the first structure), where the sign bit indicates whether the value is positive or negative. A code of 1 indicates that the actual data type of the target operand is an 8-bit floating point number with "1 sign bit, 4 exponent bits and 3 mantissa bits" (the second structure). A code of 2 indicates that the actual data type of the target operand is a 16-bit floating point number with "1 sign bit, 5 exponent bits and 10 mantissa bits" (the third structure).
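
The sketch below shows how one element could be decoded once the 2-bit code in the actual data type field has selected one of the three structures. The struct layout, the bias convention (2^(exponent bits - 1) - 1) and the restriction to normal numbers are assumptions made for illustration; they are not specified by the embodiment.

#include <math.h>
#include <stdio.h>

/* Illustrative element formats matching the 2-bit code described above. */
typedef struct { int exp_bits; int man_bits; } elem_format_t;

static const elem_format_t kFormats[3] = {
    { 5, 2 },   /* code 0: 8-bit float,  1 sign + 5 exponent + 2 mantissa  */
    { 4, 3 },   /* code 1: 8-bit float,  1 sign + 4 exponent + 3 mantissa  */
    { 5, 10 },  /* code 2: 16-bit float, 1 sign + 5 exponent + 10 mantissa */
};

/* Decode one raw element (normal numbers only, for illustration). */
static double decode_elem(unsigned raw, elem_format_t f)
{
    unsigned man  = raw & ((1u << f.man_bits) - 1u);
    unsigned exp  = (raw >> f.man_bits) & ((1u << f.exp_bits) - 1u);
    unsigned sign = (raw >> (f.man_bits + f.exp_bits)) & 1u;
    int bias      = (1 << (f.exp_bits - 1)) - 1;
    double mant   = 1.0 + (double)man / (double)(1u << f.man_bits);
    double value  = ldexp(mant, (int)exp - bias);
    return sign ? -value : value;
}

int main(void)
{
    /* 0x3C in the "1 sign, 5 exponent, 2 mantissa" format: sign 0, exponent 15
     * (the assumed bias), mantissa 0, so the decoded value is 1.0. */
    printf("%g\n", decode_elem(0x3Cu, kFormats[0]));
    return 0;
}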

In some embodiments, the metadata may further include a scaling factor field for recording the offset applied to the exponents of the target operand. The computing core 120 may convert long-format data into short-format data according to the range of exponent values of the elements in the target operand (for example, a block). For example, suppose the exponent values of all elements in a block (target operand) range from 10 to 20. The computing core 120 may shift the exponent range 10–20 to the range 0–10 and record the offset "-10" in the scaling factor field of the metadata. Therefore, when the computing core 120 executes the current instruction, it can restore the exponent range of all elements in the target operand from "0–10" to "10–20" according to the offset "-10" recorded in the scaling factor field of the metadata.
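
A minimal sketch of restoring the exponent range from the scaling factor field is given below. Representing the offset as a plain signed integer is an assumption; the embodiment only requires that the shift applied to the exponents be recorded in the metadata.

#include <math.h>
#include <stdio.h>

/* Undo the exponent shift for one already-decoded element: the element was
 * stored with its exponent moved by scale_offset, so multiplying by
 * 2^(-scale_offset) recovers the original magnitude. */
static double undo_scaling(double stored, int scale_offset)
{
    return ldexp(stored, -scale_offset);
}

int main(void)
{
    /* Exponents 10..20 were shifted to 0..10, so the recorded offset is -10;
     * restoring therefore multiplies by 2^10. */
    printf("%g\n", undo_scaling(1.5, -10)); /* prints 1536 */
    return 0;
}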

Based on the actual data type of the target operand recorded in the metadata, the computing core 120 executes the current instruction to process the target operand (step S250). For example, suppose the current instruction is a load instruction. When the data type information carried by the load instruction indicates that the data type of the target operand is the adaptive data type, the computing core 120 reads the metadata corresponding to the target operand from the memory 110 to obtain the actual data type of the target operand from the metadata (step S240), and then records the metadata and the actual data type in a register inside the computing core 120, for example a status register or another register. Therefore, based on the actual data type of the target operand recorded in the metadata, the computing core 120 can execute the load instruction (the current instruction) to load the target operand from the memory 110 into the computing core 120. As another example, suppose the current instruction is a matrix multiply and accumulation (MMA) instruction. When the data type information carried by the MMA instruction indicates that the data type of the target operand is the adaptive data type, the computing core 120 can directly obtain the actual data type of the target operand from its internal register (step S240).
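
The decision flow of steps S210 to S250 can be sketched as follows. All structure names, field names and helper functions below are hypothetical stand-ins used only to illustrate the flow described in this embodiment; they are not an actual instruction set or driver API.

#include <stdio.h>

#define DT_ADAPTIVE 15u  /* example encoding of the adaptive data type */

/* Hypothetical structures; the field names are illustrative only. */
typedef struct { unsigned dtype_field; } instruction_t;
typedef struct { unsigned actual_dtype; int scale_offset; } metadata_t;

/* Stub standing in for reading the operand's metadata from memory. */
static metadata_t load_metadata_for_operand(const instruction_t *insn)
{
    (void)insn;
    metadata_t md = { 3u /* e.g. f16 */, 0 };
    return md;
}

/* Stub standing in for the actual execution of the instruction. */
static void execute_with_type(const instruction_t *insn, unsigned actual_dtype)
{
    (void)insn;
    printf("execute with data type code %u\n", actual_dtype);
}

/* Steps S210 to S250 for one instruction. */
static void handle_instruction(const instruction_t *insn)
{
    if (insn->dtype_field != DT_ADAPTIVE) {              /* S210/S220 */
        execute_with_type(insn, insn->dtype_field);      /* S230: fixed type */
    } else {
        metadata_t md = load_metadata_for_operand(insn); /* S240 */
        execute_with_type(insn, md.actual_dtype);        /* S250 */
    }
}

int main(void)
{
    instruction_t fixed = { 4u }, adaptive = { DT_ADAPTIVE };
    handle_instruction(&fixed);
    handle_instruction(&adaptive);
    return 0;
}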

FIG. 3 is a schematic circuit block diagram of the computing core 120 according to an embodiment of the present invention. The computing core 120 shown in FIG. 3 includes an arithmetic circuit 121, an operand buffer 122 and a conversion unit 123. After finishing the computation of the previous layer, the arithmetic circuit 121 generates a calculation result and stores the calculation result Dorig in the operand buffer 122. Depending on the actual design, in different embodiments the operand buffer 122 may be arranged inside the arithmetic circuit 121, outside the arithmetic circuit 121, in a reduction buffer, or in thread local registers. In addition, the arithmetic circuit 121 may collect statistics on the numerical characteristics of the calculation result Dorig and provide a statistical result ST to the conversion unit 123.

The operand buffer 122 provides the calculation result Dorig to the conversion unit 123. Based on the statistical result ST, the conversion unit 123 converts the calculation result Dorig into an operand Dconv (target operand) having a data type suitable for the next layer of computation, together with metadata Dm corresponding to the operand Dconv. At execution time, the conversion unit 123 dynamically determines the data type of the operand Dconv, so the conversion unit 123 records the actual data type of the operand Dconv in the metadata Dm corresponding to the operand Dconv, and then stores the operand Dconv and the metadata Dm in the memory 110.
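
A possible software analogue of the conversion performed by the conversion unit 123 is sketched below. The concrete statistics (minimum and maximum exponent of a block) and the rule for choosing between the two 8-bit structures are assumptions for illustration; the embodiment only requires that the conversion and the metadata be driven by statistics of the calculation result.

#include <stdio.h>

/* Hypothetical per-block statistics and metadata layout. */
typedef struct { int min_exp; int max_exp; } block_stats_t;
typedef struct { unsigned fmt_code; int scale_offset; } block_metadata_t;

/* Choose between the "1 sign, 5 exponent, 2 mantissa" format (code 0) and the
 * "1 sign, 4 exponent, 3 mantissa" format (code 1), and shift the exponent
 * range so that it starts at zero. */
static block_metadata_t choose_format(block_stats_t st)
{
    block_metadata_t md;
    int range = st.max_exp - st.min_exp;
    md.fmt_code     = (range > 15) ? 0u : 1u; /* wider range -> more exponent bits */
    md.scale_offset = -st.min_exp;            /* e.g. exponents 10..20 -> 0..10   */
    return md;
}

int main(void)
{
    block_stats_t st = { 10, 20 };
    block_metadata_t md = choose_format(st);
    printf("format code %u, scale offset %d\n", md.fmt_code, md.scale_offset);
    return 0;
}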

FIG. 4 is a schematic circuit block diagram of the computing core 120 according to another embodiment of the present invention. The computing core 120 shown in FIG. 4 includes an arithmetic circuit 121, a load unit 124, a status register 125 and an operand buffer 126. The load unit 124 is coupled to the memory 110. The status register 125 is coupled between the load unit 124 and the arithmetic circuit 121. The operand buffer 126 is coupled between the load unit 124 and the arithmetic circuit 121. Depending on the actual design, in different embodiments the arithmetic circuit 121 may include a tensor core, a general matrix multiply (GEMM) core, an arithmetic logic unit (ALU) and/or other computing units.

As an illustrative example, suppose the current instruction is a load instruction. When the data type information carried by the load instruction indicates that the data type of the target operand is the adaptive data type, the load unit 124 reads the metadata corresponding to the target operand from the memory 110 to obtain the actual data type of the target operand from the metadata. The load unit 124 stores the metadata in the status register 125 for use by the arithmetic circuit 121. In addition, the load unit 124 reads the target operand from the memory 110 based on the actual data type of the target operand recorded in the metadata. The load unit 124 stores the target operand in the operand buffer 126 for use by the arithmetic circuit 121.

As another illustrative example, suppose the current instruction is a matrix multiply and accumulation (MMA) instruction, and the target operands (a first operand and a second operand) corresponding to the MMA instruction have already been loaded into the operand buffer 126 by a previously executed load instruction. The first operand corresponds to first metadata and the second operand corresponds to second metadata. By analogy with the preceding paragraph, the previously executed load instruction stores, in the status register 125, the first metadata together with the actual data type it records (the actual data type of the first operand), as well as the second metadata together with the actual data type it records (the actual data type of the second operand). When the data type information carried by the MMA instruction indicates that the data type of the first operand is the adaptive data type, the arithmetic circuit 121 directly obtains the actual data type of the first operand from the status register 125. When the data type information carried by the MMA instruction indicates that the data type of the second operand is the adaptive data type, the arithmetic circuit 121 directly obtains the actual data type of the second operand from the status register 125. Based on the actual data type of the first operand and the actual data type of the second operand, the arithmetic circuit 121 can correctly read the first operand and the second operand from the operand buffer 126 and perform a matrix multiplication on the first operand and the second operand.
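
The MMA path can be sketched in the same style. The structures and the gemm() stub below are hypothetical; they only illustrate how the actual element types are taken from the status register 125 when the data type information of the instruction indicates the adaptive data type.

#include <stdio.h>

#define DT_ADAPTIVE 15u  /* example encoding of the adaptive data type */

/* Hypothetical status register and MMA instruction fields. */
typedef struct { unsigned dtype_a, dtype_b; } status_register_t;
typedef struct { unsigned dtype_field_a, dtype_field_b; } mma_instruction_t;

/* Stub: a real core would read both operands from the operand buffer 126
 * using these element types and perform the matrix multiplication. */
static void gemm(unsigned dtype_a, unsigned dtype_b)
{
    printf("MMA with element type codes %u x %u\n", dtype_a, dtype_b);
}

static void execute_mma(const mma_instruction_t *insn, const status_register_t *sr)
{
    unsigned a = (insn->dtype_field_a == DT_ADAPTIVE) ? sr->dtype_a
                                                      : insn->dtype_field_a;
    unsigned b = (insn->dtype_field_b == DT_ADAPTIVE) ? sr->dtype_b
                                                      : insn->dtype_field_b;
    gemm(a, b);
}

int main(void)
{
    status_register_t sr  = { 9u /* f8 */, 10u /* bf8 */ };
    mma_instruction_t mma = { DT_ADAPTIVE, DT_ADAPTIVE };
    execute_mma(&mma, &sr);
    return 0;
}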

To sum up, the computing core 120 can check the data type information carried by the current instruction to determine whether the data type of the target operand corresponding to the current instruction is a fixed (specified) data type or an adaptive data type. A fixed data type means that the data type of the target operand is known at compile time. An adaptive data type means that the data type of the target operand is unknown at compile time and is dynamically determined when the program is executed. When the program is actually executed, the actual data type of the target operand is recorded in the metadata corresponding to the target operand. Before the computing core 120 executes the current instruction, it can check the data type of the target operand corresponding to the current instruction. When the data type of the target operand is the adaptive data type, the computing core 120 can obtain the actual data type of the target operand from the metadata corresponding to the target operand. Based on the actual data type of the target operand recorded in the metadata, the computing core 120 can correctly execute the current instruction to process the target operand.

Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (27)

1. A method of operation of a computing device, the method of operation comprising:
checking data type information carried by a current instruction, wherein the data type information represents a data type of a target operand corresponding to the current instruction;
when the data type information indicates that the data type of the target operand is an adaptive data type, reading metadata corresponding to the target operand to obtain the actual data type of the target operand from the metadata, or directly obtaining the actual data type of the target operand; and
based on the actual data type of the target operand that is recorded by the metadata, the current instruction is executed to process the target operand.
2. The method of claim 1, wherein the adaptive data type indicates that a data type of the target operand is unknown at compile time and is dynamically determined at execution time.
3. The method of operation of claim 1, further comprising:
when the data type information indicates that the data type of the target operand is a fixed data type, the current instruction is executed to process the target operand based on the fixed data type.
4. The method of operation of claim 3, wherein the fixed data type comprises a 4-bit signed integer, an 8-bit signed integer, an 8-bit unsigned integer, an 8-bit floating point, an 8-bit brain floating point, a 16-bit signed integer, a standard 16-bit floating point, a 16-bit brain floating point, a standard 32-bit floating point, a 32-bit fast floating point, or a floating point above 32 bits.
5. The method of claim 4, wherein the fixed data type is a 4-bit signed integer s4 when the data type information is a first value, the fixed data type is an 8-bit signed integer s8 when the data type information is a second value, the fixed data type is an 8-bit unsigned integer u8 when the data type information is a third value, the fixed data type is a standard 16-bit floating point f16 when the data type information is a fourth value, the fixed data type is a standard 32-bit floating point f32 when the data type information is a fifth value, the fixed data type is a 16-bit brain floating point bf16 when the data type information is a sixth value, the fixed data type is an 8-bit floating point f8 with a 4-bit exponent when the data type information is a seventh value, the fixed data type is an 8-bit brain floating point bf8 with a 5-bit exponent when the data type information is an eighth value, and the data type of the target operand is the adaptive data type when the data type information is a ninth value.
6. The method of claim 1, wherein the current instruction comprises a load instruction, the method further comprising:
when the data type information carried by the load instruction indicates that the data type of the target operand is the adaptive data type, reading the metadata corresponding to the target operand from a memory so as to obtain the actual data type of the target operand from the metadata;
storing the metadata and the actual data type recorded by the metadata in a status register of an operation core;
reading the target operand from the memory based on the actual data type of the target operand recorded by the metadata; and
storing the target operand in an operand buffer of the operation core.
7. The method of operation of claim 6, wherein the operation core comprises a tensor core, a general matrix multiply core, or an arithmetic logic unit.
8. The method of operation of claim 1, wherein the current instruction comprises a matrix multiply and accumulate instruction, the target operand comprises a first operand and a second operand, the metadata comprises first metadata and second metadata, the first metadata corresponds to the first operand, and the second metadata corresponds to the second operand, the method further comprising:
when the data type information carried by the matrix multiply and accumulate instruction indicates that the data type of the first operand is the adaptive data type, directly acquiring the actual data type of the first operand from a status register of an operation core;
when the data type information carried by the matrix multiply and accumulate instruction indicates that the data type of the second operand is the adaptive data type, directly acquiring the actual data type of the second operand from the status register of the operation core;
reading the first operand and the second operand from an operand buffer of the operation core based on the actual data type of the first operand and the actual data type of the second operand; and
performing a matrix multiplication calculation on the first operand and the second operand.
9. The method of claim 1, wherein the metadata comprises an actual data type field for recording the actual data type of the target operand to which the metadata corresponds.
10. The method of operation of claim 9, wherein the actual data type recorded in the actual data type field comprises an 8-bit floating point number having a first structure, an 8-bit floating point number having a second structure, or a 16-bit floating point number having a third structure.
11. The method of operation of claim 10, wherein the first structure is "1 sign bit, 5 exponent bits and 2 mantissa bits", the second structure is "1 sign bit, 4 exponent bits and 3 mantissa bits", and the third structure is "1 sign bit, 5 exponent bits and 10 mantissa bits".
12. The method of operation of claim 9, wherein the metadata further comprises a scaling factor field for recording an offset of the exponent of the target operand.
13. The method of operation of claim 1, further comprising:
generating a calculation result through an operation core;
counting the numerical characteristics of the calculation result through the operation core to generate a statistical result;
converting, by the operation core, the calculation result into the target operand and the metadata based on the statistical result; and
storing the target operand and the metadata in a memory.
14. A machine-readable storage medium storing non-transitory machine-readable instructions which, when executed by a computer, implement the method of operation of the computing device of any one of claims 1 to 13.
15. A computing device, the computing device comprising:
a memory for storing a target operand; and
an operation core coupled to the memory, wherein,
the operation core checks data type information carried by a current instruction, wherein the data type information represents the data type of the target operand corresponding to the current instruction;
when the data type information indicates that the data type of the target operand is an adaptive data type, the operation core reads metadata corresponding to the target operand to obtain an actual data type of the target operand from the metadata, or directly obtains the actual data type of the target operand; and
the operation core executes the current instruction to process the target operand based on the actual data type of the target operand recorded by the metadata.
16. The computing device of claim 15, wherein the adaptive data type represents a data type of the target operand that is unknown at compile time and dynamically determined at execution time.
17. The computing device of claim 15,
when the data type information indicates that the data type of the target operand is a fixed data type, the operation core executes the current instruction to process the target operand based on the fixed data type.
18. The computing device of claim 17, wherein the fixed data type comprises a 4-bit signed integer, an 8-bit signed integer, an 8-bit unsigned integer, an 8-bit floating point number, an 8-bit brain floating point number, a 16-bit signed integer, a standard 16-bit floating point number, a 16-bit brain floating point number, a standard 32-bit floating point number, a 32-bit fast floating point number, or a floating point number greater than 32 bits.
19. The computing device of claim 18, wherein the fixed data type is a 4-bit signed integer s4 when the data type information is a first value, the fixed data type is an 8-bit signed integer s8 when the data type information is a second value, the fixed data type is an 8-bit unsigned integer u8 when the data type information is a third value, the fixed data type is a standard 16-bit floating point f16 when the data type information is a fourth value, the fixed data type is a standard 32-bit floating point f32 when the data type information is a fifth value, the fixed data type is a 16-bit brain floating point bf16 when the data type information is a sixth value, the fixed data type is an 8-bit floating point f8 with a 4-bit exponent when the data type information is a seventh value, the fixed data type is an 8-bit brain floating point bf8 with a 5-bit exponent when the data type information is an eighth value, and the data type of the target operand is the adaptive data type when the data type information is a ninth value.
20. The computing device of claim 15, wherein the current instruction comprises a load instruction, and wherein the operation core comprises:
a load unit coupled to the memory, wherein when the data type information carried by the load instruction indicates that the data type of the target operand is the adaptive data type, the load unit reads the metadata corresponding to the target operand from the memory to obtain the actual data type of the target operand from the metadata, and the load unit reads the target operand from the memory based on the actual data type of the target operand recorded by the metadata;
a status register coupled to the load unit, wherein the load unit stores the metadata and the actual data type recorded by the metadata in the status register; and
an operand buffer coupled to the load unit, wherein the load unit stores the target operand in the operand buffer.
21. The computing device of claim 20, wherein the operation core further comprises:
an arithmetic circuit coupled to the status register and the operand buffer, wherein the arithmetic circuit comprises a tensor core, a general matrix multiply core, or an arithmetic logic unit.
22. The computing device of claim 15, wherein the current instruction comprises a matrix multiply and accumulate instruction, and wherein the operation core comprises:
an operand buffer to store the target operand, wherein the target operand comprises a first operand and a second operand;
a status register for storing the metadata and the actual data type recorded by the metadata, wherein the metadata includes a first metadata and a second metadata, the first metadata corresponds to the first operand, and the second metadata corresponds to the second operand; and
an arithmetic circuit coupled to the status register and the operand buffer, wherein
when the data type information carried by the matrix multiply and accumulate instruction indicates that the data type of the first operand is the adaptive data type, the arithmetic circuit directly acquires the actual data type of the first operand from the status register;
when the data type information carried by the matrix multiply-and-accumulate instruction indicates that the data type of the second operand is the adaptive data type, the arithmetic circuit directly acquires the actual data type of the second operand from the status register;
based on the actual data type of the first operand and the actual data type of the second operand, the arithmetic circuitry reads the first operand and the second operand from the operand buffer; and
the arithmetic circuit performs a matrix multiplication calculation on the first operand and the second operand.
23. The computing device of claim 15, wherein the metadata comprises an actual data type field to record the actual data type of the target operand to which the metadata corresponds.
24. The computing device of claim 23, wherein the actual data type recited in the actual data type field comprises an 8-bit floating point number having a first structure, an 8-bit floating point number having a second structure, or a 16-bit floating point number having a third structure.
25. The computing device of claim 24, wherein the first structure is "1 sign bit, 5 exponent bits and 2 mantissa bits", the second structure is "1 sign bit, 4 exponent bits and 3 mantissa bits", and the third structure is "1 sign bit, 5 exponent bits and 10 mantissa bits".
26. The computing device of claim 23, wherein the metadata further comprises a scaling factor field for recording an offset of the exponent of the target operand.
27. The computing device of claim 15, wherein the operation core generates a calculation result, the operation core performs statistics on a numerical characteristic of the calculation result to generate a statistical result, the operation core converts the calculation result into the target operand and the metadata based on the statistical result, and the operation core stores the target operand and the metadata in the memory.
CN202210979891.4A 2022-08-16 2022-08-16 Computing device, method of operation, and machine-readable storage medium Pending CN115344826A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210979891.4A CN115344826A (en) 2022-08-16 2022-08-16 Computing device, method of operation, and machine-readable storage medium

Publications (1)

Publication Number Publication Date
CN115344826A (en) 2022-11-15

Family

ID=83952625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210979891.4A Pending CN115344826A (en) 2022-08-16 2022-08-16 Computing device, method of operation, and machine-readable storage medium

Country Status (1)

Country Link
CN (1) CN115344826A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060236315A1 (en) * 2005-04-18 2006-10-19 Gilad Bracha Reifying generic types while maintaining migration compatibility
US20090018989A1 (en) * 2007-07-12 2009-01-15 Oracle International Corporation Using sql extensibility for processing dynamically typed xml data in xquery queries
US20150234783A1 (en) * 2014-02-20 2015-08-20 International Business Machines Corporation Iterative refinement apparatus
CN110023923A (en) * 2016-11-27 2019-07-16 亚马逊科技公司 It generates data and converts workflow
CN110163350A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
掘金翻译计划 (Juejin Translation Plan): "TypeScript 3.0: the unknown type", pages 1-6, retrieved from the Internet <URL:https://juejin.im/post/5d04ac745188250a8b1fd203> *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024108836A1 (en) * 2022-11-24 2024-05-30 上海壁仞科技股份有限公司 Computing device, operating method, and machine-readable storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB02: Change of applicant information
    Country or region after: China
    Address after: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
    Applicant after: Shanghai Bi Ren Technology Co.,Ltd.
    Address before: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai
    Applicant before: Shanghai Bilin Intelligent Technology Co.,Ltd.
    Country or region before: China