CN115344826A - Computing device, method of operation, and machine-readable storage medium - Google Patents
- Publication number
- CN115344826A (application CN202210979891.4A)
- Authority
- CN
- China
- Prior art keywords
- data type
- operand
- bit
- metadata
- target operand
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/52—Multiplying; Dividing
- G06F7/523—Multiplying only
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Algebra (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Executing Machine-Instructions (AREA)
Abstract
Description
Technical Field
The present invention relates to an instruction set, and in particular to a computing device, an operating method, and a machine-readable storage medium.
Background
Generally, when writing a computing program, the programmer knows the data type of each operand, so the programmer can write instructions carrying "specified (fixed) data type" information into the program and then compile the program with a compiler. For example, the program may include a load instruction that carries the information "32-bit floating-point number" (a fixed data type) and loads an operand of that type from memory into a computing core (for example, a tensor core). Alternatively, the program may include a matrix multiply and accumulate (MMA) instruction that carries the information "32-bit floating-point number" (a fixed data type) and causes the computing core to perform matrix multiplication on two already-loaded operands of that type. In some cases, however, the data type of an operand cannot be determined at compile time (the data type is unknown). For example, the data type of the result computed by a hidden layer of a convolutional neural network (CNN) program may only be decided dynamically after the computation has actually been executed. Current instruction sets do not support such "indeterminate data types".
Summary of the Invention
The present invention provides a computing device, an operating method thereof, and a machine-readable storage medium that support an adaptive data type. An adaptive data type is a data type that is unknown at compile time.
In an embodiment according to the present invention, the operating method includes: checking the data type information carried by a current instruction, where the data type information indicates the data type of a target operand corresponding to the current instruction; when the data type information indicates that the data type of the target operand is an adaptive data type, reading metadata corresponding to the target operand to learn the actual data type of the target operand from the metadata, or directly obtaining the actual data type of the target operand; and executing the current instruction to process the target operand based on the actual data type of the target operand recorded in the metadata.
In an embodiment according to the present invention, the machine-readable storage medium stores non-transitory machine-readable instructions. When the non-transitory machine-readable instructions are executed by a computer, the operating method of the computing device can be implemented.
In an embodiment according to the present invention, the computing device includes a memory and a computing core. The memory stores the target operand. The computing core is coupled to the memory. The computing core checks the data type information carried by the current instruction, where the data type information indicates the data type of the target operand corresponding to the current instruction. When the data type information indicates that the data type of the target operand is an adaptive data type, the computing core reads the metadata corresponding to the target operand to learn the actual data type of the target operand from the metadata, or directly obtains the actual data type of the target operand. Based on the actual data type of the target operand recorded in the metadata, the computing core executes the current instruction to process the target operand.
Based on the above, the computing core can check the data type information carried by the current instruction to determine whether the data type of the target operand corresponding to the current instruction is a fixed (specified) data type or an adaptive data type. A fixed data type means that the data type of the target operand is known at compile time. An adaptive data type means that the data type of the target operand is unknown at compile time and is decided dynamically when the program is executed. During execution, the actual data type of the target operand is recorded in the metadata corresponding to the target operand. Before executing the current instruction, the computing core checks the data type of the target operand of the current instruction. When the data type of the target operand is an adaptive data type, the computing core can learn the actual data type of the target operand from the metadata corresponding to the target operand, or directly obtain the actual data type of the target operand. Based on the actual data type of the target operand recorded in the metadata, the computing core can correctly execute the current instruction to process the target operand.
Brief Description of the Drawings
FIG. 1 is a schematic circuit block diagram of a computing device according to an embodiment of the present invention.
FIG. 2 is a schematic flowchart of an operating method of a computing device according to an embodiment of the present invention.
FIG. 3 is a schematic circuit block diagram of a computing core according to an embodiment of the present invention.
FIG. 4 is a schematic circuit block diagram of a computing core according to another embodiment of the present invention.
Description of Reference Numerals
100: computing device
110: memory
120: computing core
121: computing circuit
122, 126: operand buffer
123: conversion unit
124: load unit
125: state register
Dconv: operand
Dm: metadata
Dorig: calculation result
S210~S250: steps
ST: statistical result
Detailed Description
Reference will now be made in detail to exemplary embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numerals are used in the drawings and the description to refer to the same or similar parts.
Throughout this specification (including the claims), the term "coupled (or connected)" may refer to any direct or indirect means of connection. For example, if a first device is described as coupled (or connected) to a second device, this should be interpreted as meaning that the first device may be directly connected to the second device, or may be indirectly connected to the second device through other devices or some connection means. Terms such as "first" and "second" used throughout this specification (including the claims) name elements; they limit neither the upper or lower bound of the number of elements nor the order of the elements. In addition, wherever possible, components/members/steps with the same reference numerals in the drawings and embodiments represent the same or similar parts. Components/members/steps that use the same reference numerals or the same terms in different embodiments may refer to each other's descriptions.
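FIG. 1 is a schematic circuit block diagram of a computing device 100 according to an embodiment of the present invention. The computing device 100 shown in FIG. 1 includes a memory 110 and a computing core 120. The memory 110 stores operands. This embodiment does not limit the specific data structure of an operand. For example, in neural network applications, an operand may be a vector, a tensor, or other data. Depending on the actual design, an operand may be a matrix, or any one of the blocks obtained by partitioning a matrix. The size of the matrix and the size of the blocks may be determined according to the actual design. For example, in some applications, the size of a block (operand) may be 32*32, 64*64, or another size.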
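As a minimal sketch of the block partitioning described above, the following Python snippet cuts a matrix into square blocks, each of which could serve as one operand. The helper name `split_into_blocks` and the list-of-lists matrix representation are assumptions made for illustration; the patent does not prescribe any particular partitioning routine.

```python
from typing import List

Matrix = List[List[float]]


def split_into_blocks(matrix: Matrix, block: int) -> List[List[Matrix]]:
    """Cut a matrix into block x block tiles (edge tiles may be smaller).

    Hypothetical helper: returns a 2-D grid of tiles, where tiles[i][j]
    covers rows i*block.. and columns j*block.. of the input matrix.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    tiles = []
    for r in range(0, rows, block):
        tile_row = []
        for c in range(0, cols, block):
            # Slice `block` rows, then `block` columns from each of those rows.
            tile_row.append([row[c:c + block] for row in matrix[r:r + block]])
        tiles.append(tile_row)
    return tiles


# A 64*64 matrix cut into 32*32 blocks gives a 2 x 2 grid of operands.
m = [[float(i * 64 + j) for j in range(64)] for i in range(64)]
tiles = split_into_blocks(m, 32)
```

Each entry of `tiles` is then an independent operand that can carry its own data type and metadata.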
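The computing core 120 is coupled to the memory 110. In different application examples, the computing core 120 includes a tensor core, a general matrix multiply (GEMM) core, an arithmetic logic unit (ALU), and/or other computing units. Depending on design requirements, in some embodiments the computing core 120 may be implemented as a hardware circuit. In other embodiments, the computing core 120 may be implemented as firmware, software (that is, a program), or a combination of the two. In still other embodiments, the computing core 120 may be implemented as a combination of several of hardware, firmware, and software.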
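In hardware form, the computing core 120 may be implemented as logic circuits on an integrated circuit. For example, the functions of the computing core 120 may be implemented in the various logic blocks, modules, and circuits of one or more controllers, microcontrollers, microprocessors, application-specific integrated circuits (ASICs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs), and/or other processing units. The functions of the computing core 120 may be implemented as hardware circuits, such as the logic blocks, modules, and circuits of an integrated circuit, by using a hardware description language (for example, Verilog HDL or VHDL) or another suitable programming language.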
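In software and/or firmware form, the functions of the computing core 120 may be implemented as programming codes. For example, the computing core 120 may be implemented in a general-purpose programming language (for example, C, C++, or assembly language) or another suitable programming language. The programming codes may be recorded or stored in a non-transitory machine-readable storage medium. In some embodiments, the machine-readable storage medium includes, for example, semiconductor memory and/or a storage device. The semiconductor memory includes a memory card, read-only memory (ROM), flash memory, a programmable logic circuit, or other semiconductor memory. The storage device includes a tape, a disk, a hard disk drive (HDD), a solid-state drive (SSD), or another storage device. An electronic device (for example, a computer, a central processing unit (CPU), a controller, a microcontroller, or a microprocessor) can read the programming codes from the machine-readable storage medium and execute them, thereby implementing the functions of the computing core 120. Alternatively, the programming codes may be provided to the electronic device through any transmission medium (for example, a communication network or broadcast radio waves). The communication network is, for example, the Internet, a wired communication network, a wireless communication network, or another communication medium.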
FIG. 2 is a schematic flowchart of an operating method of a computing device according to an embodiment of the present invention. In some embodiments, the operating method shown in FIG. 2 may be implemented in firmware or software (that is, a program). For example, the operations of the operating method shown in FIG. 2 may be implemented as non-transitory machine-readable instructions (programming codes or a program), and the non-transitory machine-readable instructions may be stored in a machine-readable storage medium. When the non-transitory machine-readable instructions are executed by a computer, the operating method shown in FIG. 2 can be implemented. In other embodiments, the operating method shown in FIG. 2 may be implemented in hardware, for example in the computing device 100 shown in FIG. 1.
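The computing core 120 can fetch an instruction (hereinafter, the current instruction) from the memory 110. For example, the computing core 120 can fetch a load instruction, a matrix multiply and accumulate (MMA) instruction, or another instruction from the memory 110. Generally, if the current instruction processes one or more operands, the current instruction carries data type information for those operands. The data type information indicates the data type of the target operand corresponding to the current instruction. If the data type is known at compile time, the data type information indicates some "fixed (specified) data type". For example, depending on the application, the fixed data type may be a 4-bit signed integer s4, an 8-bit signed integer s8, an 8-bit unsigned integer u8, an 8-bit floating-point number f8, an 8-bit brain float bf8, a 16-bit signed integer s16, a standard 16-bit float f16, a 16-bit brain float bf16, a standard 32-bit float f32, a 32-bit fast float ff32, or a floating-point number of more than 32 bits f32+. In addition, depending on the application, the operand may be a scalar, a vector, a matrix, a tensor, or another kind of operand. For example, in neural network applications, depending on the actual design, the operand may be any one of the blocks obtained by partitioning a matrix. The size of the matrix and the size of the blocks may be determined according to the actual design. For example, in some applications, the size of a block (operand) may be 32*32, 64*64, or another size.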
In some cases, the data type of the target operand corresponding to the current instruction cannot be determined at compile time (the data type is temporarily unknown). For example, the data type of the result computed by a hidden layer of a convolutional neural network (CNN) program may only be decided dynamically after the computation has actually been executed. The instruction set of this embodiment supports such an "indeterminate data type", that is, an adaptive data type. An adaptive data type means that the data type of the target operand is unknown (indeterminate) at compile time and is decided dynamically at execution time.
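Refer to FIG. 1 and FIG. 2. In step S210, the computing core 120 checks the data type information carried by the current instruction. When the data type information indicates that the data types of all target operands of the current instruction are fixed data types (the determination in step S220 is "No"), the computing core 120 executes the current instruction based on the fixed data types to process the target operands (step S230). Depending on the actual design, in some embodiments step S230 may follow well-known practice and is therefore not described further here.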
The specific content of the data type information may be determined according to the actual design. For example (but not limited to this), the data type information may consist of 4 encoding bits. When the 4 encoding bits (the data type information) hold a first value (for example, 0), the fixed data type is the 4-bit signed integer s4. When the data type information is a second value (for example, 1), the fixed data type is the 8-bit signed integer s8. When the data type information is a third value (for example, 2), the fixed data type is the 8-bit unsigned integer u8. When the data type information is a fourth value (for example, 3), the fixed data type is the standard 16-bit float f16. When the data type information is a fifth value (for example, 4), the fixed data type is the standard 32-bit float f32. When the data type information is a sixth value (for example, 5), the fixed data type is the 16-bit brain float bf16. When the data type information is a seventh value (for example, 9), the fixed data type is the 8-bit float f8 with a 4-bit exponent. When the data type information is an eighth value (for example, 10), the fixed data type is the 8-bit brain float bf8 with a 5-bit exponent. When the data type information is a ninth value (for example, 15), the data type of the target operand is the adaptive data type.
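The example encoding above can be sketched as a small lookup in Python. The numeric values are the example values given in the text; the names `DTypeCode`, `decode_dtype`, and `is_adaptive` are hypothetical and chosen only for this illustration, and treating unlisted codes as errors is an assumption, since the text does not say how they are handled.

```python
from enum import IntEnum


class DTypeCode(IntEnum):
    """4-bit data type field, using the example values from the text."""
    S4 = 0         # 4-bit signed integer
    S8 = 1         # 8-bit signed integer
    U8 = 2         # 8-bit unsigned integer
    F16 = 3        # standard 16-bit float
    F32 = 4        # standard 32-bit float
    BF16 = 5       # 16-bit brain float
    F8 = 9         # 8-bit float with a 4-bit exponent
    BF8 = 10       # 8-bit brain float with a 5-bit exponent
    ADAPTIVE = 15  # actual type is decided at run time


def decode_dtype(field: int) -> DTypeCode:
    """Decode the low 4 bits of the field.

    Assumption: codes not listed in the text raise ValueError.
    """
    return DTypeCode(field & 0xF)


def is_adaptive(field: int) -> bool:
    """True when the instruction marks its operand as adaptive (step S220)."""
    return (field & 0xF) == DTypeCode.ADAPTIVE
```

A decoder would call `is_adaptive` first and consult the operand's metadata only when it returns true.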
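When the data type information indicates that the data type of any target operand of the current instruction is the adaptive data type (the determination in step S220 is "Yes"), the computing core 120 reads the metadata corresponding to the target operand to learn the actual data type of the target operand from the metadata, or directly obtains the actual data type of the target operand (step S240). The specific content of the metadata may be determined according to the actual design. For example (but not limited to this), the metadata may include an actual data type field that records the actual data type of the target operand to which the metadata corresponds. As one of many examples, the actual data type recorded in the field is an 8-bit float with a first structure, an 8-bit float with a second structure, or a 16-bit float with a third structure. Suppose, for instance, that the actual data type field holds a 2-bit code. A code of 0 indicates that the actual data type of the target operand is an 8-bit float with "1 sign bit, 5 exponent bits, and 2 mantissa bits" (the first structure), where the sign bit indicates whether the value is positive or negative. A code of 1 indicates an 8-bit float with "1 sign bit, 4 exponent bits, and 3 mantissa bits" (the second structure). A code of 2 indicates a 16-bit float with "1 sign bit, 5 exponent bits, and 10 mantissa bits" (the third structure).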
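The 2-bit actual data type field can be illustrated with the short Python sketch below, which maps each example code to its bit layout and splits a raw value into its fields. The names `FloatLayout`, `ACTUAL_TYPE_LAYOUTS`, `layout_from_metadata`, and `split_fields` are hypothetical. The field order (sign in the most significant bit, then exponent, then mantissa) is the conventional one and is an assumption here, since the text gives only the field widths.

```python
from typing import NamedTuple, Tuple


class FloatLayout(NamedTuple):
    sign_bits: int
    exponent_bits: int
    mantissa_bits: int

    @property
    def total_bits(self) -> int:
        return self.sign_bits + self.exponent_bits + self.mantissa_bits


# Example codes from the text: first, second, and third structure.
ACTUAL_TYPE_LAYOUTS = {
    0: FloatLayout(1, 5, 2),   # 8-bit float, 5-bit exponent
    1: FloatLayout(1, 4, 3),   # 8-bit float, 4-bit exponent
    2: FloatLayout(1, 5, 10),  # 16-bit float
}


def layout_from_metadata(type_field: int) -> FloatLayout:
    """Look up the layout for a 2-bit code (code 3 is not defined in the text)."""
    return ACTUAL_TYPE_LAYOUTS[type_field & 0b11]


def split_fields(raw: int, layout: FloatLayout) -> Tuple[int, int, int]:
    """Split a raw bit pattern into (sign, exponent, mantissa).

    Assumption: sign is the top bit, mantissa occupies the low bits.
    """
    mantissa = raw & ((1 << layout.mantissa_bits) - 1)
    exponent = (raw >> layout.mantissa_bits) & ((1 << layout.exponent_bits) - 1)
    sign = (raw >> (layout.mantissa_bits + layout.exponent_bits)) & 1
    return sign, exponent, mantissa
```

The same byte decodes differently under different codes, which is why the core must know the actual data type before it touches the operand: `0b11000101` splits into exponent 17 and mantissa 1 under code 0, but exponent 8 and mantissa 5 under code 1.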
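In some embodiments, the metadata may further include a scaling factor field that records the offset applied to the exponents of the target operand. The computing core 120 can convert long-format data into short-format data according to the range of exponent values of the elements in the target operand (for example, a block). For example, suppose the exponent values of all elements in a block (the target operand) lie in the range 10 to 20. The computing core 120 can shift the exponent range 10 to 20 down to the range 0 to 10 and record the offset "-10" in the scaling factor field of the metadata. Then, when the computing core 120 executes the current instruction, it can use the offset "-10" in the scaling factor field to restore the exponent range of all elements in the target operand from 0 to 10 back to 10 to 20.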
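The exponent shift in this example can be written out as a small Python sketch. The function names `compress_exponents` and `restore_exponents` are hypothetical, and shifting so that the smallest exponent becomes 0 is an assumption that matches the example (range 10 to 20, offset -10) rather than a rule stated by the text.

```python
from typing import List, Tuple


def compress_exponents(exponents: List[int]) -> Tuple[List[int], int]:
    """Shift a block's exponents so the smallest becomes 0.

    Returns the shifted exponents and the offset to record in the
    scaling factor field (for example, -10 for the range 10..20).
    """
    if not exponents:
        return [], 0
    offset = -min(exponents)
    return [e + offset for e in exponents], offset


def restore_exponents(shifted: List[int], offset: int) -> List[int]:
    """Undo the shift using the offset stored in the metadata."""
    return [e - offset for e in shifted]


block_exponents = [10, 14, 20]
shifted, offset = compress_exponents(block_exponents)
restored = restore_exponents(shifted, offset)
```

After the shift the exponents fit in fewer bits, and the single offset per block is enough to recover the original range when the operand is consumed.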
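Based on the actual data type of the target operand recorded in the metadata, the computing core 120 executes the current instruction to process the target operand (step S250). For example, suppose the current instruction is a load instruction. When the data type information carried by the load instruction indicates that the data type of the target operand is the adaptive data type, the computing core 120 reads the metadata corresponding to the target operand from the memory 110 to learn the actual data type of the target operand from the metadata (step S240), and then records the metadata and the actual data type in a register inside the computing core 120, for example a state register or another register. Based on the actual data type of the target operand recorded in the metadata, the computing core 120 can then execute the load instruction (the current instruction) to load the target operand from the memory 110 into the computing core 120. Now suppose the current instruction is a matrix multiply and accumulate (MMA) instruction. When the data type information carried by the MMA instruction indicates that the data type of the target operand is the adaptive data type, the computing core 120 can directly obtain the actual data type of the target operand from its own internal register (step S240).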
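The flow of steps S210 to S250 for a load instruction followed by an MMA instruction can be sketched as a toy model in Python. Everything here is an illustrative assumption: the class `ComputeCore`, the dictionary-based memory, the string opcodes, and the tuple return value are not part of the patent, and 15 is simply the example code for the adaptive data type.

```python
ADAPTIVE = 15  # example encoding of the adaptive data type


class ComputeCore:
    """Toy model of how a core could resolve an operand's data type."""

    def __init__(self, memory: dict):
        self.memory = memory       # operand name -> {"meta": {...}}
        self.state_register = {}   # operand name -> actual data type

    def resolve_type(self, opcode: str, operand: str, dtype_field: int):
        # S210/S220: inspect the data type information in the instruction.
        if dtype_field != ADAPTIVE:
            # S230: fixed data type, use it as encoded.
            return ("fixed", dtype_field)
        if opcode == "load":
            # S240 (load): read the metadata from memory and cache the
            # actual data type in the state register.
            actual = self.memory[operand]["meta"]["actual_type"]
            self.state_register[operand] = actual
            return ("actual", actual)
        # S240 (for example MMA): the operand was loaded earlier, so the
        # actual data type is taken directly from the state register.
        return ("actual", self.state_register[operand])


core = ComputeCore({"A": {"meta": {"actual_type": 1}}})
on_load = core.resolve_type("load", "A", ADAPTIVE)
on_mma = core.resolve_type("mma", "A", ADAPTIVE)
```

The design point the sketch tries to capture is that only the load path touches memory for metadata; later instructions reuse the cached value.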
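FIG. 3 is a schematic circuit block diagram of the computing core 120 according to an embodiment of the present invention. The computing core 120 shown in FIG. 3 includes a computing circuit 121, an operand buffer 122, and a conversion unit 123. After completing the computation of the previous layer, the computing circuit 121 generates a calculation result Dorig and stores it in the operand buffer 122. Depending on the actual design, in different embodiments the operand buffer 122 may be placed inside the computing circuit 121, outside the computing circuit 121, in a reduction buffer, or in thread-local registers. In addition, the computing circuit 121 can collect statistics on the numerical characteristics of the calculation result Dorig and provide the resulting statistical result ST to the conversion unit 123.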
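The operand buffer 122 provides the calculation result Dorig to the conversion unit 123. Based on the statistical result ST, the conversion unit 123 converts the calculation result Dorig into an operand Dconv (the target operand) whose data type suits the computation of the next layer, together with metadata Dm corresponding to the operand Dconv. Because the conversion unit 123 decides the data type of the operand Dconv dynamically at execution time, it records the actual data type of the operand Dconv in the corresponding metadata Dm, and then stores the operand Dconv and the metadata Dm in the memory 110.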
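The text does not say how the conversion unit maps the statistical result ST to a data type, so the following Python sketch uses a purely hypothetical policy to show the shape of the step: it treats the minimum and maximum exponent of the block as the statistic, picks one of the three example metadata codes from the exponent span, and emits a metadata record. The names `choose_actual_type` and `build_metadata`, the `keep_precision` flag, and the thresholds are all assumptions.

```python
from typing import Dict, List


def choose_actual_type(span: int, keep_precision: bool = False) -> int:
    """Hypothetical policy mapping an exponent span to a metadata code.

    Codes follow the example in the text: 0 = 8-bit (5-bit exponent),
    1 = 8-bit (4-bit exponent), 2 = 16-bit (5-bit exponent).
    """
    if span >= 32:
        # A 5-bit exponent field holds 32 values; this sketch gives up beyond that.
        raise ValueError("exponent span too wide for the example formats")
    if keep_precision:
        return 2            # 16-bit format keeps 10 mantissa bits
    return 1 if span < 16 else 0


def build_metadata(exponents: List[int], keep_precision: bool = False) -> Dict[str, int]:
    """Build a metadata record (Dm) from per-block exponent statistics (ST)."""
    lo, hi = min(exponents), max(exponents)
    return {
        "actual_type": choose_actual_type(hi - lo, keep_precision),
        "scale_offset": -lo,  # shift that moves the smallest exponent to 0
    }


dm = build_metadata([10, 12, 20])
```

A real conversion unit could weigh other statistics, such as value distribution or overflow counts; the sketch only shows that the decision happens at run time and is written into the metadata.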
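FIG. 4 is a schematic circuit block diagram of the computing core 120 according to another embodiment of the present invention. The computing core 120 shown in FIG. 4 includes a computing circuit 121, a load unit 124, a state register 125, and an operand buffer 126. The load unit 124 is coupled to the memory 110. The state register 125 is coupled between the load unit 124 and the computing circuit 121. The operand buffer 126 is coupled between the load unit 124 and the computing circuit 121. Depending on the actual design, in different embodiments the computing circuit 121 may include a tensor core, a general matrix multiply (GEMM) core, an arithmetic logic unit (ALU), and/or other computing units.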
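As one illustrative example, suppose the current instruction is a load instruction. When the data type information carried by the load instruction indicates that the data type of the target operand is the adaptive data type, the load unit 124 reads the metadata corresponding to the target operand from the memory 110 to learn the actual data type of the target operand from the metadata. The load unit 124 stores the metadata in the state register 125 for use by the computing circuit 121. In addition, the load unit 124 can read the target operand from the memory 110 based on the actual data type of the target operand recorded in the metadata. The load unit 124 stores the target operand in the operand buffer 126 for use by the computing circuit 121.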
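As another illustrative example, suppose the current instruction is a matrix multiply and accumulate (MMA) instruction, and the target operands of this instruction (a first operand and a second operand) have already been loaded into the operand buffer 126 by previously executed load instructions. The first operand corresponds to first metadata, and the second operand corresponds to second metadata. By analogy with the preceding paragraph, the previously executed load instructions have stored in the state register 125 the first metadata and the actual data type it records (the actual data type of the first operand), as well as the second metadata and the actual data type it records (the actual data type of the second operand). When the data type information carried by the MMA instruction indicates that the data type of the first operand is the adaptive data type, the computing circuit 121 can directly obtain the actual data type of the first operand from the state register 125. When the data type information carried by the MMA instruction indicates that the data type of the second operand is the adaptive data type, the computing circuit 121 can directly obtain the actual data type of the second operand from the state register 125. Based on the actual data types of the first and second operands, the computing circuit 121 can correctly read the first operand and the second operand from the operand buffer 126 and perform matrix multiplication on them.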
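In summary, the computing core 120 can check the data type information carried by the current instruction to determine whether the data type of the target operand corresponding to the current instruction is a fixed (specified) data type or an adaptive data type. A fixed data type means that the data type of the target operand is known at compile time. An adaptive data type means that the data type of the target operand is unknown at compile time and is decided dynamically when the program is executed. During execution, the actual data type of the target operand is recorded in the metadata corresponding to the target operand. Before executing the current instruction, the computing core 120 can check the data type of the target operand corresponding to the current instruction. When the data type of the target operand is an adaptive data type, the computing core 120 can learn the actual data type of the target operand from the metadata corresponding to the target operand. Based on the actual data type of the target operand recorded in the metadata, the computing core 120 can correctly execute the current instruction to process the target operand.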
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, and some or all of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (27)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210979891.4A CN115344826A (en) | 2022-08-16 | 2022-08-16 | Computing device, method of operation, and machine-readable storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210979891.4A CN115344826A (en) | 2022-08-16 | 2022-08-16 | Computing device, method of operation, and machine-readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115344826A true CN115344826A (en) | 2022-11-15 |
Family
ID=83952625
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202210979891.4A Pending CN115344826A (en) | 2022-08-16 | 2022-08-16 | Computing device, method of operation, and machine-readable storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115344826A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2024108836A1 (en) * | 2022-11-24 | 2024-05-30 | 上海壁仞科技股份有限公司 | Computing device, operating method, and machine-readable storage medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060236315A1 (en) * | 2005-04-18 | 2006-10-19 | Gilad Bracha | Reifying generic types while maintaining migration compatibility |
| US20090018989A1 (en) * | 2007-07-12 | 2009-01-15 | Oracle International Corporation | Using sql extensibility for processing dynamically typed xml data in xquery queries |
| US20150234783A1 (en) * | 2014-02-20 | 2015-08-20 | International Business Machines Corporation | Iterative refinement apparatus |
| CN110023923A (en) * | 2016-11-27 | 2019-07-16 | 亚马逊科技公司 | It generates data and converts workflow |
| CN110163350A (en) * | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | A kind of computing device and method |
Non-Patent Citations (1)
| Title |
|---|
| 掘金翻译计划: "TypeScript 3.0: unknown 类型", pages 1 - 6, Retrieved from the Internet <URL:https://juejin.im/post/5d04ac745188250a8b1fd203> * |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| KR102471606B1 (en) | Floating-point instruction format with built-in rounding rules | |
| JP6333439B2 (en) | Perform rounding according to instructions | |
| US11175891B2 (en) | Systems and methods to perform floating-point addition with selected rounding | |
| CN1327339C (en) | Method, apparatus and system for power estimation based instruction scheduling | |
| TWI405126B (en) | Microprocessors and methods for executing instruction | |
| CN101859243A (en) | Device and method for controlling precision of dynamic floating point operation register | |
| CN108089882B (en) | Encoding and decoding variable length instructions | |
| JP7385009B2 (en) | Compression support command | |
| US10579338B2 (en) | Apparatus and method for processing input operand values | |
| CN110474645A (en) | For compressing the system of floating data | |
| CN111340207B (en) | Floating point number conversion method and device | |
| CN106528050B (en) | Trailing or leading digit predictor | |
| KR20210028075A (en) | System to perform unary functions using range-specific coefficient sets | |
| CN113805974B (en) | Application-based data type selection | |
| CN115344826A (en) | Computing device, method of operation, and machine-readable storage medium | |
| CN116451769A (en) | Quantification method and electronic equipment of language model | |
| JP6324264B2 (en) | Ternary inner product arithmetic circuit, ternary inner product arithmetic processing program, and arithmetic processing method using ternary inner product arithmetic circuit | |
| TW202311935A (en) | Method and system for execution of a conditional statement by an arithmetic and/or bitwise unit | |
| US20230161555A1 (en) | System and method performing floating-point operations | |
| CN115202617A (en) | Method, system and device for recoding and decoding floating-point number | |
| CN116382782A (en) | Vector operation method, vector operator, electronic device, and storage medium | |
| US20210034329A1 (en) | Parallel rounding for conversion from binary floating point to binary coded decimal | |
| CN100340973C (en) | Processor | |
| US20090094306A1 (en) | Cordic rotation angle calculation | |
| US20250130767A1 (en) | Floating-point conversion circuit |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | CB02 | Change of applicant information | Country or region after: China. Address after: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai. Applicant after: Shanghai Bi Ren Technology Co.,Ltd. Address before: 201100 room 1302, 13 / F, building 16, No. 2388, Chenhang highway, Minhang District, Shanghai. Applicant before: Shanghai Bilin Intelligent Technology Co.,Ltd. Country or region before: China. |