CN102665049A - Programmable visual chip-based visual image processing system - Google Patents
Programmable visual chip-based visual image processing system
- Publication number: CN102665049A (also listed as CN2012100884200A, CN201210088420A)
- Authority: CN (China)
- Prior art keywords: array, data, bit, pixel, parallel
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N25/00—Circuitry of solid-state image sensors [SSIS]; Control thereof
- H04N25/70—SSIS architectures; Circuits associated therewith
- H04N25/76—Addressed sensors, e.g. MOS or CMOS sensors
- H04N25/78—Readout circuits for addressed sensors, e.g. output amplifiers or A/D converters
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a visual image processing system based on a programmable vision chip, comprising an image sensor and a multi-level parallel digital processing circuit. The image sensor mainly comprises a pixel array, an analog preprocessing circuit array, and an analog-to-digital conversion circuit array. The digital processing circuit mainly comprises a pixel-level parallel processing unit array, a row-parallel processing unit array, an on-chip artificial neural network, and a dual-core RISC (reduced instruction set) processor subsystem. The system achieves high-speed, high-quality image acquisition and multi-level parallel image processing, and can be programmed to implement a variety of high-speed intelligent vision applications. Compared with conventional image systems, it offers high speed, high integration, low power consumption, and low cost. The invention presents an embodiment of this architecture together with several high-speed intelligent visual image processing algorithms based on that embodiment, including high-speed motion detection, high-speed gesture recognition, and fast face detection. The processing speed can reach 1000 frames per second, which meets high-speed real-time processing requirements.
Description
Technical Field
The present invention relates to the technical field of programmable vision chips and image processing, and in particular to a visual image processing system based on a programmable vision chip. The system offers high speed, high integration, low power consumption, and low cost, and can be applied in a variety of embedded high-speed real-time visual image processing systems to implement intelligent vision applications such as high-speed target tracking, natural human-computer interaction, environmental monitoring, intelligent transportation, and robot vision.
Background Art
A conventional visual image processing system consists of a discrete camera and a general-purpose processor or digital signal processor (DSP). The camera acquires images with an image sensor and transmits the large volume of raw image data serially to the general-purpose processor or DSP for processing; because the transfer is serial, bandwidth is severely limited. In addition, software running on a general-purpose processor or DSP usually processes an image serially, pixel by pixel, which creates a serial-processing bottleneck. Because of these limits on serial transmission and serial processing, conventional visual image systems generally reach only about 30 frames per second, far short of high-speed real-time requirements; some industrial control systems, for example, often require 1000 frames per second.
Vision chips meet this high-speed real-time requirement effectively. Modeled on the human visual system, a vision chip integrates the image sensor and the image processing circuit on a single chip. The image data acquired by the image sensor is transferred in parallel to the image processing circuit, which itself uses a pixel-level massively parallel hardware architecture and ultimately outputs only a small amount of image feature data or analysis and recognition results. This overcomes the serial-transmission and serial-processing bottlenecks of conventional visual image processing systems and greatly improves real-time performance; many systems built on vision chips achieve processing speeds above 1000 frames per second.
Vision chips can be divided into dedicated vision chips and programmable vision chips. The latter have greater practical value because they can be programmed to implement many applications flexibly and to cope with complex, changing real-world environments.
However, current research in China and abroad on programmable vision chip architectures has serious shortcomings:
(1) Each pixel unit contains a photosensitive element, a readout circuit, and a processing circuit, so the chip area is large, which severely limits resolution and fill factor and results in poor raw image quality. Moreover, because the photosensitive element and the readout circuit are analog, the processing circuit is often analog as well, which makes image processing less reliable and less flexible;
(2) These pixel units are arranged in a two-dimensional array operating in single-instruction, multiple-data (SIMD) mode, which enables fully pixel-parallel image acquisition and local processing but not fast, flexible wide-area processing;
(3) Such SIMD-mode programmable vision chip architectures support low-level image processing and part of mid-level image processing but lack high-level image processing functions, in particular the simple, intuitive, and fast feature recognition capability resembling that of neurons in the human brain. A complete visual image system therefore requires an external general-purpose processor, which limits the use of vision chips in embedded applications with strict size, power, and cost requirements.
Summary of the Invention
(1) Technical problem to be solved
To address the above problems of existing programmable vision chips, the present invention provides a visual image processing system based on a programmable vision chip in which the pixel units are separated from the processing circuits, processing is based on multi-level parallel digital circuits, and an on-chip artificial neural network is included. The system achieves higher resolution and fill factor, combines local and wide-area processing, supports flexible and fast low-, mid-, and high-level image processing as well as on-chip feedback control, and forms a functionally complete on-chip vision system. For a variety of typical high-speed intelligent vision algorithms, its processing speed can reach 1000 frames per second.
(2) Technical solution
To achieve the above object, the present invention provides a visual image processing system based on a programmable vision chip, comprising:
an image sensor for acquiring raw image data at high speed and transferring the acquired raw image data in parallel to a multi-level parallel digital processing circuit; and
a multi-level parallel digital processing circuit for performing fast parallel processing on the raw image data received from the image sensor and outputting the processing result.
In the above solution, the image sensor comprises:
an N×N pixel array 1 for acquiring raw image data at high speed and outputting the acquired raw image data to an N×1 row-parallel analog preprocessing array 3, where N is a natural number;
an N×1 row-parallel analog preprocessing array 3 for removing fixed-pattern noise from the raw image data, increasing its dynamic range, and outputting it to an N×1 row-parallel analog-to-digital conversion array 4;
an N×1 row-parallel analog-to-digital conversion array 4 for converting each column of analog pixel data into high-precision digital pixel data and outputting it to an output pixel selection module 5;
an output pixel selection module 5 for receiving in parallel the N digital pixel values from the N×1 row-parallel analog-to-digital conversion array 4 as input and selecting M of them as the output of the image sensor, thereby implementing selection of pixel rows, where M is a natural number and M < N; and
an image sensor control module 6 for controlling, according to its internal parameter registers, the operating timing of the N×N pixel array 1, the N×1 row-parallel analog preprocessing array 3, the N×1 row-parallel analog-to-digital conversion array 4, and the output pixel selection module 5, thereby implementing dynamic control of the image sensor.
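In the above solution, the N×N pixel array 1 contains N×N pixel units 2 arranged in two dimensions, each containing a photosensitive element and its readout circuit. The N×1 row-parallel analog preprocessing array 3 contains N analog preprocessing units arranged in one dimension, each containing a correlated double sampling (CDS) circuit for removing fixed-pattern noise and a controllable-gain amplifier (PGA) for increasing dynamic range. The N×1 row-parallel analog-to-digital conversion array 4 contains N analog-to-digital conversion units arranged in one dimension. The output pixel selection module 5, together with the selection of pixel rows and columns by the image sensor control module 6, implements flexible region processing and/or subsampling for the image sensor.
In the above solution, the data in the parameter registers of the image sensor control module 6 can be read and written from outside the module through the on-chip bus interface, which enables dynamic control of the image sensor.
In the above solution, the image sensor control module 6 controls the N×N pixel array 1 to perform rolling exposure and, each time, selects one column to output N analog pixel values in row-parallel fashion to the N×1 row-parallel analog preprocessing array 3. That array removes noise and extends the dynamic range; the values then enter the N×1 row-parallel analog-to-digital conversion array 4, where they are converted in parallel into high-precision digital pixel data. Finally, the output pixel selection module 5 outputs M digital pixel values as the final output of the image sensor, which is supplied to the multi-level parallel digital processing circuit.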
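The selection of M out of N digitized pixels described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the function name `select_pixels`, the `mode` and `offset` parameters, and the use of a fixed stride for subsampling are hypothetical and do not come from the patent, which specifies only that M < N values are chosen for region processing and/or subsampling. The values N = 256 and M = 64 are those of the embodiment.

```python
def select_pixels(line, m, mode="window", offset=0):
    """Pick m of the n digitized pixels delivered in one parallel read-out.

    mode="window":    m consecutive pixels starting at `offset` (region of interest)
    mode="subsample": every (n // m)-th pixel, spanning the whole line
    Both modes are illustrative assumptions, not the patented circuit.
    """
    n = len(line)
    if m >= n:
        raise ValueError("the patent requires M < N")
    if mode == "window":
        if not 0 <= offset <= n - m:
            raise ValueError("window does not fit in the line")
        return line[offset:offset + m]
    if mode == "subsample":
        step = n // m
        return line[::step][:m]
    raise ValueError(f"unknown mode: {mode}")


# N = 256 and M = 64 as in the embodiment; pixel values are just indices here.
line = list(range(256))
roi = select_pixels(line, 64, "window", offset=100)
sub = select_pixels(line, 64, "subsample")
```

Switching between the two modes at run time corresponds to rewriting the parameter registers of the image sensor control module over the on-chip bus.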
In the above solution, the multi-level parallel digital processing circuit comprises:
an M×M pixel-level parallel processing unit array 7 for performing local linear processing suited to pixel-level parallelism on the digital pixel data received from the image sensor and outputting the result to an M×1 row processing unit array 9, where M is a natural number and M < N;
an M×1 row processing unit array 9 for accelerating the nonlinear and wide-area processing in low- and mid-level image processing that is suited to row-parallel execution, thereby extracting image features;
a processing array control module 11 for fetching, from its internal variable-length single-instruction, multiple-data (SIMD) instruction memory, the control instructions for the M×M pixel-level parallel processing unit array 7 and the M×1 row processing unit array 9, decoding them, and issuing them to those two arrays;
an on-chip configurable artificial neural network 12 for performing feature recognition or feature compression in high-level image processing, whose input is the feature vector data extracted by the M×1 row processing unit array 9 and whose output is the feature recognition result;
a dual-core RISC processor subsystem 13 for thread-level parallel processing, which handles the irregular processing in high-level image processing other than regular feature recognition and controls the entire system;
a random/sequential hybrid I/O memory 14;
a system thread flag 15; and
an on-chip bus 16 for mapping the read/write control signals and logical address information from the dual-core RISC processor subsystem 13 to the strobe/enable signals and physical address information required by each of the other bus slave modules, so as to drive those slave modules to perform their operations.
In the above solution, the M×M pixel-level parallel processing unit array 7 contains M×M pixel-level parallel processing units PE 8 arranged in two dimensions. All PEs 8 operate in single-instruction, multiple-data (SIMD) mode: they receive the same PE array control instruction and perform the same operation, but each operates on data from its own local memory.
In the above solution, each pixel-level parallel processing unit PE 8 corresponds to one or more image pixels of the N×N pixel array 1 within a frame. When each PE 8 corresponds to one pixel, because M < N, the whole processing unit array corresponds either to an M×M sub-region of the N×N pixel array 1 or to an M×M subsampled image of the entire N×N pixel array 1; the M×M pixel-level parallel processing unit array 7 then processes one M×M sub-image or subsampled image in fully parallel fashion. When each PE 8 corresponds to multiple pixels, the whole M×M pixel-level parallel processing unit array 7 corresponds to the entire N×N pixel array 1 or to a sub-region of it larger than M×M; the full frame is then processed in partially pixel-parallel fashion.
In the above solution, the visual image processing system dynamically switches the correspondence between the pixel-level parallel processing units PE 8 and the image pixels through the image sensor control module 6, thereby implementing multi-resolution visual image processing.
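For the case where each PE handles several pixels, the correspondence between pixels and PEs can be pictured with a short sketch. The patent does not say how pixels are assigned to PEs; the block mapping below (each PE owns a contiguous square of pixels) is an assumption made purely for illustration, and the names `pe_for_pixel` and `pixels_per_pe` are hypothetical.

```python
def pe_for_pixel(row, col, n=256, m=64):
    """Return the (row, col) index of the PE that handles a sensor pixel
    when the whole N x N frame is shared among M x M PEs.

    Assumes a block mapping and N being a multiple of M; both are
    illustrative assumptions, not details taken from the patent.
    """
    if n % m:
        raise ValueError("this sketch assumes N is a multiple of M")
    block = n // m
    return row // block, col // block


def pixels_per_pe(n=256, m=64):
    """Number of pixels each PE must process one after another."""
    return (n // m) ** 2
```

With the embodiment's N = 256 and M = 64, each PE would handle 16 pixels under this mapping, which is why full-frame processing is only partially pixel-parallel.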
In the above solution, the pixel-level parallel processing unit PE 8 performs basic 1-bit arithmetic and logic operations such as addition, NOT, AND, and OR; the multi-bit arithmetic and logic operations of low- and mid-level image processing are implemented on the PE 8 by decomposing them into these basic 1-bit operations. The data of a PE 8 can be exchanged with its upper, lower, left, and right neighboring processing units; through repeated neighbor-to-neighbor transfers, each PE 8 can interact with any other processing unit in the array.
In the above solution, the pixel-level parallel processing unit PE 8 comprises a first operand selector 31, a second operand selector 32, a 1-bit arithmetic logic unit 33, a 1-bit temporary data register 34, and a bit-plane random access memory 35. The first operand selector 31 selects, according to the control instruction issued by the processing array control module 11, one of the outputs of the bit-plane memory 35 of its own unit or of a neighboring processing unit as the first operand of the 1-bit arithmetic logic unit 33. The second operand selector 32 selects, according to the control instruction issued by the processing array control module 11, either the output of its own unit's 1-bit temporary register 34 or a 1-bit immediate value (0 or 1) as the second operand of the 1-bit arithmetic logic unit 33.
In the above solution, the 1-bit arithmetic logic unit 33 comprises a full adder, a NOT gate, a two-input AND gate, a two-input OR gate, a carry register, and an output result selector. The carry register holds the carry produced by an addition for use in multi-bit arithmetic and can be cleared by a control instruction issued by the processing array control module 11. The output result selector selects, according to the control instruction issued by the processing array control module 11, one of the outputs of the full adder, NOT gate, AND gate, and OR gate as the result of the 1-bit arithmetic logic unit 33.
In the above solution, the bit-plane random access memory 35 is a small random access memory with a 1-bit data width that supports simultaneous reading and writing. Its read and write addresses come from the control instruction issued by the processing array control module 11, its write data comes from the output of the 1-bit arithmetic logic unit 33, and its read data is one of the inputs of the first operand selector of its own unit or of a neighboring processing unit.
In the above solution, the control instruction issued by the processing array control module 11 selects whether each result of the 1-bit arithmetic logic unit 33 is written to the bit-plane random access memory 35 or to the 1-bit temporary register 34; each result must be written to exactly one of the two.
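To show how a multi-bit operation decomposes into the 1-bit operations described above, the following sketch models a single PE in software and adds two numbers bit-serially. It is a simplified model under assumptions: neighbor access is omitted, the instruction format (`op`, `src`, `second`, `dst`) and the helper names `store`, `load`, and `add_bit_serial` are hypothetical, and the memory depth of 32 bits is arbitrary. What does follow the text is the datapath: the first operand comes from bit-plane memory, the second from the temporary register or an immediate 0/1, and each result is written to exactly one destination.

```python
class PE:
    """Software model of one PE: bit-plane memory, temp bit, carry bit."""

    def __init__(self, mem_bits=32):
        self.mem = [0] * mem_bits   # bit-plane RAM, one bit per address
        self.temp = 0               # 1-bit temporary register
        self.carry = 0              # carry register of the 1-bit ALU

    def step(self, op, src, second, dst):
        """Execute one 1-bit instruction.

        op:     "add", "not", "and" or "or"
        src:    memory address of the first operand
        second: "temp", 0 or 1 (second-operand selection)
        dst:    a memory address, or "temp"
        """
        a = self.mem[src]
        b = self.temp if second == "temp" else second
        if op == "add":
            total = a + b + self.carry          # full adder with carry-in
            result, self.carry = total & 1, total >> 1
        elif op == "not":
            result = a ^ 1
        elif op == "and":
            result = a & b
        elif op == "or":
            result = a | b
        else:
            raise ValueError(op)
        if dst == "temp":
            self.temp = result
        else:
            self.mem[dst] = result


def store(pe, base, value, width):
    """Test helper: place `value` in memory, least significant bit first."""
    for i in range(width):
        pe.mem[base + i] = (value >> i) & 1


def load(pe, base, width):
    """Test helper: read a `width`-bit value back from memory."""
    return sum(pe.mem[base + i] << i for i in range(width))


def add_bit_serial(pe, a_base, b_base, out_base, width):
    """Add two `width`-bit numbers using only 1-bit steps; return carry-out."""
    pe.carry = 0                                    # clear the carry register
    for i in range(width):
        pe.step("or", b_base + i, 0, "temp")        # temp <- B[i] (OR with 0)
        pe.step("add", a_base + i, "temp", out_base + i)
    return pe.carry
```

In the real array every PE would execute the same instruction sequence at once on its own memory, so an 8-bit addition over the whole image costs the same number of steps as over one pixel.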
In the above solution, the M×1 row processing unit array 9 contains M row-parallel processing units RP 10 arranged in one dimension. All RPs 10 operate in single-instruction, multiple-data (SIMD) mode: they receive the same RP array control instruction and perform the same operation, but each operates on data from its own local registers. Each RP 10 performs k-bit arithmetic operations, including addition, subtraction, absolute value, data shifting, and magnitude comparison; operations on data wider than k bits can be decomposed into several operations no wider than k bits executed serially.
In the above solution, each row-parallel processing unit RP 10 corresponds to all the pixel-level parallel processing units PE 8 in the same row of the M×M pixel-level parallel processing unit array 7; the data of each PE 8 in that row can enter the RP 10 one by one for further processing.
In the above solution, every row-parallel processing unit RP can exchange data with the RPs immediately above and below it. Some RPs can additionally exchange data with the RPs located S rows above and below them; these are called skip-row processing units, and all other RPs are called ordinary row processing units. In the row processing unit array, starting from the first row, one skip-row processing unit is placed every S rows, and ordinary row processing units are placed in all remaining rows, where S is a natural number.
In the above solution, the skip-row processing units can exchange data directly over long distances without passing the data through every intermediate row-parallel processing unit RP 10, which enables fast, flexible wide-area processing across rows.
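The benefit of the skip-row links can be estimated by counting transfers. The sketch below treats the RP column as a graph and uses a breadth-first search to find the fewest transfers between two rows, with and without skip links. The value S = 8 in the usage, the zero-based row numbering, and the function name `hops` are assumptions for illustration; the patent leaves S as an unspecified natural number.

```python
from collections import deque


def hops(src, dst, rows, skip=None):
    """Fewest RP-to-RP transfers needed to move data from row src to row dst.

    Every row links to its neighbors one row away. If `skip` is given, rows
    whose index is a multiple of `skip` also link to the rows `skip` away,
    modelling skip-row processing units placed every S rows from the first.
    """
    dist = {src: 0}
    queue = deque([src])
    while queue:
        r = queue.popleft()
        if r == dst:
            return dist[r]
        steps = [-1, 1]
        if skip and r % skip == 0:
            steps += [-skip, skip]
        for s in steps:
            nxt = r + s
            if 0 <= nxt < rows and nxt not in dist:
                dist[nxt] = dist[r] + 1
                queue.append(nxt)
    raise ValueError("destination row is outside the array")


plain = hops(0, 63, 64)             # ordinary links only
fast = hops(0, 63, 64, skip=8)      # with a skip-row unit every 8 rows
```

For a 64-row array the comparison of `plain` and `fast` shows how much shorter the path becomes once long-distance links are available.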
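In the above solution, the row-parallel processing unit RP comprises:
a k-bit buffer shift register 41 for performing serial-to-parallel and parallel-to-serial data conversion with the M×M pixel-level parallel processing unit array 7 and for serving as the data access interface through which the on-chip bus outside the array accesses the M×1 row processing unit array 9; it can also be updated with data read from the register file of its own RP;
a k-bit first operand selector 42 for selecting, according to the control instruction issued by the processing array control module 11, one of the register file outputs of its own unit or of a neighboring row processing unit, or the output of its own buffer shift register, as the first operand of the k-bit arithmetic unit 44;
a k-bit second operand selector 43 for selecting, according to the control instruction issued by the processing array control module 11, either the output of its own temporary register or an immediate value from the array control instruction as the second operand of the k-bit arithmetic unit 44;
a k-bit arithmetic unit 44 for performing wide-area and nonlinear processing, the wide-area processing including k-bit addition, subtraction, absolute value, data shifting, and magnitude comparison;
a condition selector 45 for selecting, according to the control instruction issued by the processing array control module 11, one of the 1-bit data output by the pixel-level parallel processing units PE 8 in its own row, the condition flag register of the k-bit arithmetic unit 44, or the 1-bit constant 1 as the conditional-operation enable signal, which enables the k-bit tri-state buffer 46;
a k-bit tri-state buffer 46 for receiving the output of the k-bit arithmetic unit 44 and, under control of the enable signal output by the condition selector 45, determining whether the result of the current operation is written to the k-bit temporary register 47 or the k-bit-wide register file 48, thereby implementing conditional operations; and
a k-bit temporary register 47 and a k-bit-wide register file 48.
In the above solution, the k-bit buffer shift register 41 can be shifted left or right bit by bit under array control instructions to perform serial-to-parallel and parallel-to-serial data conversion with the M×M pixel-level parallel processing unit array 7. Under control of signals from outside the array, all of its bits can also be shifted up or down in parallel to the buffer shift registers of the RPs 10 above and below, which gives the on-chip bus outside the array data access to the M×1 row processing unit array 9. The output of the k-bit buffer shift register 41 is one of the inputs of the k-bit first operand selector 42, and its value can also be updated with data read from the register file.
In the above solution, when the k-bit first operand selector 42 selects, according to the control instruction, among the register file outputs of its own unit or of neighboring row processing units and the output of its own buffer shift register, its selection range also includes the skip-row processing units S rows away if its own unit is a skip-row processing unit.
In the above solution, the k-bit arithmetic unit 44 also updates its internal carry/borrow and result-is-zero flag registers after each operation, which facilitates operations on data wider than k bits as well as conditional operations. These flag registers can be cleared by a control instruction issued by the processing array control module.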
In the above solution, the k-bit-wide register file 48 is a small random access memory or register bank with a k-bit data width that supports simultaneous reading and writing. Its read and write addresses come from the control instruction issued by the processing array control module 11, its write data comes from the output of the k-bit tri-state buffer 46, and its read data is one of the inputs of the k-bit first operand selector 42 of its own unit or of a neighboring row processing unit and, if its own unit is a skip-row processing unit, also of the skip-row processing units S rows away.
In the above solution, the control instruction issued by the processing array control module 11 selects whether each result of the k-bit arithmetic unit 44 is written to the k-bit temporary register 47 or to the k-bit-wide register file 48; when the k-bit tri-state buffer 46 is enabled, the result must be written to exactly one of the two.
In the above solution, the condition selector 45 can take the 1-bit data directly from the pixel-level parallel processing unit PE 8 as the conditional enable signal, without serial-to-parallel conversion through the k-bit buffer shift register 41, which facilitates flexible, fast wide-area processing within a row.
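One way to picture the conditional write path is a running sum along a row that only updates when the PE in the current column outputs a 1. The sketch below is a hypothetical use of the mechanism, not an algorithm taken from the patent: the function name `conditional_row_sum`, the choice k = 8, and the idea of accumulating per-column values are all assumptions. It models the tri-state buffer as a simple `if` around the write-back.

```python
def conditional_row_sum(values, mask_bits, k=8):
    """Accumulate `values` along one row, writing back only where the
    corresponding 1-bit PE output in `mask_bits` is 1.

    The k-bit arithmetic unit computes a result on every step; the result
    reaches the temporary register only when the condition enable is set.
    """
    limit = (1 << k) - 1
    acc = 0                              # models the k-bit temporary register
    for value, enable in zip(values, mask_bits):
        result = (acc + value) & limit   # k-bit adder output, wraps at 2**k
        if enable:                       # tri-state buffer enabled by PE bit
            acc = result
    return acc
```

If `values` were the column indices and `mask_bits` a binarized image row, the result would be the sum of the x coordinates of the foreground pixels in that row, a typical ingredient of a centroid computation.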
In the above solution, when the M×1 row processing unit array 9 executes a more complex algorithm and the register file does not provide enough storage, data can be stored in the M×M pixel-level parallel processing unit array 7 through the k-bit buffer shift register 41. When the M×1 row processing unit array 9 has completed all its operations, the result data can be written to the k-bit buffer shift register 41 and then read out over the on-chip bus 16 outside the array.
In the above solution, the location from which the processing array control module 11 reads an instruction segment within the variable-length SIMD instruction memory is configured dynamically over the on-chip bus 16, and once that segment has finished executing, a completion flag is generated and reported to the on-chip bus 16.
In the above solution, in order both to support cooperative operation of the M×M pixel-level parallel processing unit array 7 and the M×1 row processing unit array 9 and to reduce the on-chip instruction storage required, the visual image processing system adopts a variable-length SIMD instruction mechanism. Each address of the variable-length SIMD instruction memory stores one 2L-bit instruction word. The header of the instruction word indicates whether the word is a single 2L-bit very long SIMD instruction that controls the two arrays working together, or two L-bit ordinary SIMD instructions that control the two arrays working separately. The processing array control module 11 has built-in functional units for scheduling and decoding the variable-length SIMD instructions.
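The header-based split between one long instruction and two ordinary instructions can be sketched as a small decoder. The patent does not give the header encoding or the value of L, so the following rests on assumptions: L = 16, a one-bit header in the most significant position of the 2L-bit word (1 meaning a long instruction), and the names `decode`, `"long"`, and `"short"` are all hypothetical.

```python
L = 16  # assumed width of an ordinary SIMD instruction, in bits


def decode(word, l=L):
    """Split one 2L-bit memory word into the instruction(s) it carries.

    Assumed encoding: if the top bit of the word is 1, the whole word is a
    single 2L-bit instruction for both arrays; otherwise it holds two L-bit
    instructions, upper half first.
    """
    if not 0 <= word < (1 << (2 * l)):
        raise ValueError("word does not fit in 2L bits")
    header = word >> (2 * l - 1)
    if header == 1:
        return [("long", word)]
    upper = word >> l
    lower = word & ((1 << l) - 1)
    return [("short", upper), ("short", lower)]
```

Packing two ordinary instructions into one word is what saves instruction memory: a program that rarely needs both arrays to cooperate stores roughly half as many words as it would with fixed 2L-bit instructions.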
In the above solution, the on-chip configurable artificial neural network 12 comprises:
an input neuron vector register bank 51 comprising T1 input neuron registers, each storing J1-bit fixed-point data, where T1 ≪ M;
a neuron broadcaster 52 for receiving the data of the input neuron vector register bank 51 and selecting one value at a time to broadcast to the parallel arithmetic unit array 53 as one of the operands of each parallel arithmetic unit in that array;
a parallel arithmetic unit array 53 comprising T2 parallel arithmetic units, T2 ≤ T1, each of which takes the input neuron broadcast by the neuron broadcaster 52 as its first operand and one of the T2 weight/threshold values stored at each address of the weight/threshold memory 55 as its second operand, where the weights/thresholds are J-bit fixed-point data and J > J1;
an output neuron vector register bank 54 comprising T2 output neuron registers, each storing J2-bit fixed-point data;
a weight/threshold memory 55 storing the weight and threshold data required by the computation, with T2 J-bit fixed-point values at each address;
a neural network control module 56 for controlling the parallel computation of the whole on-chip configurable artificial neural network 12 according to the configured parameter information; during normal operation of the network, the memory addresses are supplied by the neural network control module 56; and
a bus read/write interface 57 through which the data in the input neuron vector register bank 51, the output neuron vector register bank 54, and the weight/threshold memory 55 can be written and read externally; the mapping function of the piecewise linear mapping unit in each parallel arithmetic unit and the control parameters of the neural network control module 56 are also flexibly configured through this interface.
In the above solution, each parallel arithmetic unit comprises a fixed-point multiplier, an accumulation register, and a piecewise linear mapping unit. The fixed-point multiplier and the accumulation register perform the multiply-accumulate of the input neuron data with the corresponding weights/thresholds; the accumulation register can be cleared by the neural network control module. The piecewise linear mapping unit implements the activation (transfer) function, and its output updates the output neuron vector register bank 54.
In the above solution, under control of the neural network control module 56, the neuron broadcaster 52 broadcasts one input neuron at a time to the parallel arithmetic unit array 53, while the weight/threshold data corresponding to the broadcast input neuron is fetched from the weight/threshold memory 55 into the parallel arithmetic unit array 53. Each parallel arithmetic unit multiplies the two values and adds the product to its accumulation register. Once all input neurons have been processed, the piecewise linear mapping is applied in parallel, and the final result is normalized to T2 bits and written to the output neuron vector register bank 54.
In the above solution, the data written to the weight/threshold memory 55 and the data that configure the parallel arithmetic units and the neural network control module 56 are obtained from the result of training the neural network; the training is carried out on the dual-core RISC processor subsystem 13 or on a general-purpose processor outside the system.
In the above solution, the on-chip configurable artificial neural network 12 supports at most T1 input neurons and at most T2 output neurons, with T2 ≤ T1. When the number of input neurons is less than T1 or the number of output neurons is less than T2, the corresponding data in the remaining input neuron registers, output neuron registers, and weight/threshold memory is automatically set to 0.
In the above solution, the data in the output neuron registers can be read out over the on-chip bus 16 and fed back into the input neuron registers to compute multi-layer neural networks.
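The broadcast-and-accumulate schedule of the neural network can be modelled in a few lines. This is a sketch under assumptions rather than the patented circuit: integer arithmetic stands in for the fixed-point formats, the thresholds are treated as biases added after the weighted sum (the patent does not state how thresholds enter the accumulation), the saturating map `saturate` is just one possible piecewise linear function, and the names `run_layer` and `weight_rows` are hypothetical.

```python
def saturate(x, lo=0, hi=255):
    """A simple piecewise linear map: clamp to [lo, hi] (assumed shape)."""
    return max(lo, min(hi, x))


def run_layer(inputs, weight_rows, thresholds, activation=saturate):
    """Compute one layer the way the broadcast schedule is described.

    weight_rows[i] holds the T2 weights fetched from one memory address
    while input neuron i is being broadcast.
    """
    t2 = len(thresholds)
    acc = [0] * t2                            # accumulation registers, cleared
    for x, row in zip(inputs, weight_rows):   # one broadcast per input neuron
        for j in range(t2):                   # T2 units run in parallel in hardware
            acc[j] += x * row[j]
    for j in range(t2):                       # thresholds as biases (assumption)
        acc[j] += thresholds[j]
    return [activation(a) for a in acc]


# Two layers: the first layer's outputs are fed back in as the next inputs.
hidden = run_layer([1, 2, 3], [[1, 0], [0, 1], [1, 1]], [10, -100])
output = run_layer(hidden, [[2], [5]], [0])
```

The second call mirrors the feedback path through the on-chip bus: the same hardware is reused for each layer, with new weights loaded between passes.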
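In the above solution, the dual-core RISC processor subsystem 13 comprises RISC core No. 1 (RISC#1), a private program/data memory for RISC#1, RISC core No. 2 (RISC#2), a private program/data memory for RISC#2, an inter-core communication mailbox, and a processor arbiter. RISC#1 and RISC#2 each have a private program/data memory with a P-bit data width, which enables thread-level parallel processing; the subsystem handles the irregular processing in high-level image processing other than regular feature recognition and controls the entire system.
In the above solution, RISC#1 and RISC#2 communicate through the inter-core communication mailbox to perform the necessary thread synchronization and data exchange. Access by RISC#1 and RISC#2 to the on-chip bus is controlled by the processor arbiter, which supports two arbitration schemes in hardware: fixed priority and first come, first served. The inter-core communication mailbox is a synchronous bidirectional FIFO.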
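The difference between the two arbitration schemes can be shown with a minimal model. Everything beyond the two scheme names is an assumption: the request format `(arrival_time, core_id)`, the core numbering 0 and 1 for RISC#1 and RISC#2, the default priority order, and the tie-break rule for simultaneous requests are hypothetical choices made for this sketch.

```python
def arbitrate(requests, mode="fixed", priority=(0, 1)):
    """Choose which core gets the on-chip bus next.

    requests: list of (arrival_time, core_id) for pending bus requests
    mode:     "fixed" grants the highest-priority requester;
              "fcfs" grants the earliest request, using priority to break ties
    Returns the granted core_id, or None when nothing is pending.
    """
    if not requests:
        return None
    if mode == "fixed":
        return min(requests, key=lambda r: priority.index(r[1]))[1]
    if mode == "fcfs":
        return min(requests, key=lambda r: (r[0], priority.index(r[1])))[1]
    raise ValueError(f"unknown mode: {mode}")
```

Fixed priority suits a design in which one core runs time-critical control code, while first come, first served avoids starving the lower-priority core when both run comparable workloads.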
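In the above solution, the dual-core RISC processor subsystem 13 also dynamically adjusts the data in the parameter registers of the image sensor control module 6 according to the global image information or the extent of the target of interest obtained from processing by the M×M pixel-level parallel processing unit array 7 and the M×1 row processing unit array 9. This lets the system adapt to a changing application environment and meet the multi-resolution processing requirements created by relative motion of the system or the target within the environment.
In the above solution, the random/sequential hybrid I/O memory 14 is a dual-port memory. One port is P bits wide and supports random read/write access from the on-chip bus; the other port is PS bits wide (PS < P) and supports sequential read/write access from off-chip devices, with reads and writes independent of each other. During off-chip sequential access, the enable signal is automatically mapped to a physical address of the memory by an address generation module embedded in the memory; this physical address can be reset to zero by an external redirection signal.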
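A software model helps clarify how the two ports of the hybrid memory see the same storage. The sketch below is built on assumptions the patent does not state: P = 32 and PS = 8, P being a multiple of PS, little-endian placement of PS-bit chunks inside a P-bit word, and the class and method names (`HybridIOMemory`, `seq_write`, `seq_read`, `reset_pointers`) are hypothetical. The independent read and write pointers stand in for the embedded address generation module.

```python
class HybridIOMemory:
    """Dual-port memory model: random P-bit port plus sequential PS-bit port."""

    def __init__(self, words, p=32, ps=8):
        if not (ps < p and p % ps == 0):
            raise ValueError("sketch assumes PS < P and P a multiple of PS")
        self.p, self.ps = p, ps
        self.lanes = p // ps          # PS-bit chunks per P-bit word
        self.data = [0] * words
        self.rd_ptr = 0               # sequential read address, in chunks
        self.wr_ptr = 0               # sequential write address, independent

    # Random-access port, as seen from the on-chip bus.
    def random_read(self, addr):
        return self.data[addr]

    def random_write(self, addr, value):
        self.data[addr] = value & ((1 << self.p) - 1)

    # Sequential port, as seen from off-chip: each access advances a pointer.
    def seq_write(self, chunk):
        word, lane = divmod(self.wr_ptr, self.lanes)
        shift = lane * self.ps
        mask = ((1 << self.ps) - 1) << shift
        self.data[word] = (self.data[word] & ~mask) | ((chunk << shift) & mask)
        self.wr_ptr += 1

    def seq_read(self):
        word, lane = divmod(self.rd_ptr, self.lanes)
        self.rd_ptr += 1
        return (self.data[word] >> (lane * self.ps)) & ((1 << self.ps) - 1)

    def reset_pointers(self):
        """Models the external redirect that clears the generated address."""
        self.rd_ptr = self.wr_ptr = 0
```

An off-chip host can stream bytes in with `seq_write` without supplying addresses, after which an on-chip core reads whole words at arbitrary locations with `random_read`.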
In the above solution, the system thread flag 15 is a W-bit register. Some of its bits are written under control of the on-chip bus 16 inside the system, while other bits are written under control of devices outside the system; all bits of the flag register can be read from both inside and outside the system.
In the above solution, when the on-chip bus 16 maps the read/write control signals and logical address information from the dual-core RISC processor subsystem 13 to the strobe/enable signals and physical address information required by the other bus slave modules, those slave modules comprise the image sensor control module, the processing array control module, the on-chip artificial neural network, the random/sequential hybrid I/O memory, and the system thread flag.
In the above solution, within the visual image processing system, the digital pixel data obtained by the image sensor is loaded in row-parallel fashion into the M×M pixel-level parallel processing unit array 7. The M×M pixel-level parallel processing unit array 7 and the M×1 row processing unit array 9 cooperate to perform the various low- and mid-level image processing tasks flexibly and extract image features, which are fed to the on-chip artificial neural network 12 for feature recognition and are further analyzed by the dual-core RISC processor subsystem 13 to obtain and output the small amount of result data ultimately required.
(3) Beneficial effects
As can be seen from the above technical solution, the present invention has the following beneficial effects:
1. In the visual image processing system based on a programmable vision chip provided by the present invention, the pixel unit array is separated from the processing circuits. This resolves the problems of conventional vision chips, whose area grows rapidly with resolution and whose low resolution and fill factor limit raw image quality. The flexible correspondence between the processing units PE and the pixel units also supports flexible multi-resolution processing.
2. The visual image processing system based on a programmable vision chip provided by the present invention introduces an architecture based on multi-level parallel digital processing circuits and an on-chip artificial neural network, so it can be programmed to perform low-, mid-, and high-level image processing quickly and flexibly. It realizes a true single-chip visual image processing system and broadens the use of vision chips in embedded applications with strict limits on size, power consumption, and cost.
3. In the visual image processing system based on a programmable vision chip provided by the present invention, the programmable row processing unit array with skip-row processing units implements flexible, fast wide-area processing and speeds up feature extraction.
4. The visual image processing system based on a programmable vision chip provided by the present invention has high-speed, flexible real-time visual image processing capability, with a processing speed that can exceed 1000 frames per second.
Brief Description of the Drawings
Fig. 1 is a schematic structural diagram of the visual image processing system based on a programmable vision chip provided by the present invention;
Fig. 2 is a circuit diagram of one consecutive row of the image sensor array in Fig. 1, comprising a row of rolling-exposure high-speed four-transistor pixel units, the subsequent row-parallel analog processing unit (including the correlated double sampling circuit and the controllable-gain amplifier circuit), and the analog-to-digital conversion circuit unit based on a cyclic redundant scheme;
Fig. 3 is a circuit structure diagram of a processing unit PE in the pixel-level parallel processing unit array of Fig. 1;
Fig. 4 is a circuit structure diagram of a row processing unit RP in the row-parallel processing unit array of Fig. 1;
Fig. 5 is a circuit structure diagram of the on-chip configurable artificial neural network of Fig. 1;
Fig. 6 is a flowchart of the 1000 frames-per-second high-speed target tracking algorithm based on the visual image processing system of Fig. 1;
Fig. 7 is a flowchart of the high-speed gesture recognition algorithm based on the visual image processing system of Fig. 1;
Fig. 8 shows the binarized images of the four classes of gestures to be recognized by the algorithm of Fig. 7;
Fig. 9 is a schematic diagram of the fast face detection algorithm based on the visual image processing system of Fig. 1.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is a schematic structural diagram of the programmable-vision-chip-based visual image processing system provided by the present invention. The system consists of two parts: an image sensor module and a multi-level parallel digital image processing module. The image sensor module includes an N×N pixel array, an N×1 row-parallel analog preprocessing array, an N×1 row-parallel analog-to-digital converter (ADC) array, an output pixel selection module and an image sensor control module. In this embodiment N=256, which matches the standard resolution used in applications such as machine vision. The multi-level parallel digital image processing module includes an M×M pixel-level parallel processing unit (PE) array, an M×1 row processing unit (RP) array, an on-chip configurable artificial neural network, a reduced instruction set (RISC) processor dual-core subsystem, a random/sequential hybrid I/O memory, a system thread flag and a high-speed on-chip bus. In this embodiment M=64. Because the output pixel selection module and the image sensor control module can flexibly select the region of interest and the resolution of the pixel array, multi-resolution processing can be achieved within a small chip area.
Fig. 2 shows one row of pixel units in this embodiment together with the analog preprocessing unit and the analog-to-digital converter (ADC) of that row. The pixel unit uses a standard four-transistor pixel structure; combined with the correlated double sampling (CDS) circuit in the analog preprocessing unit, it eliminates reset noise and fixed-pattern noise. The programmable gain amplifier (PGA) in the analog preprocessing unit can flexibly change the size of its equivalent feedback capacitance to obtain different gains, which improves the dynamic range and contrast of the image. Finally, the analog image data is converted into a digital signal by a small-area cyclic redundant analog-to-digital conversion (ADC) unit and passed through the output pixel selection module into the pixel-level parallel processing unit (PE) array, where processing begins.
The output pixel selection module and the image sensor control module of the image sensor module can be configured dynamically over the external on-chip bus. This allows flexible row and column selection over the pixel array: different regions of interest can be selected at different subsampling resolutions and fed to the PE array and its subsequent processing modules, which achieves multi-resolution processing. The parameter registers of the image sensor control module can also be configured dynamically over the external on-chip bus to set different integration (exposure) times, frame rates, PGA gains and so on, so that the operating mode of the image sensor is adjusted in real time to the application environment and the needs of the algorithm.
Fig. 3 shows the circuit structure of the pixel-level parallel processing unit PE in this embodiment. Each PE unit includes a first operand selector, a second operand selector, a 1-bit arithmetic logic unit (ALU), a 1-bit temporary data register and a bit-plane random access memory. The first operand selector chooses the output data of the bit-plane memory in this unit or in one of the four neighboring PE units (up, down, left, right) as the first operand of this unit's 1-bit ALU, which is how neighboring PE units exchange data. The second operand selector chooses either the data in this unit's 1-bit temporary register or the constant 0 or 1 as the second operand of the 1-bit ALU. The 1-bit ALU performs 1-bit AND, OR and NOT logic operations and 1-bit addition. More complex multi-bit arithmetic such as addition, subtraction and multiplication is decomposed into a sequence of serial 1-bit additions, with the carry register in the ALU holding the carry flag. The bit-plane memory reads and writes 1 bit at a time; multi-bit grayscale image data is stored in it bit by bit and therefore occupies several addresses. In this embodiment the bit-plane memory has a capacity of 64 bits, which meets the data storage needs of low- and mid-level image processing in the vast majority of applications. The result of the PE unit's ALU can be stored in the temporary register or in the bit-plane memory.
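The bit-serial arithmetic described above can be illustrated with a minimal Python sketch. It models a single PE adding two 8-bit values held in its 64-bit bit-plane memory, one 1-bit ALU operation per bit, with a carry variable standing in for the ALU carry register. The LSB-first memory layout, the helper names `store` and `load`, and reading both operands directly from memory (the real PE takes its second operand from the temporary register) are simplifying assumptions, not details taken from the patent.

```python
def alu_add_1bit(a, b, carry):
    """1-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ carry
    carry_out = (a & b) | (carry & (a ^ b))
    return s, carry_out


def bit_serial_add(mem, addr_a, addr_b, addr_out, nbits):
    """Add two nbits-wide values stored LSB-first in a bit-plane memory.

    mem is a list of 0/1 values; each operand occupies nbits consecutive
    addresses and the result occupies nbits + 1 addresses.
    """
    carry = 0  # stands in for the ALU carry register
    for i in range(nbits):  # one 1-bit ALU addition per iteration
        mem[addr_out + i], carry = alu_add_1bit(mem[addr_a + i], mem[addr_b + i], carry)
    mem[addr_out + nbits] = carry  # final carry becomes the top result bit


def store(mem, addr, value, nbits):
    """Write value into mem bit by bit, LSB first (assumed layout)."""
    for i in range(nbits):
        mem[addr + i] = (value >> i) & 1


def load(mem, addr, nbits):
    """Read an nbits-wide value back from mem."""
    return sum(mem[addr + i] << i for i in range(nbits))


mem = [0] * 64  # 64-bit bit-plane memory, as in the embodiment
store(mem, 0, 200, 8)
store(mem, 8, 100, 8)
bit_serial_add(mem, 0, 8, 16, 8)
print(load(mem, 16, 9))  # 300
```

In the array, every PE would run the same loop in lockstep on its own pixel, which is where the pixel-level parallelism comes from.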
The instructions for the PE array are stored in the processing array control module. Under the control of that module, the PE array operates in parallel in single-instruction multiple-data (SIMD) mode and can execute one instruction per clock cycle.
The pixel-level parallel processing unit (PE) array is mainly used for local linear operations in low- and mid-level image processing, including background subtraction, linear grayscale transformation, smoothing filtering, edge detection, threshold segmentation and binary morphology. All of these operations can be completed at high speed with pixel-level parallelism, giving a speedup of O(M×M) over serial processing.
Fig. 4 shows the circuit structure of the row-parallel processing unit RP in this embodiment. Each RP unit includes a k-bit buffer shift register, a k-bit first operand selector, a k-bit second operand selector, a k-bit arithmetic unit, a condition selector, a k-bit tri-state buffer gate, a k-bit temporary register and a k-bit-wide register file. In this embodiment k=8: grayscale image data is generally 8 bits wide, so k=8 gives a good balance between performance and area. The buffer shift registers of the RP units together form a shift register array. It supports word-parallel, bit-serial left and right shifts for exchanging data between each RP unit and the PE units of the same row, and word-serial, bit-parallel up and down shifts for reading and writing data between the RP array and the bus outside the array.
RP units are divided into skip RP units and ordinary RP units. Every RP unit can exchange data with its neighboring (upper and lower) RP units through the input selection made by its first operand selector. The first operand selector of a skip RP unit can additionally select data from the RP unit S rows away, giving a "skip" interaction. One skip RP unit is placed every S rows in the RP array, and together they form a skip chain that accelerates certain statistical wide-area operations. In this embodiment S=8, because a simple theoretical derivation shows that the optimal value of S is the square root of M. The second operand of the RP unit's ALU comes from the unit's temporary register or from the immediate field of the RP array instruction. The ALU implements maximum/minimum, addition/subtraction, data shift and absolute value on 8-bit data in hardware and generates flag bits for conditional operations in the next cycle. The condition selector chooses either an ALU flag bit or 1-bit data from the PE units of the same row (usually binary image data or some flag data) to control the tri-state buffer gate and thereby perform a conditional write; this is how the RP unit implements conditional operations. The temporary register and the register file store data during RP processing. Because the output of the RP array is generally no longer image data but certain features of the image, the required storage is small. When an RP unit does need more storage, it can share the bit-plane memory of the PE units in its row through the serial-parallel conversion of the buffer register. In this embodiment, the register file of each RP unit has a capacity of 8 bit × 16.
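The patent states the result S = √M without giving the derivation. One plausible cost model, sketched below in Python, is an assumption and not the patent's own argument: to sum one value per row, each group of S adjacent rows first accumulates into its skip RP unit in S − 1 neighbor-to-neighbor steps (all groups in parallel), and the M/S skip units then accumulate along the skip chain in M/S − 1 steps. The total S + M/S − 2 is minimized at S = √M.

```python
def skip_chain_steps(m, s):
    """Sequential steps to sum one value per row under the assumed cost model."""
    assert m % s == 0, "s must divide m in this simplified model"
    return (s - 1) + (m // s - 1)


def skip_chain_sum(values, s):
    """Functional model of the two-phase reduction (not cycle accurate)."""
    # Phase 1: every group of s adjacent rows accumulates in parallel.
    partial = [sum(values[i:i + s]) for i in range(0, len(values), s)]
    # Phase 2: partial sums travel along the skip chain, s rows per hop.
    total = 0
    for p in partial:
        total += p
    return total


M = 64  # number of RP rows in the embodiment
divisors = [s for s in range(1, M + 1) if M % s == 0]
best = min(divisors, key=lambda s: skip_chain_steps(M, s))
print(best, skip_chain_steps(M, best))  # 8 14
```

Under this model a plain neighbor chain (S=1 or S=64) needs 63 steps, while S=8 needs 14, which is consistent with the choice S=8 for M=64.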
The RP units are mainly used for wide-area and nonlinear operations in low- and mid-level image processing that suit row-parallel execution, including median filtering, grayscale morphology, average gray level calculation and shape feature extraction (such as area, perimeter and the rectangular bounding box of a target region). The main steps of these operations can be completed in a row-parallel manner, giving a speedup of O(M) over serial processing.
The RP array also operates in single-instruction multiple-data (SIMD) mode. Its instructions likewise come from the processing array control module, and it can execute one instruction per clock cycle. RP array operations often need the cooperation of the PE array. To support joint operation of the PE and RP arrays without wasting instruction memory when only one array is working, a variable-length very long SIMD instruction word mechanism (Variable VLIW SIMD, VVS) is used. The instruction header distinguishes between one 2L-bit very long SIMD instruction word that controls the PE array and RP array jointly and two consecutive L-bit ordinary SIMD instruction words that each control the PE array or the RP array. The processing array control module fetches instructions under the control of parameters written over the external bus and is responsible for decoding and scheduling them. In this embodiment L=32 and the on-chip instruction memory is 8 KB, which meets the storage needs of low- and mid-level image processing algorithms in the vast majority of typical applications.
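A minimal Python sketch of how a decoder might walk such a variable-length instruction stream follows. The patent only says that the instruction header distinguishes the two cases; the specific encoding used here (top bit of an L-bit word marks a 2L-bit joint instruction, next bit selects RP versus PE for a single-array instruction) is purely hypothetical and chosen for illustration.

```python
L = 32
JOINT_FLAG = 1 << (L - 1)  # hypothetical: top bit marks a 2L-bit PE+RP word
RP_FLAG = 1 << (L - 2)     # hypothetical: next bit selects RP (1) or PE (0)


def decode(words):
    """Split a stream of L-bit words into (target, payload_words) tuples.

    target is 'PE+RP' for a 2L-bit joint instruction, else 'PE' or 'RP'.
    """
    out = []
    i = 0
    while i < len(words):
        w = words[i]
        if w & JOINT_FLAG:
            if i + 1 >= len(words):
                raise ValueError("truncated 2L-bit instruction")
            out.append(("PE+RP", (w, words[i + 1])))  # consumes two words
            i += 2
        else:
            out.append(("RP" if w & RP_FLAG else "PE", (w,)))
            i += 1
    return out


stream = [0x80000001, 0x00000002, 0x00000003, 0x40000004]
for target, payload in decode(stream):
    print(target, [hex(p) for p in payload])
```

The point of the scheme is visible in the sketch: a program phase that uses only one array costs L bits per instruction rather than 2L, so the 8 KB instruction memory is not padded with empty slots.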
Fig. 5 shows the structure of the on-chip configurable artificial neural network in this embodiment. The network includes an input neuron register set with J1-bit precision, a neuron broadcaster, a parallel operation unit array, an output neuron register set with J2-bit precision, a weight/threshold memory with J-bit precision and a neural network control module. Each parallel operation unit further includes a fixed-point multiplier, an accumulation register and a piecewise linear mapping unit.
The image feature data extracted by the RP array is loaded over the on-chip bus into the T1 input neuron registers. Under the control of the neural network control module, the neuron broadcaster broadcasts the data of each input neuron register in turn to the parallel operation unit array as one operand. The other operand comes from the weight/threshold memory, in which each address stores T2 weight/threshold values, one for each of the T2 parallel operation units; the memory address is supplied by the neural network control module. Each parallel operation unit multiplies the input neuron data by the corresponding weight and adds the product to its accumulation register. After all T1 input neuron registers have been broadcast and processed, the threshold stored in the weight/threshold memory is subtracted, and the result is passed to the piecewise linear mapping unit, which implements the transfer function. The output is the value of the output neuron register, which represents the recognition result and is finally read out over the on-chip bus. The artificial neural network performs feature recognition in a vector-parallel manner and achieves a speedup of about O(T2) over serial processing.
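The broadcast, multiply-accumulate, threshold and mapping sequence can be sketched in Python as follows. The inner loop over the T2 operation units runs sequentially here but corresponds to parallel hardware. The exact shape of the piecewise linear transfer function is not given in the text; the saturating ramp with two breakpoints `lo` and `hi`, the parameter `out_max` and the function names are assumptions made for illustration.

```python
def piecewise_linear(x, lo, hi, out_max):
    """Assumed transfer function: 0 below lo, out_max above hi, linear between."""
    if x <= lo:
        return 0
    if x >= hi:
        return out_max
    return (x - lo) * out_max // (hi - lo)


def ann_layer(inputs, weights, thresholds, lo, hi, out_max=255):
    """One pass of the on-chip network.

    inputs:     T1 input neuron values
    weights:    weights[i] holds the T2 weights read from one memory address
    thresholds: T2 threshold values
    """
    t2 = len(thresholds)
    acc = [0] * t2  # one accumulation register per parallel operation unit
    for i, x in enumerate(inputs):  # broadcast one input neuron at a time
        row = weights[i]
        for j in range(t2):  # performed in parallel in hardware
            acc[j] += x * row[j]
    # Subtract the threshold, then apply the transfer function.
    return [piecewise_linear(acc[j] - thresholds[j], lo, hi, out_max) for j in range(t2)]


outputs = ann_layer([10, 20], [[1, 2], [3, 4]], [0, 0], lo=0, hi=100)
print(outputs)  # [178, 255]
```

Since the hardware needs T1 broadcast steps regardless of T2, the roughly O(T2) speedup claimed above follows directly from the inner loop being parallel.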
The effective number of input neurons (the dimension of the image feature vector) may be smaller than T1, and likewise the effective number of output neurons (the number of target classes) may be smaller than T2. In either case, the parameter registers in the neural network control module can be configured so that the remaining unused neuron data does not take part in the computation, which speeds up processing. The two breakpoints of the piecewise linear mapping function are also configurable. In addition, the output neuron register data from one pass can be read out and fed back as the input neuron register data for the next pass, so a neural network with any number of layers can be realized dynamically to handle complex recognition tasks. In short, the on-chip artificial neural network is highly configurable.
The weight/threshold data of the artificial neural network is obtained by training. Training is not part of the system's normal processing flow and so does not affect real-time operation, and the training process itself is complex and not well suited to a direct hardware implementation. Training can therefore be carried out on the RISC dual-core subsystem or even on a general-purpose processor outside the system, after which the resulting weights and thresholds are downloaded into the weight/threshold memory of the artificial neural network. Training may use either supervised or unsupervised learning, so several kinds of artificial neural network can be realized, including back-propagation (BP), self-organizing map (SOM) and learning vector quantization (LVQ) networks.
In this embodiment, T1=16, T2=8, J1=8, J2=8 and J=12, and the weight/threshold memory has a capacity of 12 bit × 256. This precision and capacity meet the needs of target recognition algorithms in most applications. When the feature dimension exceeds 16, recognition can be completed by processing different feature subspaces in several passes.
In this embodiment, the RISC dual-core subsystem in Fig. 1 is mainly used for the irregular, complex algorithms of high-level image processing (such as training the artificial neural network, the Hough transform and principal component analysis), for dynamic configuration, and for unified control of the parallel operation of the other modules in the system. The subsystem includes two RISC cores with a P-bit data width, each with its own private program/data memory, an inter-core communication mailbox and a processor arbiter. Each RISC core can access its own private program/data memory independently but must request access to other on-chip resources through the processor arbiter, which supports first-come-first-served and fixed-priority arbitration in hardware and can be reconfigured flexibly. The inter-core communication mailbox is essentially a dual-port synchronous FIFO that supports thread synchronization between the two cores. The subsystem provides thread-level parallelism, which gives some speedup over single-core, single-thread processing and shortens complex high-level processing. In this embodiment P=32 and the FIFO capacity is 32 bit × 16.
In this embodiment, the random/sequential hybrid I/O memory in Fig. 1 is a dual-port memory used for data exchange between the inside and outside of the system. To keep the number of system pins as small as possible, one port is P bits wide and supports random read/write access from the on-chip bus, while the other port is PS bits wide (PS<P) and supports sequential read/write access from off-chip devices, with reads and writes independent of each other. The enable signals used for off-chip sequential reads and writes are mapped automatically to physical addresses by an address generation module embedded in the memory, and this physical address can be reset to zero by an external redirect. The two ports can run at different clock frequencies, which broadens the range of applications of the system. In this embodiment P=32 and PS=8.
In this embodiment, the system thread flag in Fig. 1 is a W-bit register. Some of its bits are written under the control of the on-chip bus inside the system, and the others are written under the control of devices outside the system; all bits can be read from both inside and outside the system. The register serves as a flag for interaction and synchronization between threads inside and outside the system. In this embodiment W=4; three bits are written under external control and one bit under internal control.
In this embodiment, the on-chip bus in Fig. 1 maps the read/write control signals and logical addresses issued by the bus master (the RISC dual-core subsystem) to the strobe enable signals and physical addresses required by the bus slave modules (the image sensor control module, the processing array control module, the on-chip artificial neural network, the random/sequential hybrid I/O memory and the thread flag), driving these slave modules to carry out their operations. In this embodiment the on-chip bus has a 32-bit data width and supports up to 16 slave devices.
In this embodiment, the workflow of the whole high-speed on-chip vision system (the programmable vision chip) is as follows. Digital pixel data obtained by the image sensor is loaded row-parallel into the PE array. The PE array and the RP array then cooperate to carry out the required low- and mid-level image processing and extract image features, which are sent to the on-chip artificial neural network for feature recognition. In some cases the RISC dual-core subsystem performs further analysis. The small amount of final result data is then output from the system.
At the same time, the RISC dual-core subsystem can dynamically adjust the data in the parameter registers of the image sensor control module according to the global image information or the region of the target of interest obtained by the PE and RP arrays. This lets the system adapt automatically to a changing application environment and meet the multi-resolution processing needs that arise from relative motion between the system and the target.
The application of this embodiment is described in detail below through three typical high-speed visual image processing algorithms developed and run on the programmable-vision-chip-based visual image processing system of this embodiment.
(1) High-speed target tracking
Fig. 6 shows the flow of the high-speed target tracking algorithm based on the visual image processing system of this embodiment. First, the image sensor array captures several frames, from which a background image is synthesized in the PE array according to certain rules; normal operation then begins. During normal operation, each captured frame is first smoothed in the PE array to remove noise, and the background image is subtracted to obtain a difference image. The RP array then gathers statistics on the approximate gray level distribution of the difference image to determine the best dynamic threshold, and the PE array segments the difference image with this threshold to produce a binary image that marks the regions of the scene containing clearly moving objects. Next, each connected region of the binary image is separated in the PE array using binary morphological geodesic transforms. The RP array extracts the shape features of each region, and the RISC dual-core subsystem compares them one by one with the features of the target being tracked. The region whose Euclidean or Manhattan distance in feature space is smallest, and also below a predefined distance, is taken to be the target. The region and center coordinates of the target are locked accordingly and written to the I/O memory for output off chip. Finally, the background in non-moving regions is updated according to some algorithmic model to remove the effect of slow environmental changes on tracking. If the target collides with another moving object or moves behind an occluder, it may disappear for several frames. In that case the RISC dual-core subsystem automatically predicts and outputs the coordinates of the target's current region from its earlier motion statistics, and the algorithm locks onto the target again as soon as it reappears. The algorithm is adaptable and robust and can track a target in complex dynamic scenes containing several irregular, fast-moving objects. The high-speed target tracking algorithm described above can reach a processing speed of 1000 frames per second. In a manually controlled environment with a simple background, the "self-window capture" method proposed by Miao Wei et al. in patent ZL200510086902.2 can also be applied, with the region containing the tracked target specified manually when tracking starts. That algorithm also reaches a processing speed of 1000 frames per second.
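The feature-matching step run on the RISC cores can be sketched in Python as a nearest-neighbor search with a rejection distance. The choice of Manhattan distance, the four-element feature tuples and the function names are illustrative assumptions; the patent leaves the concrete shape features and the threshold value open.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def match_target(region_features, target, max_dist):
    """Return the index of the region closest to target, or None.

    A region is accepted only if its distance is the smallest of all
    regions and also strictly below max_dist.
    """
    best_idx, best_d = None, None
    for idx, feats in enumerate(region_features):
        d = manhattan(feats, target)
        if best_d is None or d < best_d:
            best_idx, best_d = idx, d
    if best_d is not None and best_d < max_dist:
        return best_idx
    return None  # target not visible in this frame


# Hypothetical features per region: (area, perimeter, box width, box height)
regions = [(100, 40, 10, 12), (52, 30, 8, 7)]
target = (50, 28, 8, 8)
print(match_target(regions, target, max_dist=10))  # 1
```

A `None` result corresponds to the occlusion case described above, where the system falls back to predicting the target position from its earlier motion.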
(2) High-speed gesture recognition
Fig. 7 shows the flow of the high-speed gesture recognition algorithm based on the visual image processing system of this embodiment. The gesture recognition algorithm proposed by the present invention recognizes four types of gestures and is mainly intended for a PPT gesture control system based on natural human-computer interaction. Fig. 8 lists the binarized images of the four gestures after threshold segmentation, together with the corresponding control functions. In this algorithm, the five steps from background synthesis to threshold segmentation are the same as in the high-speed target tracking algorithm. A binary morphological region trimming algorithm is then applied in the PE array to remove small stray regions and fill small holes in large regions; the large complete region that remains is the region containing the gesture to be recognized. The artificial neural network is then used for recognition, and it must be fully trained beforehand. For training, the normalized density features of the gesture region are extracted first: the region is divided evenly into a number of rows and columns, and for each row and each column the ratio of the number of active pixels (pixels with value 1 in the binary image) to the total area of the region is computed. These ratios form a feature vector, which is used to train the neural network under the supervision of the system thread flag (the thread flag register is written externally to indicate which type of gesture is currently being learned). Training can be done on the RISC dual-core subsystem inside the system or on a general-purpose processor outside it. Recognition follows training. Two special cases stand out among the gestures to be recognized: the "blank" gesture, in which there is no region to recognize, and the mouse-move gesture, in which only one finger is extended. To speed up recognition, the algorithm uses a cascade classifier that combines simple region features with the artificial neural network. The classifier first extracts simple features of the region (such as the total number of active pixels, shape parameters and vertex coordinates) and tries to recognize the two special gestures on the RISC cores. If that fails, it extracts the more complex, complete normalized density features and uses the artificial neural network for recognition. Finally it outputs the recognized gesture class code and the gesture vertex coordinates (the vertex coordinates are used only for the mouse-move gesture). Because the two special gestures account for most of the time in typical use, overall processing speed improves considerably, and the average frame rate of the system can exceed 1000 frames per second. The high frame rate also makes it practical to use the RISC cores to apply software temporal low-pass filtering to the recognition results, suppressing interference from environmental noise and hand jitter.
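The normalized density feature described above can be sketched in Python for a binary region given as a list of rows. The band boundaries computed with integer division and the function name are assumptions; on the chip this counting would be done by the PE and RP arrays rather than in software. With 8 row bands and 8 column bands the vector has 16 entries, which would fit the T1=16 input neurons, though the patent does not state the band counts.

```python
def density_features(region, n_rows, n_cols):
    """Normalized density features of a binary region.

    region is a list of equal-length rows of 0/1 pixels. The region is split
    into n_rows horizontal bands and n_cols vertical bands; each feature is
    the number of active pixels in a band divided by the total region area.
    """
    h, w = len(region), len(region[0])
    area = h * w
    feats = []
    for r in range(n_rows):  # horizontal bands
        top, bottom = r * h // n_rows, (r + 1) * h // n_rows
        feats.append(sum(sum(row) for row in region[top:bottom]) / area)
    for c in range(n_cols):  # vertical bands
        left, right = c * w // n_cols, (c + 1) * w // n_cols
        feats.append(sum(sum(row[left:right]) for row in region) / area)
    return feats


region = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
print(density_features(region, 2, 2))  # [0.25, 0.0625, 0.25, 0.0625]
```

Dividing by the region area makes the vector independent of the hand's size in the image, which is the sense in which the features are normalized.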
(3) Fast face detection
如图9所示,是基于本实施例视觉图像处理系统的快速人脸检测算法流程,该算法可用于特殊场合下的人流量统计。应用本算法时,需要RISC核控制图像传感器每次输出待监测区域的一个64×64分辨率图像。本算法主要采用了Masakazu等人在2003年在IEEE Transactions on NeuralNetworks杂志上发表的An Image Representation Algorithm Compatible WithNeural-Associative-Processor-Based Hardware Recognition Systems一文中提到的PPED特征向量用于人脸检测,PPED特征向量的提取主要分为水平、垂直、正45度和负45度四个方向的5×5模板边缘检测及边缘标志生成,以及按一定规则组合压缩这四个方向边缘标志以形成一个64维的PPED向量这两步,并且在PE阵列和RP阵列上完成,之后利用人工神经网络判断是否是人脸,判断前必须利用标准人脸库中的模板对神经网络进行充分训练。由于特征维数较高,可以划分为特征子空间进行逐一训练和识别,或者在实时性要求较高而正确率不必太高的情况下,将64维的PPED向量进一步压缩为一个16维的向量以提高处理速度。在本实施例中的系统上,采用完整的64维PPED向量,用本算法对每一帧256×256图像中部的256×64区域划分为10个64×64的子区域(因为64×64的子区域之间必须有一定的重叠以尽量减少漏检情况)进行人脸检测所需的处理时间约为18ms,或者说整个系统的帧率可高于50帧/秒,远高于串行处理系统。As shown in FIG. 9 , it is a flow of a fast face detection algorithm based on the visual image processing system of this embodiment, and the algorithm can be used for counting the flow of people in special occasions. When applying this algorithm, it is necessary for the RISC core to control the image sensor to output a 64×64 resolution image of the area to be monitored each time. This algorithm mainly uses the PPED feature vector mentioned in the article An Image Representation Algorithm Compatible With Neural-Associative-Processor-Based Hardware Recognition Systems published by Masakazu et al. in IEEE Transactions on NeuralNetworks in 2003 for face detection. PPED The extraction of feature vectors is mainly divided into 5×5 template edge detection and edge mark generation in four directions: horizontal, vertical, plus 45 degrees and minus 45 degrees, and combining and compressing edge marks in these four directions according to certain rules to form a 64-dimensional The two steps of the PPED vector are completed on the PE array and RP array, and then the artificial neural network is used to judge whether it is a human face. Before the judgment, the neural network must be fully trained with the template in the standard face database. 
Because the feature dimension is high, the feature vector can be divided into feature subspaces that are trained and recognized one by one. Alternatively, when real-time requirements are strict and the accuracy need not be very high, the 64-dimensional PPED vector can be further compressed into a 16-dimensional vector to increase processing speed. On the system of this embodiment, using the complete 64-dimensional PPED vector, the 256×64 region in the middle of each 256×256 frame is divided into ten 64×64 sub-regions (ten rather than four, because adjacent 64×64 sub-regions must overlap to minimize missed detections). Face detection over these sub-regions takes about 18 ms, so the whole system can run at more than 50 frames per second, much faster than a serial processing system.
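The figures in this paragraph can be checked with a short calculation. The sketch below assumes that the ten sub-regions are spaced evenly across the 256-pixel-wide strip; the text only states that they overlap, so the exact offsets and the helper name `window_offsets` are hypothetical.

```python
def window_offsets(strip_width=256, window=64, count=10):
    """Left edges of `count` evenly spaced windows covering the whole strip."""
    span = strip_width - window  # range available for the left edge
    return [round(i * span / (count - 1)) for i in range(count)]


offsets = window_offsets()
# Overlap between each pair of neighbouring 64-pixel windows.
overlaps = [offsets[i] + 64 - offsets[i + 1] for i in range(len(offsets) - 1)]

frame_time_ms = 18.0                          # processing time stated in the text
frames_per_second = 1000.0 / frame_time_ms    # about 55.6, i.e. above 50
per_window_ms = frame_time_ms / len(offsets)  # average time per 64x64 sub-region

print(offsets)
print(min(overlaps), max(overlaps))
print(round(frames_per_second, 1), per_window_ms)
```

Under the even-spacing assumption, neighbouring windows share roughly two thirds of their width, and each 64×64 sub-region is processed in under 2 ms on average.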
The specific embodiments described above explain the purpose, technical solutions and beneficial effects of the present invention in further detail. It should be understood that they are only specific embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (41)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201210088420.0A CN102665049B (en) | 2012-03-29 | 2012-03-29 | Programmable visual chip-based visual image processing system |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN102665049A (en) | 2012-09-12 |
| CN102665049B (en) | 2014-09-17 |
Family ID: 46774448
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201210088420.0A Active CN102665049B (en) | 2012-03-29 | 2012-03-29 | Programmable visual chip-based visual image processing system |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN102665049B (en) |
Cited By (71)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103019656A (en) * | 2012-12-04 | 2013-04-03 | 中国科学院半导体研究所 | Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system |
| CN103020890A (en) * | 2012-12-17 | 2013-04-03 | 中国科学院半导体研究所 | Visual processing device based on multi-layer parallel processing |
| CN103542805A (en) * | 2013-10-22 | 2014-01-29 | 中国科学院半导体研究所 | Vision inspection system based on high-speed image sensor and parallel processing |
| WO2014085975A1 (en) * | 2012-12-04 | 2014-06-12 | 中国科学院半导体研究所 | Dynamically reconfigurable multistage parallel single-instruction multi-data array processing system |
| CN105320257A (en) * | 2014-08-04 | 2016-02-10 | 南京理工大学 | Non-touch type remote gesture controller |
| CN105719228A (en) * | 2015-07-29 | 2016-06-29 | 上海磁宇信息科技有限公司 | Camera system and image identification system |
| CN105740946A (en) * | 2015-07-29 | 2016-07-06 | 上海磁宇信息科技有限公司 | Method for realizing neural network calculation by using cell array computing system |
| CN106528172A (en) * | 2016-11-24 | 2017-03-22 | 广州途道信息科技有限公司 | Method for realizing image programming |
| CN106598226A (en) * | 2016-11-16 | 2017-04-26 | 天津大学 | UAV (Unmanned Aerial Vehicle) man-machine interaction method based on binocular vision and deep learning |
| CN106599992A (en) * | 2015-10-08 | 2017-04-26 | 上海兆芯集成电路有限公司 | Neural network unit using processing unit group as recursive neural network for short and long term memory cells for operation |
| CN106716443A (en) * | 2014-09-30 | 2017-05-24 | 高通股份有限公司 | Feature computation in a sensor element array |
| CN107113719A (en) * | 2014-10-08 | 2017-08-29 | 美国亚德诺半导体公司 | Configurable procedure array device |
| CN107133908A (en) * | 2016-02-26 | 2017-09-05 | 谷歌公司 | Compiler for image processor manages memory |
| WO2017181336A1 (en) * | 2016-04-19 | 2017-10-26 | 北京中科寒武纪科技有限公司 | Maxout layer operation apparatus and method |
| CN107409107A (en) * | 2015-04-06 | 2017-11-28 | 索尼公司 | Bus system and communicator |
| CN107408038A (en) * | 2015-02-02 | 2017-11-28 | 优创半导体科技有限公司 | Vector processor configured to operate on variable length vectors using graphics processing instructions |
| CN107423816A (en) * | 2017-03-24 | 2017-12-01 | 中国科学院计算技术研究所 | A kind of more computational accuracy Processing with Neural Network method and systems |
| CN107533667A (en) * | 2015-05-21 | 2018-01-02 | 谷歌公司 | Vector calculation unit in neural network processor |
| CN107578095A (en) * | 2017-09-01 | 2018-01-12 | 中国科学院计算技术研究所 | Neural Network Computing Device and Processor Containing the Computing Device |
| CN107623827A (en) * | 2017-08-15 | 2018-01-23 | 上海集成电路研发中心有限公司 | A kind of intelligent CMOS image sensor chip and its manufacture method |
| CN107680032A (en) * | 2017-08-14 | 2018-02-09 | 西安电子科技大学 | A kind of image gradation data piecemeal storage method |
| CN107836001A (en) * | 2015-06-29 | 2018-03-23 | 微软技术许可有限责任公司 | Convolutional neural networks on hardware accelerator |
| CN107844831A (en) * | 2017-11-10 | 2018-03-27 | 西安电子科技大学 | Purpose Neuro Processor with Digital based on TTA frameworks |
| CN107977662A (en) * | 2017-11-06 | 2018-05-01 | 清华大学深圳研究生院 | A kind of layered calculation method for realizing high speed processing computer visual image |
| CN108255775A (en) * | 2016-12-28 | 2018-07-06 | 上海磁宇信息科技有限公司 | For the cellular array bus broadcast method of cellular array computing system |
| CN108320018A (en) * | 2016-12-23 | 2018-07-24 | 北京中科寒武纪科技有限公司 | A device and method for artificial neural network computing |
| CN108629406A (en) * | 2017-03-24 | 2018-10-09 | 展讯通信(上海)有限公司 | Arithmetic unit for convolutional neural networks |
| CN108984550A (en) * | 2017-05-31 | 2018-12-11 | 西门子公司 | The methods, devices and systems that the signal instructions of data are determined to mark to data |
| CN109409514A (en) * | 2018-11-02 | 2019-03-01 | 广州市百果园信息技术有限公司 | Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks |
| CN109685209A (en) * | 2018-12-29 | 2019-04-26 | 福州瑞芯微电子股份有限公司 | A kind of device and method for accelerating neural network computing speed |
| CN109902040A (en) * | 2019-02-01 | 2019-06-18 | 京微齐力(北京)科技有限公司 | A kind of System on Chip/SoC of integrated FPGA and artificial intelligence module |
| CN109902835A (en) * | 2019-02-01 | 2019-06-18 | 京微齐力(北京)科技有限公司 | Processing unit is provided with the artificial intelligence module and System on Chip/SoC of general-purpose algorithm unit |
| CN109960673A (en) * | 2017-12-14 | 2019-07-02 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and Related product |
| CN109978129A (en) * | 2017-12-28 | 2019-07-05 | 北京中科寒武纪科技有限公司 | Dispatching method and relevant apparatus |
| CN110018979A (en) * | 2018-01-09 | 2019-07-16 | 幻视互动(北京)科技有限公司 | It is a kind of based on restructing algorithm collection and accelerate handle mixed reality data flow MR intelligent glasses and method |
| CN110311676A (en) * | 2019-06-24 | 2019-10-08 | 清华大学 | A vision system and data processing method for the Internet of Things using switching current technology |
| CN110336963A (en) * | 2019-06-06 | 2019-10-15 | 上海集成电路研发中心有限公司 | A dynamic image processing system and image processing method |
| CN110520896A (en) * | 2017-02-06 | 2019-11-29 | 比尔贝里有限公司 | Weeding system and method, railway weed killing waggon |
| US10565732B2 (en) | 2015-05-23 | 2020-02-18 | SZ DJI Technology Co., Ltd. | Sensor fusion using inertial and image sensors |
| US10614332B2 | 2016-12-16 | 2020-04-07 | Qualcomm Incorporated | Light source modulation for iris size adjustment |
| US10708522B2 (en) | 2018-08-10 | 2020-07-07 | International Business Machines Corporation | Image sensor with analog sample and hold circuit control for analog neural networks |
| CN111435977A (en) * | 2019-01-14 | 2020-07-21 | 豪威科技股份有限公司 | Configurable interface alignment buffer between DRAM and logic cells for multi-die image sensor |
| CN111626414A (en) * | 2020-07-30 | 2020-09-04 | 电子科技大学 | Dynamic multi-precision neural network acceleration unit |
| CN111696025A (en) * | 2020-06-11 | 2020-09-22 | 西安电子科技大学 | Image processing device and method based on reconfigurable memory computing technology |
| CN111757038A (en) * | 2020-07-07 | 2020-10-09 | 苏州华兴源创科技股份有限公司 | Pixel data processing method and integrated chip |
| CN112419140A (en) * | 2020-12-02 | 2021-02-26 | 海光信息技术股份有限公司 | Data processing device, data processing method and electronic equipment |
| US10984235B2 (en) | 2016-12-16 | 2021-04-20 | Qualcomm Incorporated | Low power data generation for iris-related detection and authentication |
| CN112954241A (en) * | 2021-02-20 | 2021-06-11 | 南京威派视半导体技术有限公司 | Image data reading system of image sensor and reading and organizing method |
| US11038520B1 (en) | 2020-04-15 | 2021-06-15 | International Business Machines Corporation | Analog-to-digital conversion with reconfigurable function mapping for neural networks activation function acceleration |
| US11068712B2 (en) | 2014-09-30 | 2021-07-20 | Qualcomm Incorporated | Low-power iris scan initialization |
| CN113454984A (en) * | 2018-12-17 | 2021-09-28 | 脸谱科技有限责任公司 | Programmable pixel array |
| CN113794849A (en) * | 2021-11-12 | 2021-12-14 | 深圳比特微电子科技有限公司 | Device and method for synchronizing image data and image acquisition system |
| WO2022001457A1 (en) * | 2020-06-30 | 2022-01-06 | 上海寒武纪信息科技有限公司 | Computing apparatus, chip, board card, electronic device and computing method |
| CN114155562A (en) * | 2022-02-09 | 2022-03-08 | 北京金山数字娱乐科技有限公司 | Gesture recognition method and device |
| CN114187161A (en) * | 2021-12-07 | 2022-03-15 | 浙江大学 | Universal configurable image pipeline processing array architecture |
| CN114697578A (en) * | 2020-12-31 | 2022-07-01 | 清华大学 | Dual-mode image sensor chip based on three-dimensional stacking technology and imaging system |
| CN114693559A (en) * | 2022-04-02 | 2022-07-01 | 深圳创维-Rgb电子有限公司 | Image processing optimization method and device, electronic equipment and readable storage medium |
| CN114979521A (en) * | 2022-04-12 | 2022-08-30 | 昆明物理研究所 | A readout circuit with arbitrary windowing function |
| CN115100016A (en) * | 2015-06-10 | 2022-09-23 | 无比视视觉技术有限公司 | Image processor and method for processing image |
| WO2023050109A1 (en) * | 2021-09-29 | 2023-04-06 | Congying Sui | An imaging method, sensor, 3d shape reconstruction method and system |
| CN115942139A (en) * | 2021-09-29 | 2023-04-07 | 香港物流机械人研究中心有限公司 | Imaging method, sensor, 3D shape reconstruction method and system |
| CN115984964A (en) * | 2022-12-30 | 2023-04-18 | 西安电子科技大学芜湖研究院 | A dynamic gesture recognition method and system based on RISC-V coprocessor |
| CN116546340A (en) * | 2023-07-05 | 2023-08-04 | 华中师范大学 | High-speed CMOS pixel detector |
| CN117130668A (en) * | 2023-10-27 | 2023-11-28 | 南京沁恒微电子股份有限公司 | A processor instruction fetch redirection timing optimization circuit |
| CN117271434A (en) * | 2023-11-15 | 2023-12-22 | 成都维德青云电子有限公司 | On-site programmable system-in-chip |
| CN110300989B (en) * | 2017-05-15 | 2023-12-22 | 谷歌有限责任公司 | Configurable and programmable image processor unit |
| US12034015B2 (en) | 2018-05-25 | 2024-07-09 | Meta Platforms Technologies, Llc | Programmable pixel array |
| US12075175B1 (en) | 2020-09-08 | 2024-08-27 | Meta Platforms Technologies, Llc | Programmable smart sensor with adaptive readout |
| US12108141B2 (en) | 2019-08-05 | 2024-10-01 | Meta Platforms Technologies, Llc | Dynamically programmable image sensor |
| US12244936B2 (en) | 2022-01-26 | 2025-03-04 | Meta Platforms Technologies, Llc | On-sensor image processor utilizing contextual data |
| CN120224818A (en) * | 2025-05-28 | 2025-06-27 | 中国科学院半导体研究所 | Programmable visible light infrared vision chip |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108053361B (en) * | 2017-12-29 | 2021-08-03 | 中国科学院半导体研究所 | Multi-connected vision processor and image processing method using the same |
| FR3149427A1 (en) * | 2023-06-05 | 2024-12-06 | Commissariat A L'energie Atomique Et Aux Energies Alternatives | Smart imager for intensive real-time image analysis |
Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6523018B1 (en) * | 1998-12-29 | 2003-02-18 | International Business Machines Corporation | Neural chip architecture and neural networks incorporated therein |
| CN102004464A (en) * | 2010-12-23 | 2011-04-06 | 合肥工业大学 | Adaline neural network controller (NNC) based on field programmable gate array (FPGA) |
Non-Patent Citations (2)
| Title |
|---|
| YUAN-JIN LI: "A Novel Architecture of Vision Chip for Fast Traffic Lane Detection and FPGA Implementation", 《IEEE 8TH INTERNATIONAL CONFERENCE ON ASIC》, 23 October 2009 (2009-10-23), pages 917 - 920, XP031579141 * |
| 付秋瑜 等: "面向实时视觉芯片的高速CMOS图像传感器", 《光学学报》, vol. 31, no. 8, 31 August 2011 (2011-08-31) * |
Cited By (107)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103019656A (en) * | 2012-12-04 | 2013-04-03 | 中国科学院半导体研究所 | Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system |
| US9449257B2 (en) | 2012-12-04 | 2016-09-20 | Institute Of Semiconductors, Chinese Academy Of Sciences | Dynamically reconstructable multistage parallel single instruction multiple data array processing system |
| WO2014085975A1 (en) * | 2012-12-04 | 2014-06-12 | 中国科学院半导体研究所 | Dynamically reconfigurable multistage parallel single-instruction multi-data array processing system |
| CN103019656B (en) * | 2012-12-04 | 2016-04-27 | 中国科学院半导体研究所 | The multistage parallel single instruction multiple data array processing system of dynamic reconstruct |
| CN103020890B (en) * | 2012-12-17 | 2015-11-04 | 中国科学院半导体研究所 | Vision processing device based on multi-level parallel processing |
| CN103020890A (en) * | 2012-12-17 | 2013-04-03 | 中国科学院半导体研究所 | Visual processing device based on multi-layer parallel processing |
| CN103542805A (en) * | 2013-10-22 | 2014-01-29 | 中国科学院半导体研究所 | Vision inspection system based on high-speed image sensor and parallel processing |
| CN105320257A (en) * | 2014-08-04 | 2016-02-10 | 南京理工大学 | Non-touch type remote gesture controller |
| CN105320257B (en) * | 2014-08-04 | 2018-01-23 | 南京理工大学 | The long-range gesture controller of non-touch |
| US11068712B2 (en) | 2014-09-30 | 2021-07-20 | Qualcomm Incorporated | Low-power iris scan initialization |
| CN106716443B (en) * | 2014-09-30 | 2021-02-26 | 高通股份有限公司 | Feature calculation in an array of sensor elements |
| CN106716443A (en) * | 2014-09-30 | 2017-05-24 | 高通股份有限公司 | Feature computation in a sensor element array |
| CN107113719A (en) * | 2014-10-08 | 2017-08-29 | 美国亚德诺半导体公司 | Configurable procedure array device |
| CN107113719B (en) * | 2014-10-08 | 2020-06-23 | 美国亚德诺半导体公司 | Configurable pre-processing array |
| CN107408038B (en) * | 2015-02-02 | 2021-09-28 | 优创半导体科技有限公司 | Vector processor configured to operate on variable length vectors using graphics processing instructions |
| CN107408038A (en) * | 2015-02-02 | 2017-11-28 | 优创半导体科技有限公司 | Vector processor configured to operate on variable length vectors using graphics processing instructions |
| CN107409107A (en) * | 2015-04-06 | 2017-11-28 | 索尼公司 | Bus system and communicator |
| CN107409107B (en) * | 2015-04-06 | 2021-01-26 | 索尼公司 | Bus system and communication device |
| US11620508B2 (en) | 2015-05-21 | 2023-04-04 | Google Llc | Vector computation unit in a neural network processor |
| US12014272B2 (en) | 2015-05-21 | 2024-06-18 | Google Llc | Vector computation unit in a neural network processor |
| US12277499B2 (en) | 2015-05-21 | 2025-04-15 | Google Llc | Vector computation unit in a neural network processor |
| CN107533667A (en) * | 2015-05-21 | 2018-01-02 | 谷歌公司 | Vector calculation unit in neural network processor |
| CN107533667B (en) * | 2015-05-21 | 2021-07-13 | 谷歌有限责任公司 | Vector Computation Unit in Neural Network Processor |
| US10565732B2 (en) | 2015-05-23 | 2020-02-18 | SZ DJI Technology Co., Ltd. | Sensor fusion using inertial and image sensors |
| CN115100017A (en) * | 2015-06-10 | 2022-09-23 | 无比视视觉技术有限公司 | Image processor and method for processing images |
| CN115100016A (en) * | 2015-06-10 | 2022-09-23 | 无比视视觉技术有限公司 | Image processor and method for processing image |
| US11200486B2 (en) | 2015-06-29 | 2021-12-14 | Microsoft Technology Licensing, Llc | Convolutional neural networks on hardware accelerators |
| CN107836001A (en) * | 2015-06-29 | 2018-03-23 | 微软技术许可有限责任公司 | Convolutional neural networks on hardware accelerator |
| CN105719228B (en) * | 2015-07-29 | 2018-12-18 | 上海磁宇信息科技有限公司 | Camera system and image identification system |
| CN105740946B (en) * | 2015-07-29 | 2019-02-12 | 上海磁宇信息科技有限公司 | A method of applying cell array computing system to realize neural network computing |
| CN105740946A (en) * | 2015-07-29 | 2016-07-06 | 上海磁宇信息科技有限公司 | Method for realizing neural network calculation by using cell array computing system |
| CN105719228A (en) * | 2015-07-29 | 2016-06-29 | 上海磁宇信息科技有限公司 | Camera system and image identification system |
| CN106599992A (en) * | 2015-10-08 | 2017-04-26 | 上海兆芯集成电路有限公司 | Neural network unit using processing unit group as recursive neural network for short and long term memory cells for operation |
| CN106599992B (en) * | 2015-10-08 | 2019-04-09 | 上海兆芯集成电路有限公司 | A neural network unit that operates as a temporal recurrent neural network long short-term memory cell with a group of processing units |
| CN107133908A (en) * | 2016-02-26 | 2017-09-05 | 谷歌公司 | Compiler for image processor manages memory |
| CN107133908B (en) * | 2016-02-26 | 2021-01-12 | 谷歌有限责任公司 | Compiler managed memory for image processor |
| WO2017181336A1 (en) * | 2016-04-19 | 2017-10-26 | 北京中科寒武纪科技有限公司 | Maxout layer operation apparatus and method |
| CN106598226A (en) * | 2016-11-16 | 2017-04-26 | 天津大学 | UAV (Unmanned Aerial Vehicle) man-machine interaction method based on binocular vision and deep learning |
| CN106598226B (en) * | 2016-11-16 | 2019-05-21 | 天津大学 | A kind of unmanned plane man-machine interaction method based on binocular vision and deep learning |
| CN106528172A (en) * | 2016-11-24 | 2017-03-22 | 广州途道信息科技有限公司 | Method for realizing image programming |
| US10614332B2 | 2016-12-16 | 2020-04-07 | Qualcomm Incorporated | Light source modulation for iris size adjustment |
| US10984235B2 (en) | 2016-12-16 | 2021-04-20 | Qualcomm Incorporated | Low power data generation for iris-related detection and authentication |
| CN108334944A (en) * | 2016-12-23 | 2018-07-27 | 北京中科寒武纪科技有限公司 | A device and method for artificial neural network computing |
| CN108320018A (en) * | 2016-12-23 | 2018-07-24 | 北京中科寒武纪科技有限公司 | A device and method for artificial neural network computing |
| CN108255775A (en) * | 2016-12-28 | 2018-07-06 | 上海磁宇信息科技有限公司 | For the cellular array bus broadcast method of cellular array computing system |
| CN110520896B (en) * | 2017-02-06 | 2023-09-19 | 比尔贝里有限公司 | Weeding system and method, railway weeding vehicle |
| CN110520896A (en) * | 2017-02-06 | 2019-11-29 | 比尔贝里有限公司 | Weeding system and method, railway weed killing waggon |
| CN107423816A (en) * | 2017-03-24 | 2017-12-01 | 中国科学院计算技术研究所 | A kind of more computational accuracy Processing with Neural Network method and systems |
| CN108629406B (en) * | 2017-03-24 | 2020-12-18 | 展讯通信(上海)有限公司 | Arithmetic device for convolutional neural network |
| CN108629406A (en) * | 2017-03-24 | 2018-10-09 | 展讯通信(上海)有限公司 | Arithmetic unit for convolutional neural networks |
| CN110300989B (en) * | 2017-05-15 | 2023-12-22 | 谷歌有限责任公司 | Configurable and programmable image processor unit |
| CN108984550A (en) * | 2017-05-31 | 2018-12-11 | 西门子公司 | The methods, devices and systems that the signal instructions of data are determined to mark to data |
| CN107680032A (en) * | 2017-08-14 | 2018-02-09 | 西安电子科技大学 | A kind of image gradation data piecemeal storage method |
| CN107623827B (en) * | 2017-08-15 | 2020-06-09 | 上海集成电路研发中心有限公司 | A kind of intelligent CMOS image sensor chip and its manufacturing method |
| CN107623827A (en) * | 2017-08-15 | 2018-01-23 | 上海集成电路研发中心有限公司 | A kind of intelligent CMOS image sensor chip and its manufacture method |
| CN107578095B (en) * | 2017-09-01 | 2018-08-10 | 中国科学院计算技术研究所 | Neural computing device and processor comprising the computing device |
| CN107578095A (en) * | 2017-09-01 | 2018-01-12 | 中国科学院计算技术研究所 | Neural Network Computing Device and Processor Containing the Computing Device |
| CN107977662A (en) * | 2017-11-06 | 2018-05-01 | 清华大学深圳研究生院 | A kind of layered calculation method for realizing high speed processing computer visual image |
| CN107977662B (en) * | 2017-11-06 | 2020-12-11 | 清华大学深圳研究生院 | Layered calculation method for realizing high-speed processing of computer visual image |
| CN107844831A (en) * | 2017-11-10 | 2018-03-27 | 西安电子科技大学 | Purpose Neuro Processor with Digital based on TTA frameworks |
| CN109960673A (en) * | 2017-12-14 | 2019-07-02 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and Related product |
| CN109978129A (en) * | 2017-12-28 | 2019-07-05 | 北京中科寒武纪科技有限公司 | Dispatching method and relevant apparatus |
| CN110018979A (en) * | 2018-01-09 | 2019-07-16 | 幻视互动(北京)科技有限公司 | It is a kind of based on restructing algorithm collection and accelerate handle mixed reality data flow MR intelligent glasses and method |
| US12034015B2 (en) | 2018-05-25 | 2024-07-09 | Meta Platforms Technologies, Llc | Programmable pixel array |
| US10708522B2 (en) | 2018-08-10 | 2020-07-07 | International Business Machines Corporation | Image sensor with analog sample and hold circuit control for analog neural networks |
| CN109409514A (en) * | 2018-11-02 | 2019-03-01 | 广州市百果园信息技术有限公司 | Fixed-point calculation method, apparatus, equipment and the storage medium of convolutional neural networks |
| CN113454984A (en) * | 2018-12-17 | 2021-09-28 | 脸谱科技有限责任公司 | Programmable pixel array |
| CN109685209A (en) * | 2018-12-29 | 2019-04-26 | 福州瑞芯微电子股份有限公司 | A kind of device and method for accelerating neural network computing speed |
| CN109685209B (en) * | 2018-12-29 | 2020-11-06 | 瑞芯微电子股份有限公司 | Device and method for accelerating operation speed of neural network |
| CN111435977A (en) * | 2019-01-14 | 2020-07-21 | 豪威科技股份有限公司 | Configurable interface alignment buffer between DRAM and logic cells for multi-die image sensor |
| CN111435977B (en) * | 2019-01-14 | 2021-09-17 | 豪威科技股份有限公司 | Configurable interface alignment buffer between DRAM and logic cells for multi-die image sensor |
| CN109902040A (en) * | 2019-02-01 | 2019-06-18 | 京微齐力(北京)科技有限公司 | A kind of System on Chip/SoC of integrated FPGA and artificial intelligence module |
| CN109902040B (en) * | 2019-02-01 | 2021-05-14 | 京微齐力(北京)科技有限公司 | System chip integrating FPGA and artificial intelligence module |
| CN109902835A (en) * | 2019-02-01 | 2019-06-18 | 京微齐力(北京)科技有限公司 | Processing unit is provided with the artificial intelligence module and System on Chip/SoC of general-purpose algorithm unit |
| CN110336963A (en) * | 2019-06-06 | 2019-10-15 | 上海集成电路研发中心有限公司 | A dynamic image processing system and image processing method |
| CN110311676A (en) * | 2019-06-24 | 2019-10-08 | 清华大学 | A vision system and data processing method for the Internet of Things using switching current technology |
| US12108141B2 (en) | 2019-08-05 | 2024-10-01 | Meta Platforms Technologies, Llc | Dynamically programmable image sensor |
| US11038520B1 (en) | 2020-04-15 | 2021-06-15 | International Business Machines Corporation | Analog-to-digital conversion with reconfigurable function mapping for neural networks activation function acceleration |
| CN111696025A (en) * | 2020-06-11 | 2020-09-22 | 西安电子科技大学 | Image processing device and method based on reconfigurable memory computing technology |
| CN111696025B (en) * | 2020-06-11 | 2023-03-24 | 西安电子科技大学 | Image processing device and method based on reconfigurable memory computing technology |
| WO2022001457A1 (en) * | 2020-06-30 | 2022-01-06 | 上海寒武纪信息科技有限公司 | Computing apparatus, chip, board card, electronic device and computing method |
| CN111757038A (en) * | 2020-07-07 | 2020-10-09 | 苏州华兴源创科技股份有限公司 | Pixel data processing method and integrated chip |
| CN111626414A (en) * | 2020-07-30 | 2020-09-04 | 电子科技大学 | Dynamic multi-precision neural network acceleration unit |
| US12075175B1 (en) | 2020-09-08 | 2024-08-27 | Meta Platforms Technologies, Llc | Programmable smart sensor with adaptive readout |
| CN112419140B (en) * | 2020-12-02 | 2024-01-23 | 海光信息技术股份有限公司 | Data processing device, data processing method and electronic equipment |
| CN112419140A (en) * | 2020-12-02 | 2021-02-26 | 海光信息技术股份有限公司 | Data processing device, data processing method and electronic equipment |
| CN114697578A (en) * | 2020-12-31 | 2022-07-01 | 清华大学 | Dual-mode image sensor chip based on three-dimensional stacking technology and imaging system |
| CN114697578B (en) * | 2020-12-31 | 2024-03-15 | 清华大学 | Dual-modal image sensor chip and imaging system based on three-dimensional stacking technology |
| CN112954241A (en) * | 2021-02-20 | 2021-06-11 | 南京威派视半导体技术有限公司 | Image data reading system of image sensor and reading and organizing method |
| CN115942139A (en) * | 2021-09-29 | 2023-04-07 | 香港物流机械人研究中心有限公司 | Imaging method, sensor, 3D shape reconstruction method and system |
| WO2023050109A1 (en) * | 2021-09-29 | 2023-04-06 | Congying Sui | An imaging method, sensor, 3d shape reconstruction method and system |
| CN113794849A (en) * | 2021-11-12 | 2021-12-14 | 深圳比特微电子科技有限公司 | Device and method for synchronizing image data and image acquisition system |
| CN114187161B (en) * | 2021-12-07 | 2025-03-18 | 浙江大学 | A universal and configurable image pipeline processing array architecture |
| CN114187161A (en) * | 2021-12-07 | 2022-03-15 | 浙江大学 | Universal configurable image pipeline processing array architecture |
| US12244936B2 (en) | 2022-01-26 | 2025-03-04 | Meta Platforms Technologies, Llc | On-sensor image processor utilizing contextual data |
| CN114155562A (en) * | 2022-02-09 | 2022-03-08 | 北京金山数字娱乐科技有限公司 | Gesture recognition method and device |
| CN114693559A (en) * | 2022-04-02 | 2022-07-01 | 深圳创维-Rgb电子有限公司 | Image processing optimization method and device, electronic equipment and readable storage medium |
| CN114693559B (en) * | 2022-04-02 | 2024-11-26 | 深圳创维-Rgb电子有限公司 | Image processing optimization method, device, electronic device and readable storage medium |
| CN114979521A (en) * | 2022-04-12 | 2022-08-30 | 昆明物理研究所 | A readout circuit with arbitrary windowing function |
| CN115984964A (en) * | 2022-12-30 | 2023-04-18 | 西安电子科技大学芜湖研究院 | A dynamic gesture recognition method and system based on RISC-V coprocessor |
| CN116546340A (en) * | 2023-07-05 | 2023-08-04 | 华中师范大学 | High-speed CMOS pixel detector |
| CN116546340B (en) * | 2023-07-05 | 2023-09-19 | 华中师范大学 | High-speed CMOS pixel detector |
| CN117130668A (en) * | 2023-10-27 | 2023-11-28 | 南京沁恒微电子股份有限公司 | A processor instruction fetch redirection timing optimization circuit |
| CN117130668B (en) * | 2023-10-27 | 2023-12-29 | 南京沁恒微电子股份有限公司 | A processor instruction fetch redirection timing optimization circuit |
| CN117271434A (en) * | 2023-11-15 | 2023-12-22 | 成都维德青云电子有限公司 | On-site programmable system-in-chip |
| CN117271434B (en) * | 2023-11-15 | 2024-02-09 | 成都维德青云电子有限公司 | On-site programmable system-in-chip |
| CN120224818A (en) * | 2025-05-28 | 2025-06-27 | 中国科学院半导体研究所 | Programmable visible light infrared vision chip |
Also Published As
| Publication number | Publication date |
|---|---|
| CN102665049B (en) | 2014-09-17 |
Similar Documents
| Publication | Title |
|---|---|
| CN102665049B (en) | Programmable visual chip-based visual image processing system |
| US11977971B2 (en) | Data volume sculptor for deep learning acceleration |
| CN108268943B (en) | Hardware accelerator engine |
| CN207440765U (en) | System on chip and mobile computing device |
| CN207517054U (en) | Crossfire switchs |
| CN103019656A (en) | Dynamically reconfigurable multi-stage parallel single instruction multiple data array processing system |
| CN104112053B (en) | A kind of reconstruction structure platform designing method towards image procossing |
| CN113301221B (en) | A kind of deep network camera image processing method and terminal |
| CN109154975A (en) | For generating the device and method of local binary patterns LBP |
| CN105611114B (en) | Digital multireel for AER imaging sensors accumulates nuclear convolution processing chip |
| CN110443107A (en) | Image procossing for object detection |
| CN104463125A (en) | DSP-based automatic face detecting and tracking device and method |
| CN118333862B (en) | Satellite precipitation remote sensing image space-time super-resolution reconstruction method and system |
| WO2022237061A1 (en) | Embedded object cognitive system based on image processing |
| CN206058228U (en) | Machine Vision Inspection System |
| CN105825219A (en) | Machine vision detection system |
| Wasala et al. | Real-time HOG+ SVM based object detection using SoC FPGA for a UHD video stream |
| Chua et al. | Visual IoT: ultra-low-power processing architectures and implications |
| CN103345747A (en) | Optimized picture shape feature extraction and structuring description device and method based on horizontal coordinate |
| CN115883937A (en) | Image processing platform, processing system and processing method |
| Wu et al. | Parallelism optimized architecture on fpga for real-time traffic light detection |
| CN108596013A (en) | Pedestrian detection method and device based on the study of more granularity depth characteristics |
| Bhowmik et al. | Design of a reconfigurable 3d pixel-parallel neuromorphic architecture for smart image sensor |
| CN110275842A (en) | FPGA-based hyperspectral target tracking system and method |
| CN110110589A (en) | Face classification method based on FPGA parallel computation |
Legal Events
| Code | Title |
|---|---|
| C06 | Publication |
| PB01 | Publication |
| C10 | Entry into substantive examination |
| SE01 | Entry into force of request for substantive examination |
| C14 | Grant of patent or utility model |
| GR01 | Patent grant |