
CN117311941B - Image processing method and related equipment - Google Patents


Info

Publication number
CN117311941B
CN117311941B (application number CN202311290316.4A)
Authority
CN
China
Prior art keywords
task
image
processing
engine
acceleration processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311290316.4A
Other languages
Chinese (zh)
Other versions
CN117311941A (en)
Inventor
李嘉昕
黄彬
于潇宇
韩峰
陈德炜
章恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202311290316.4A priority Critical patent/CN117311941B/en
Publication of CN117311941A publication Critical patent/CN117311941A/en
Application granted granted Critical
Publication of CN117311941B publication Critical patent/CN117311941B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/48: Program initiating; program switching, e.g. by interrupt
    • G06F 9/4806: Task transfer initiation or dispatching
    • G06F 9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G06F 9/5038: Allocation of resources considering the execution order of a plurality of tasks, e.g. taking priority or time-dependency constraints into consideration
    • G06F 9/5083: Techniques for rebalancing the load in a distributed system
    • G06F 2209/00: Indexing scheme relating to G06F 9/00
    • G06F 2209/48: Indexing scheme relating to G06F 9/48
    • G06F 2209/484: Precedence
    • G06F 2209/50: Indexing scheme relating to G06F 9/50
    • G06F 2209/5021: Priority
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Processing (AREA)

Abstract

An embodiment of the present application provides an image processing method and related equipment, involving an integrated chip that can be used to execute the following: acquire task information corresponding to an image processing task, where the image processing task is a computing task associated with an image, the task information includes the marking results carried by the N connected regions in the image, and N is a positive integer; generate a task scheduling strategy for the image processing task according to the marking results in the task information, the number of acceleration processing engines, and the running information of the acceleration processing engines; and distribute the image processing task to the acceleration processing engines according to the task scheduling strategy, with the acceleration processing engines performing hardware-accelerated processing on it. The embodiments of the present application can improve the utilization of the acceleration processing engines.

Description

Image processing method and related equipment
Technical Field
The present application relates to the field of computer vision, and in particular, to an image processing method and related devices, and more particularly, to an image processing method, an integrated chip, and a computer device.
Background
With the development of image processing technology, related algorithm tasks related to computer vision, pattern recognition and image analysis processing are more and more diversified; for example, license plate recognition, text recognition, caption recognition, target segmentation and extraction, region of interest extraction, and the like.
Currently, the execution order of tasks can be determined by assigning each task a priority: idle computing units, among multiple computing units with the corresponding function, are controlled to execute tasks in descending order of priority in the scheduling queue. A task with higher priority acquires an idle computing unit and executes sooner, but this approach cannot guarantee the execution efficiency of the computing units over a period of time, and the load of high-priority tasks is not balanced against that of lower-priority tasks, so computing-unit resources are wasted.
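The priority-driven dispatch described above can be sketched as follows. This is a minimal illustration, not the patent's method: the task and unit representations are assumptions, and the sketch only shows how tasks are drained from a scheduling queue in descending priority order and handed to whichever unit is idle next.

```python
import heapq

def dispatch_by_priority(tasks, num_units):
    """Hand tasks to idle units strictly in descending priority order.

    tasks: list of (priority, task_id) pairs; higher priority runs first.
    Returns a mapping unit_index -> list of task_ids. Because dispatch
    order ignores task cost, unit loads can end up unbalanced in time
    even though the task counts look even.
    """
    # Python's heapq is a min-heap, so negate priorities for a max-heap.
    heap = [(-p, tid) for p, tid in tasks]
    heapq.heapify(heap)
    assignment = {u: [] for u in range(num_units)}
    unit = 0
    while heap:
        _, tid = heapq.heappop(heap)
        assignment[unit].append(tid)   # next idle unit, round-robin
        unit = (unit + 1) % num_units
    return assignment
```

With tasks `[(1, 'a'), (3, 'b'), (2, 'c')]` and two units, 'b' (priority 3) is dispatched first, then 'c', then 'a'.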
Disclosure of Invention
The embodiment of the application provides an image processing method and related equipment, which can improve the utilization rate of an acceleration processing engine.
In one aspect, an embodiment of the present application provides an image processing method, including:
acquiring task information corresponding to an image processing task; the image processing task is a computing task associated with the image, the task information comprises marking results carried by N connected areas in the image, and N is a positive integer;
generating a task scheduling strategy corresponding to the image processing task according to the marking result in the task information, the number of the acceleration processing engines and the running information of the acceleration processing engines;
and distributing the image processing task to an acceleration processing engine according to the task scheduling strategy, and carrying out hardware acceleration processing on the image processing task through the acceleration processing engine.
In one aspect, an embodiment of the present application provides an integrated chip, where the integrated chip includes a task scheduling management engine and an acceleration processing engine:
The task scheduling management engine is used for acquiring task information corresponding to the image processing task; the image processing task is a computing task associated with the image, the task information comprises marking results carried by N connected areas in the image, and N is a positive integer;
the task scheduling management engine is also used for generating a task scheduling strategy corresponding to the image processing task according to the marking result in the task information, the quantity of the acceleration processing engines and the running information of the acceleration processing engines;
The task scheduling management engine is also used for distributing the image processing task to the acceleration processing engine according to the task scheduling strategy;
The acceleration processing engine is used for carrying out hardware acceleration processing on the image processing task.
Wherein the integrated chip further comprises a tag processing engine;
the marking processing engine is used for performing connected-region marking processing on the image to obtain the label information of the N connected regions contained in the image, where the label information of each connected region uniquely identifies that region; and
is further used for creating an index for the label information of each connected region, determining the label information of each connected region together with the index created for it as the marking result, and writing the marking result into a storage area, where the storage area is any one of host memory, an on-chip storage unit, and an external storage device; and
is further used for determining the marking result and the storage location of the marked image as the task information of the image processing task, and generating a configuration command carrying the task information.
Specifically, the marking processing engine is used for:
scanning the pixels in the image, grouping pixels that have the same pixel value and adjacent positions into connected regions, and marking the connected regions contained in the image to obtain the label information corresponding to each of them.
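A minimal software sketch of this scan-and-mark step, assuming 4-connectivity (the text does not fix a connectivity rule); each region receives a distinct integer label, which stands in for the label information:

```python
from collections import deque

def label_connected_regions(img):
    """Label 4-connected regions of equal pixel value via BFS flood fill.

    img: 2-D list of pixel values. Returns (labels, n), where labels is
    a 2-D list of region labels starting at 1 and n is the region count N.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                continue          # pixel already belongs to a region
            n += 1
            labels[y][x] = n
            q = deque([(y, x)])
            while q:              # spread the label to equal, adjacent pixels
                cy, cx = q.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx]
                            and img[ny][nx] == img[cy][cx]):
                        labels[ny][nx] = n
                        q.append((ny, nx))
    return labels, n
```

For example, `[[1, 1, 0], [0, 1, 0], [0, 0, 1]]` yields four regions: the top-left run of 1s, the right column of 0s, the bottom-left run of 0s, and the lone 1 in the corner.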
Wherein the integrated chip further comprises a bus unit;
The bus unit is used for sending a configuration command carrying task information to the task scheduling management engine.
The task scheduling management engine is specifically configured to:
count the number of acceleration processing engines, acquire the performance index and resource utilization of each acceleration processing engine, and determine the performance index and resource utilization as the running information of the acceleration processing engines;
count the label-information entries in the marking result to obtain the number of connected regions in the image, which is N; and
generate a task scheduling strategy for the image processing task according to the number of connected regions in the image, the number of acceleration processing engines, and the running information of the acceleration processing engines.
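One way such a strategy might weigh the marking result against engine running information is sketched below; this is an assumption-laden illustration, not the patent's algorithm, and the 'perf' and 'util' field names are invented placeholders for the performance index and resource utilization mentioned above.

```python
def make_schedule_strategy(num_regions, engines):
    """Derive per-engine subtask quotas from N and engine running info.

    num_regions: N, the connected-region count from the marking result.
    engines: list of dicts with hypothetical 'perf' (relative throughput)
    and 'util' (current utilization in [0, 1]) fields.
    Quotas are proportional to each engine's spare capacity, which is a
    simple form of load balancing.
    """
    spare = [e["perf"] * (1.0 - e["util"]) for e in engines]
    total = sum(spare) or 1.0
    quotas = [int(num_regions * s / total) for s in spare]
    # Hand leftover subtasks to the engines with the most spare capacity.
    leftover = num_regions - sum(quotas)
    for i in sorted(range(len(engines)), key=lambda i: -spare[i])[:leftover]:
        quotas[i] += 1
    return quotas
```

An idle engine thus receives proportionally more of the N subtasks than a busy one, which matches the stated goal of raising engine utilization.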
The task scheduling management engine comprises a task scheduling strategy generation unit, a command encapsulation unit and an arbitration management unit, wherein the number of acceleration processing engines in the integrated chip is M, and M is an integer greater than 1;
The task scheduling strategy generation unit is used for configuring execution sequences for all the connected areas in the image according to the dependency relationship among all the connected areas in the image;
the task scheduling strategy generating unit is also used for carrying out load balancing on the basis of the execution sequence, the number of connected areas in the image, the number of acceleration processing engines and the running information of the acceleration processing engines, and determining scheduling granularity for the image processing tasks;
The task scheduling strategy generation unit is also used for combining the execution sequence and the scheduling granularity into a task scheduling strategy corresponding to the image processing task and transmitting the task scheduling strategy to the command encapsulation unit;
The command packaging unit is used for splitting the image processing task into N subtasks according to the label information in the marking result, and packaging the N subtasks according to the task scheduling strategy to obtain task execution commands; the N subtasks are in one-to-one correspondence with the N connected regions in the image;
the arbitration management unit is used for distributing task execution commands to the M acceleration processing engines.
Specifically, the arbitration management unit is used for:
performing format conversion on the task execution commands and distributing the converted commands to the M acceleration processing engines, where the converted commands are in a configuration format supported by the acceleration processing engines;
distributing the N subtasks to the M acceleration processing engines through the converted task execution commands, and controlling the M acceleration processing engines to execute their assigned subtasks; and
receiving the subtask processing results returned by each of the M acceleration processing engines.
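As a rough illustration of the splitting and distribution steps, the following sketch packages one subtask per connected-region label and batches them per engine. The dict-based "configuration format" is an assumption for illustration; the real command layout is engine-specific and not given in the text.

```python
def split_and_distribute(label_infos, quotas):
    """Split an image processing task into one subtask per region label
    and batch the subtasks into per-engine task execution commands.

    label_infos: the N region labels from the marking result.
    quotas: per-engine subtask counts (length M, summing to N), e.g. the
    output of a load-balancing strategy.
    """
    commands = []
    it = iter(label_infos)
    for engine_id, quota in enumerate(quotas):
        batch = [next(it) for _ in range(quota)]
        commands.append({
            "engine": engine_id,   # which acceleration engine gets the batch
            "subtasks": [{"op": "process_region", "label": lab} for lab in batch],
        })
    return commands
```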
Wherein the integrated chip further comprises a post-processing engine;
the post-processing engine is used for merging the subtask processing results of the M acceleration processing engines to obtain a task processing result corresponding to the image processing task.
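The merging step of the post-processing engine might look like the following sketch, assuming (hypothetically) that each subtask result carries the label of the region it processed:

```python
def merge_subtask_results(results):
    """Merge per-engine subtask results into one task processing result.

    results: one list of result dicts per engine, each dict holding the
    region 'label' it belongs to and its computed 'value' (both field
    names are assumptions for illustration).
    Returns the values ordered by region label, for deterministic output.
    """
    merged = {}
    for engine_results in results:
        for r in engine_results:
            merged[r["label"]] = r["value"]
    return [merged[k] for k in sorted(merged)]
```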
An aspect of an embodiment of the present application provides a computer device, including an integrated chip provided in the foregoing aspect of the embodiment of the present application, so that the computer device performs the method provided in the foregoing aspect of the embodiment of the present application.
In the embodiments of the present application, task information corresponding to an image processing task is acquired, where the image processing task is a computing task associated with an image, the task information includes the marking results carried by the N connected regions in the image, and N is a positive integer. A task scheduling strategy for the image processing task is generated according to the marking results in the task information, the number of acceleration processing engines, and the running information of the acceleration processing engines, so that the characteristics of the connected-region marking algorithm are fully exploited and the strategy can be generated quickly. The image processing task is then distributed to the acceleration processing engines according to the task scheduling strategy, and the engines perform hardware-accelerated processing on it; because the engines are assigned flexibly through the strategy, their utilization can be improved to the greatest extent.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an integrated chip according to an embodiment of the present application;
FIG. 2 is a schematic diagram of interactions between multiple engines in a visual/image algorithm engine system provided by an embodiment of the present application;
fig. 3 is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a task scheduling management engine according to an embodiment of the present application;
fig. 5 is a second flowchart of an image processing method according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
For ease of understanding, the basic concepts involved in embodiments of the present application are described below:
Connected region (connected component): a connected region is an image region composed of pixels that have the same pixel value and adjacent positions in the image. For example, if two pixels in an image have the same pixel value and adjacent positions, they can be considered connected and assigned to the same connected region. In the embodiments of the present application, the connected regions in an image can be obtained by connected-region analysis, the process of finding and marking the connected regions in an image, which is also called connected-region labeling. Pixels belonging to the same connected region share the same label information, and pixels belonging to different connected regions have different label information.
Computer Vision (CV): computer vision is the science of how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to identify and measure targets, and further performs graphics processing so that images become more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies the theories and technologies needed to build artificial-intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR (Optical Character Recognition), video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, as well as common biometric techniques such as face recognition and fingerprint recognition.
The embodiment of the application can be used for processing related algorithms related to tasks such as computer vision, image processing and the like, such as character segmentation extraction (such as license plate recognition, text recognition, subtitle recognition and the like) in OCR recognition, motion foreground object segmentation and extraction (such as pedestrian detection, vision-based vehicle detection and positioning and the like) in vision positioning, medical image processing (such as region of interest extraction) and the like.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an integrated chip according to an embodiment of the present application. As shown in fig. 1, the integrated chip may include a processor unit (Processor Unit), a bus unit (Bus Unit), a vision/image task engine (Vision/Image Task Engine), a vision/image algorithm engine (Vision/Image Algorithm Engine), an I/O (input/output) device unit, an on-chip memory unit (On-Chip Memory Unit), and the like. Wherein:
The processor unit may include one or more processors; for example, the processor unit shown in fig. 1 may include processor 1, processor 2, …, and processor D, where D is the number of processors in the processor unit and D is a positive integer (D may take the values 1, 2, …). The processor unit is responsible for executing instructions and performing data processing; its performance and functions depend on its design and implementation. Different processor units may have different architectures and instruction sets to accommodate different application and performance requirements, which the embodiments of the present application do not limit.
Bus units may be used to handle bus interfaces and data transfers. For example, the data transfer between the processor unit, the on-chip memory unit, the I/O device unit, and the various engines (e.g., vision/image task engine, vision/image algorithm engine) in the integrated chip all need to be implemented via a bus unit.
The on-chip memory unit is a memory unit in the integrated chip for storing and accessing data inside the chip. On-chip memory can be divided into several levels: ① registers (Registers) are the fastest, lowest-latency on-chip storage, used to hold operating instructions and data; ② a cache (Cache) is a storage level between a processor core and main memory, used to speed up accesses to main memory; ③ a high-speed cache (High-Speed Cache) is a larger-capacity storage unit above the cache that can be used to provide higher capacity with low access latency.
The I/O device unit is the bridge for communication between the integrated chip and external devices; as shown in fig. 1, it can serve as the bridge between the integrated chip and a storage device or a host device, that is, it can be connected to an external storage device or a host device. The I/O device unit may include one or more I/O device controllers, for example I/O device controller 1, I/O device controller 2, …, and I/O device controller L, where L is the number of I/O device controllers in the I/O device unit and L is a positive integer (L may take the values 1, 2, …); the embodiments of the present application do not limit this number.
The integrated chip may include a plurality of engines that may be used to process various tasks within the integrated chip, where the tasks may be various types of computing tasks related to the technical fields of computer vision, image recognition, image processing, and the like. As shown in fig. 1, the engines in the integrated chip may include a visual/image task engine and a visual/image algorithm engine, where the number of visual image task engines and visual/image algorithm engines in the integrated chip may be one or more, and embodiments of the present application are not limited herein.
Where the vision/image task engine is used to perform various vision/image related tasks, where tasks may include, but are not limited to, image classification, object detection, face recognition, object segmentation, etc. The vision/image task engine may typically use a pre-trained deep learning model or machine learning algorithm to address specific vision problems and provide a simple and easy-to-use interface for the developer to invoke.
The vision/image algorithm engine may be used to implement and optimize vision/image processing algorithms; the vision/image algorithm engine may generally provide underlying image processing and computing capabilities such as image filtering, edge detection, feature extraction, and the like.
It will be appreciated that in practical applications, interactions may be made between the visual/image task engine and the visual/image algorithm engine. For example, a visual/image task engine will typically send a request to a visual/image algorithm engine requesting that a particular visual/image task be performed; the vision/image algorithm engine processes the image data by using a corresponding algorithm and model according to the request, and returns the processing result to the vision/image task engine; such interactions may enable more efficient and flexible image processing and analysis.
In the integrated chip shown in fig. 1, the vision/image task engine may receive an image processing task (here, a vision/image task, which may be a computing task) issued by the processor unit; the image processing task may be sent to the vision/image algorithm engine through the bus unit (specifically, through the bus unit's routing), and the vision/image algorithm engine is called to start and complete the corresponding task.
Optionally, the vision/image task engine may also receive an image processing task issued from the host device, where the image processing task may be received by the I/O device controller in the integrated chip, and may further send the image processing task to the vision/image algorithm engine through the bus unit, and call the vision/image algorithm engine to start a corresponding task and complete a corresponding task.
The computing data included in the image processing task may be stored in an on-chip storage unit inside the integrated chip, or may be stored in a storage device outside the integrated chip, or may also be stored in a storage unit running on the host device, which is not limited in the embodiment of the present application.
It should be noted that the integrated chip shown in fig. 1 may be disposed in a server or in a terminal device, which is not limited in the present application. The terminal device may be an electronic device such as a smart phone, a tablet computer, a notebook computer, a palm computer, a mobile internet device (MID), a wearable device (such as a smart watch or smart bracelet), a smart voice interaction device, a smart home appliance (such as a smart television), a vehicle-mounted device, an aircraft, and so on; the present application does not limit the type of the terminal device. The server may be an independent physical server, or a server cluster or distributed system formed by a plurality of physical servers; the present application does not limit the type of the server.
It will be appreciated that each engine in the integrated chip shown in fig. 1 may be considered a system, which may likewise include multiple engines, e.g., a visual/image algorithm engine system may include multiple engines. Referring to fig. 2, fig. 2 is a schematic diagram illustrating interactions among a plurality of engines in a visual/image algorithm engine system according to an embodiment of the present application. As shown in fig. 2, the vision/image algorithm engine system may include a marking process engine, a task schedule management engine, an acceleration process engine, and a post-process engine. The embodiment of the application does not limit the number of the marking processing engine, the task scheduling management engine, the acceleration processing engine and the post-processing engine. For example, the number of acceleration processing engines may be M, which may be a positive integer. Wherein:
The marking processing engine is used for performing connected-region marking processing on an input image to obtain the marking results of the connected regions contained in the image; that is, the marking processing engine may search for each connected region in the image and mark it to obtain the corresponding marking result. It can be understood that any image contains at least one connected region; for ease of understanding, the embodiments of the present application denote the number of connected regions contained in an image as N, where N is a positive integer (N may take the values 1, 2, …).
The task scheduling management engine is used for acquiring task information corresponding to an image processing task, wherein the image processing task can be a calculation task which is received by the vision/image algorithm engine and is associated with an image, and the task information can comprise marking results carried by N connected areas in the image; generating a task scheduling strategy corresponding to the image processing task according to the marking result in the task information, the number of the acceleration processing engines and the running information of the acceleration processing engines; and splitting and distributing the image processing tasks based on the task scheduling strategy. The task scheduling policy may refer to a rule for allocating and scheduling an image processing task, for example, the task scheduling policy may specify how to split the image processing task, in which order the split task is executed, how to allocate the split task to an acceleration processing engine, and so on.
The acceleration processing engine is configured to perform hardware-accelerated processing on the tasks assigned to it, which may include, but is not limited to: image size and format conversion, image filtering and enhancement, image binarization, edge extraction, matrix operations, affine transformation, histograms, and the acquisition and calculation of angles, areas, and other properties of graphics. For example, an acceleration processing engine may specifically be a binarization engine, a contour search engine, and the like; a binarization engine performs image binarization on the tasks assigned to it, and a contour search engine performs contour searching on the tasks assigned to it. The number of acceleration processing engines can be M, and each engine can process its assigned tasks in parallel, so the image processing task can be hardware-accelerated in parallel. The value of M can be set according to the requirements of the actual application scenario, for example according to the performance index of the integrated chip: if the throughput requirement is high, M can be relatively large; conversely, when the throughput requirement is low, M can be relatively small.
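For concreteness, image binarization, one of the operations listed above, can be expressed in software as the simple threshold mapping below; a hardware engine would implement the same mapping in logic, so this is only an illustrative software sketch.

```python
def binarize(img, threshold):
    """Image binarization: pixels at or above the threshold map to 1,
    all others to 0. img is a 2-D list of grayscale values.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in img]
```

For example, `binarize([[10, 200], [128, 0]], 128)` produces `[[0, 1], [1, 0]]`.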
The post-processing engine may be used to screen, merge, sort, and package the results returned by each acceleration processing engine.
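The division of labour among the three engines can be illustrated with a minimal end-to-end sketch (illustrative Python only; a round-robin split and thread-based parallelism stand in for the chip's hardware scheduling and the M acceleration engines — all function and variable names are assumptions, not the chip's actual interfaces):

```python
from concurrent.futures import ThreadPoolExecutor

def schedule(labels, m):
    """Split the image processing task into one subtask per connected
    region and assign the subtasks round-robin to m engines."""
    assignment = {j: [] for j in range(m)}
    for i, label in enumerate(labels):
        assignment[i % m].append(label)
    return assignment

def accelerate(engine_id, subtasks):
    # Stand-in for the hardware acceleration of one engine's share.
    return [(label, f"processed-{label}") for label in subtasks]

def post_process(partial_results):
    # Post-processing engine: merge and sort the per-engine results
    # into one task processing result.
    return sorted(r for part in partial_results for r in part)

labels = [f"label{i}" for i in range(1, 6)]   # N = 5 connected regions
plan = schedule(labels, m=2)                  # M = 2 acceleration engines
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = list(pool.map(lambda kv: accelerate(*kv), plan.items()))
result = post_process(parts)
```

The sketch keeps the three roles separate on purpose: the scheduler only decides *where* each connected region goes, the engines only compute, and the post-processing step is the single point where partial results are recombined.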
Referring to fig. 3, fig. 3 is a schematic flow chart of an image processing method according to an embodiment of the present application. It may be appreciated that the image processing method may be performed by a computer device, where an integrated chip shown in fig. 1 may be provided, and the computer device may be a terminal device, or may be a server, which is not limited in this embodiment of the present application; the image processing method may include the following steps S101 to S103:
Step S101, task information corresponding to an image processing task is obtained; the image processing task is a computing task associated with the image, the task information comprises marking results carried by N connected areas in the image, and N is a positive integer.
In the embodiment of the present application, as shown in fig. 1, after receiving an image processing task issued by a processor unit or a host device, the visual/image task engine in the integrated chip may send the image processing task to the vision/image algorithm engine through a bus unit. The image processing task may be a computing task associated with an image, which may also be referred to herein as a vision/image task, such as, but not limited to, image filtering, edge detection, feature extraction, etc. The image may be an image acquired in real time by an image capturing apparatus (either a camera integrated in the computer device or an external image capturing apparatus), an image downloaded from the Internet, or an image resource stored in advance in a local gallery. The image may be a still image, a dynamic image, or any video frame in video data, and the number of images may be one or more, which is not limited in the embodiment of the present application.
The vision/image algorithm engine may include a marking processing engine, and after receiving an image processing task, the vision/image algorithm engine can call the marking processing engine to carry out connected region marking processing on the image associated with the image processing task, so as to obtain the marking result corresponding to the image. The marking result comprises label information corresponding to each connected region contained in the image and index information created for each connected region, and the label information of each connected region can be used to uniquely identify the corresponding connected region; that is, one connected region in the image may correspond to one piece of tag information and one piece of index information, different connected regions correspond to different tag information, and different tag information corresponds to different index information. In the embodiment of the application, the number of connected areas contained in the image can be recorded as N, where N is a positive integer.
After the marking processing engine is called to obtain the marking result of each connected area in the image, the marking result may be written into a storage area through the bus unit, where the storage area may be any one of an on-chip storage unit in the integrated chip, a storage unit in the host device (which may be referred to as host memory), and an external storage device; the embodiment of the present application is not limited to this.
In one possible implementation, the vision/image algorithm engine system of the integrated chip may further include a bus unit and a task scheduling management engine. The marking result of each connected region contained in the image and the storage location corresponding to the marked image can be used as the task information of the image processing task, and a configuration command carrying the task information is generated (the configuration command may be generated by the marking processing engine in the integrated chip); the bus unit in the integrated chip can then be called to send the configuration command carrying the task information to the task scheduling management engine. The storage location corresponding to the marking result may be the address space, within the storage area, in which the marking result is stored.
After receiving the configuration command, the task scheduling management engine can acquire the task information carried by the configuration command and read the marked image from the storage area according to the storage location contained in the task information, where the marked image is the image after connected region marking has been completed; further, the position of each connected region included in the image may be determined from the marked image based on the marking result in the task information.
Step S102, generating a task scheduling strategy corresponding to the image processing task according to the marking result in the task information, the number of the acceleration processing engines and the running information of the acceleration processing engines.
Specifically, referring to fig. 4, fig. 4 is a schematic structural diagram of a task scheduling management engine according to an embodiment of the present application. As shown in fig. 4, the task scheduling management engine may include a scheduling policy generation unit, a command encapsulation unit, and an arbitration management unit. The scheduling policy generating unit in the task scheduling management engine may receive a configuration command sent by the bus unit, where the configuration command may carry task information corresponding to an image processing task, and the task information may include, but is not limited to, a marking result of each connected area in an image, an image storage location (for example, a storage location corresponding to a marked image) where marking of the connected area is completed, a matching parameter, and the like.
Optionally, the task scheduling management engine (specifically, the scheduling policy generation unit in the task scheduling management engine) may be called to count the number of acceleration processing engines, obtain the performance indexes and resource utilization rates of the acceleration processing engines, and determine the performance indexes and resource utilization rates as the running information of the acceleration processing engines.
The embodiment of the application can record the number of the acceleration processing engines as M, wherein M is a positive integer; the performance index of the acceleration processing engine may include an index of processing speed, throughput, delay, etc. of the acceleration processing engine, and optionally, the performance index of the acceleration processing engine may further include an accuracy index, such as an error rate, a distortion degree, etc.; the resource utilization of the acceleration processing engine may refer to the utilization of system resources by the acceleration processing engine, such as CPU (Central Processing Unit/Processor) occupancy, memory occupancy, I/O load, etc. during the operation of the acceleration processing engine. The task scheduling strategy is generated in real time based on the operation information of the acceleration processing engine (the task scheduling strategy at the moment can be a dynamic strategy), so that the utilization rate of system resources can be improved, and the excessive occupation of the system resources is avoided.
The scheduling policy generation unit in the task scheduling management engine is invoked to count the number of pieces of tag information in the marking result to obtain the number of connected areas in the image, which may be recorded as N; the scheduling policy generation unit is then invoked to configure an execution order for the connected regions in the image according to the dependency relationships among them; load balancing is performed according to the execution order, the number of connected regions in the image, the number of acceleration processing engines, and the running information of the acceleration processing engines, and a scheduling granularity is determined for the image processing task; the execution order and the scheduling granularity can then be combined into the task scheduling policy corresponding to the image processing task, and the task scheduling policy is transmitted to the command encapsulation unit.
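Under stated assumptions, the policy-assembly step above can be sketched as follows (illustrative Python; the field names, the `out_of_order` default, and the ceil-based batching heuristic are assumptions, not the chip's actual policy format):

```python
def generate_policy(marking_result, engines):
    """Assemble a task scheduling policy from the marking result and the
    number and running information of the acceleration engines."""
    n = len(marking_result["labels"])   # number of connected regions (N)
    m = len(engines)                    # number of acceleration engines (M)
    # The regions in this sketch carry no inter-region dependencies,
    # so any execution order is allowed.
    execution_order = "out_of_order"
    # Load-balancing heuristic: spread the N regions evenly over the M
    # engines; more than one region per engine -> multi-instruction
    # (batched) granularity, otherwise single-instruction granularity.
    batch = -(-n // m)                  # ceil(n / m)
    granularity = "single" if batch == 1 else "multi"
    return {"execution_order": execution_order,
            "granularity": granularity,
            "batch": batch}

engines = [{"id": j, "utilization": 0.1 * j} for j in range(3)]  # M = 3
marking = {"labels": [f"label{i}" for i in range(1, 10)]}        # N = 9
policy = generate_policy(marking, engines)
```

With N = 9 regions and M = 3 engines, the sketch settles on multi-instruction granularity with a batch of 3 regions per engine.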
In one possible implementation, to increase flexibility, task scheduling policies may include, but are not limited to, various combinations of configuration, scheduling granularity, execution order, load balancing, and the like. Wherein:
① The configuration mode can support static configuration and also can support dynamic configuration. Static configuration refers to allowing software to program and control the scheduling policy of each image processing task; the dynamic configuration refers to a task scheduling strategy generated in real time according to the running information of the acceleration processing engine, and compared with the static configuration, the dynamic configuration is more flexible and intelligent, and can be adjusted according to the real-time running information of the acceleration processing engine so as to improve the efficiency and performance of the acceleration processing engine. It may be appreciated that, in the embodiment of the present application, the task scheduling policy corresponding to the image processing task may be generated in a static configuration or dynamic configuration manner, which is not limited by the present application.
② The scheduling granularity may refer to a unit size of scheduled execution of the image processing task, and may be flexibly configured according to different application scenarios and requirements. Scheduling granularity supports, but is not limited to, multiple scenarios of single instruction, multiple instruction, compound instruction, mixed instruction, etc. It can be appreciated that according to the embodiment of the application, any one of various scenes such as a single instruction, a multi-instruction, a compound instruction, a mixed instruction and the like can be selected according to specific application scenes and requirements to generate a task scheduling strategy corresponding to an image processing task.
In the single-instruction scheduling granularity scenario, the task scheduling management engine schedules with the minimum task unit, such as a single instruction, as the basic unit; this granularity enables the execution of data-intensive tasks and is suitable for tasks with larger data volume and longer processing time, such as vector instruction operations.
In a multi-instruction scheduling granularity scene, a task scheduling management engine schedules by taking a plurality of instructions as basic units, such as a group of related instruction sequences or a plurality of tasks submitted at one time; the granularity can improve the efficiency and parallelism of task scheduling, and is suitable for multi-thread and multi-core computing environments.
Under the scene of compound instruction scheduling granularity, a task scheduling management engine combines a plurality of related instructions into a compound instruction to perform task scheduling; the granularity can reduce the dependency relationship and communication overhead between instructions, improves the instruction level parallelism, and is suitable for instruction level parallel computing and specific application fields.
Under the mixed instruction scheduling granularity scene, the task scheduling management engine can flexibly select different scheduling granularities according to the characteristics and requirements of tasks. For example, single instruction scheduling granularity is used in some tasks, while compound instruction scheduling granularity is used in other tasks; such flexible scheduling granularity may better balance efficiency of task execution and resource utilization.
③ The task scheduling policy can also be determined according to different execution orders; for example, a parallel execution strategy may be selected to execute multiple tasks simultaneously; a serial execution strategy can be selected to execute single tasks sequentially; or a hybrid execution strategy may be selected, with the execution order determined according to the dependency relationships of the tasks. Optionally, the execution order may also support multiple application scenarios such as sequential execution with sequential return, out-of-order execution with out-of-order return, sequential execution with out-of-order return, and out-of-order execution with sequential return. It can be appreciated that the embodiment of the present application may select any one of the above execution orders to generate the task scheduling policy corresponding to the image processing task.
For example, assume that the order among the plurality of tasks is task 1, task 2, and task 3. In the sequential execution with sequential return scenario, task 1, task 2, and task 3 are executed in order, and the return order of the task execution results is still task 1, task 2, task 3. In the out-of-order execution with out-of-order return scenario, neither the execution order of task 1, task 2, and task 3 nor the return order of the execution results is limited.
In the sequential execution with out-of-order return scenario, task 1, task 2, and task 3 need to be executed in order, but the return order of the task execution results may be arbitrary: whichever task finishes execution first returns its task execution result first.
In the out-of-order execution with sequential return scenario, task 1, task 2, and task 3 may be executed in any order, but after execution completes, the task execution results need to be returned in the order of task 1, task 2, task 3. For example, if task 2 completes execution earlier than task 1, the task execution result of task 2 must wait until task 1 completes and returns its corresponding task execution result; the result of task 2 cannot be returned first. That is, the task execution results must be returned in the order of task 1, task 2, and task 3.
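Out-of-order execution with sequential return amounts to a reorder buffer: results are held until every earlier task has returned. A minimal sketch (illustrative Python; all names are assumptions):

```python
def inorder_return(completions, n):
    """Reorder-buffer sketch: tasks complete in any (hardware) order,
    but results are released strictly in task order 1..n; a result is
    buffered until all earlier tasks have returned."""
    buffer, released, next_id = {}, [], 1
    for task_id, result in completions:   # completion order = arrival order
        buffer[task_id] = result
        while next_id in buffer:          # release as far as order allows
            released.append(buffer.pop(next_id))
            next_id += 1
    return released

# Task 2 finishes before task 1: its result r2 is buffered until task 1
# returns, after which r1 and r2 are released together, then r3.
out = inorder_return([(2, "r2"), (1, "r1"), (3, "r3")], n=3)
```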
④ Load balancing is a technique for evenly distributing tasks across multiple resources (e.g., the M acceleration processing engines) to improve system performance, scalability, and reliability; its purpose is to keep each acceleration processing engine fully utilized while avoiding resource overload or resource idling. For example, the current load of each acceleration processing engine may be acquired, and the image processing task evenly distributed to the M acceleration processing engines based on those current loads; the current load may be the number of tasks waiting to be processed by each acceleration processing engine, or the length of time the waiting tasks will take, which is not limited by the embodiment of the present application. It should be understood that when generating a task scheduling policy for an image processing task according to the load balancing principle, not only the current load of each acceleration processing engine needs to be considered, but also the number of connected areas contained in the image (which may also be understood as the workload of the image processing task).
It can be understood that, in the embodiment of the present application, one or more of the foregoing configuration manner, scheduling granularity, execution order, and load balancing may be selected and combined into the task scheduling policy corresponding to the image processing task, which improves the flexibility of the task scheduling policy and is simple and efficient.
Step S103, distributing the image processing task to an acceleration processing engine according to the task scheduling strategy, and carrying out hardware acceleration processing on the image processing task through the acceleration processing engine.
Specifically, as shown in fig. 4, after the scheduling policy generation unit in the task scheduling management engine transmits the generated task scheduling policy to the command encapsulation unit, the command encapsulation unit in the task scheduling management engine may be invoked to split the image processing task into N subtasks according to the tag information in the marking result, where the N subtasks correspond one-to-one to the N connected areas in the image.
It can be understood that, since the tag information can be used to uniquely identify the corresponding connected region, the number of pieces of tag information in the marking result can be understood as the number of connected regions included in the image (which can be denoted as N); the image region formed by pixels having the same tag information is one connected region, and one connected region can correspond to one subtask. The image processing task is for the whole image, that is, it includes the processing of all connected regions in the image; when the image processing task is split according to the number of connected regions, it can be split into N subtasks, with one subtask including the processing of one connected region.
The command encapsulation unit in the task scheduling management engine is invoked to evaluate the complexity of the image processing task according to the generated task scheduling policy, and then to perform command encapsulation on the N subtasks according to the task scheduling policy to obtain task execution commands. Command encapsulation here may include, but is not limited to, batch or single-instruction encapsulation of the N subtasks; the task execution commands are a series of commands for the N subtasks.
For example, if the task scheduling policy includes a single instruction scheduling granularity, each of the N subtasks may be individually packaged with a command, where the task execution command may include execution commands corresponding to the N subtasks, and the number of the execution commands may be denoted as N.
If the task scheduling policy includes multi-instruction scheduling granularity, the N subtasks may be packaged in batches, in which case the number of task execution commands is smaller than N. For example, if the image processing task is split into 9 subtasks and, according to the multi-instruction scheduling granularity, 3 subtasks are packaged in a batch at a time, the task execution commands will include 3 execution commands, that is, the number of task execution commands is 3.
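The single-instruction and multi-instruction packaging just described (9 subtasks, batches of 3, hence 3 execution commands) can be sketched as follows (illustrative Python; the default batch size of 3 is an assumption standing in for the value the scheduling policy would supply):

```python
def package_commands(subtasks, granularity, batch=3):
    """Encapsulate subtasks into task execution commands.

    Single-instruction granularity -> one command per subtask (N commands);
    multi-instruction granularity  -> batches of `batch` subtasks, i.e.
    ceil(N / batch) commands."""
    if granularity == "single":
        return [[t] for t in subtasks]
    return [subtasks[k:k + batch] for k in range(0, len(subtasks), batch)]

subtasks = list(range(1, 10))                   # N = 9 subtasks
single = package_commands(subtasks, "single")   # 9 execution commands
multi = package_commands(subtasks, "multi")     # ceil(9 / 3) = 3 commands
```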
It can be understood that, when the task scheduling policy includes various combinations of a configuration mode, load balancing, scheduling granularity, execution sequence, and the like, in a process of calling the command encapsulation unit in the task scheduling management engine to encapsulate the N subtasks, the configuration mode, load balancing, scheduling granularity, execution sequence, and the like in the task scheduling policy need to be referred to simultaneously, so as to obtain task execution commands for the N subtasks. The format of the task execution command and the number of commands included in the task execution command are not limited herein, and may be determined by a task scheduling policy, or may be set by a specific application scenario and requirements, etc.
Further, the arbitration management unit in the task scheduling management engine can be invoked to distribute the task execution commands to the M acceleration processing engines; the N subtasks are thereby distributed to the M acceleration processing engines through the task execution commands, and the M acceleration processing engines are controlled to execute their respectively allocated subtasks. Optionally, when a task execution command does not meet the configuration format supported by the acceleration processing engines, the arbitration management unit in the task scheduling management engine may be invoked to perform format conversion on the task execution command to obtain a format-converted task execution command, whose format is a configuration format supported by the acceleration processing engines; the format-converted task execution commands can then be distributed to the M acceleration processing engines.
As shown in fig. 4, the arbitration management unit may parse the task execution commands to obtain their effective content (for example, the subtasks, i.e., the corresponding connected areas, included in each task execution command), and may then perform format conversion on the obtained effective content to generate format-converted task execution commands whose command format is a configuration format supported by the acceleration processing engines. The arbitration management unit can submit the format-converted task execution commands to the M acceleration processing engines through the bus unit; after receiving their respective task execution commands, the M acceleration processing engines start computing the corresponding subtasks, and after subtask execution is completed, command response results (which may be called subtask processing results) can be returned to the arbitration management unit through the bus unit. It is understood that the arbitration management unit may simultaneously be responsible for the management, task allocation, scheduling, and reclamation of the M acceleration processing engines, the reception and processing of command responses, and so on. For example, when the number N of connected regions included in the image is a positive integer, the arbitration management unit in the task scheduling management engine may schedule subtask (i) to Engine (j) according to the task scheduling policy, where i ∈ [1, N] and j ∈ [1, M]; subtask (i) represents the i-th subtask of the N subtasks, Engine (j) represents the j-th acceleration processing engine of the M acceleration processing engines, and j is a positive integer less than or equal to M.
Optionally, as shown in fig. 4, the task scheduling management engine may further include a post-processing engine. The arbitration management unit is invoked to receive the subtask processing results (command response results) returned by each of the M acceleration processing engines, and the post-processing engine is invoked to combine the subtask processing results of the M acceleration processing engines to obtain the task processing result corresponding to the image processing task.
In the embodiment of the application, task information corresponding to an image processing task is acquired, wherein the image processing task is a calculation task associated with an image, the task information comprises marking results carried by N connected areas in the image, and N is a positive integer; according to the marking result in the task information, the number of the acceleration processing engines and the running information of the acceleration processing engines, a task scheduling strategy corresponding to the image processing task is generated, so that the characteristics of the connected region marking algorithm can be fully utilized, and the running information and the number of the acceleration processing engines are combined to quickly generate the task scheduling strategy. Distributing the image processing task to an acceleration processing engine according to a task scheduling strategy, and carrying out hardware acceleration processing on the image processing task through the acceleration processing engine; the acceleration processing engine is flexibly distributed to the image processing task through the task scheduling strategy, and the task priority is determined without adding additional computing resources, so that the image task processing can be simply and efficiently scheduled, the utilization rate of the acceleration processing engine is improved to the maximum extent, and the throughput and the performance of the integrated chip overall system are further improved.
Referring to fig. 5, fig. 5 is a second flowchart of an image processing method according to an embodiment of the present application; it may be appreciated that the image processing method may be performed by a computer device, where an integrated chip shown in fig. 1 may be provided, and the computer device may be a terminal device, or may be a server, which is not limited in this embodiment of the present application; the image processing method may include steps S201 to S205:
Step S201, a connected region labeling process is performed on the input image.
Specifically, an image processing task associated with an image can be acquired, and the marking processing engine in the vision/image algorithm engine of the integrated chip is called to perform connected region marking processing on the image, so as to obtain the label information of the N connected regions contained in the image; the label information of each connected region is used to uniquely identify the corresponding connected region. An index is created for the label information of each connected region, the label information of each connected region and the index created for each piece of label information are determined as the marking result, and the marking result is written into a storage area; the storage area includes any one of the host memory, the on-chip storage unit, and an external storage device. The determining process of the tag information of each connected area may include: invoking the marking processing engine to scan the pixels in the image, and forming the pixels which have the same pixel value and are adjacent in position into a connected area; and invoking the marking processing engine to carry out marking processing on the connected areas contained in the image to obtain the label information corresponding to the connected areas contained in the image.
In one possible embodiment, in the process of performing the connected region labeling process on the image, the determining process of the connected region may include, but is not limited to: ① Performing binarization processing on the image to obtain a binary image; ② And adopting a connected region marking algorithm to carry out connected region marking processing on the binary image. In one specific implementation, a connected region marking algorithm is adopted to determine pixels which have the same pixel value and are adjacent in position in an image, and connected regions in the image are obtained based on the pixels which have the same pixel value and are adjacent in position.
The connected region labeling algorithm herein may include, but is not limited to: the 4-adjacency algorithm and the 8-adjacency algorithm, where 4 and 8 refer to the number of neighbourhood directions. Taking the 4-adjacency algorithm as an example: for a target white pixel X in the binary image, the pixels immediately above, below, to the left, and to the right of X (4 directions in total) are examined; any white pixel among them is determined to be adjacent to X and is marked with the same label as X. The newly marked white pixels are then examined in the same way, so that their adjacent white pixels also receive the same label as X, and the connected region in the image is formed by the pixels carrying the same label. The 8-adjacency algorithm is similar, except that it also examines the four diagonal directions of the target white pixel X (8 directions in total) when determining the connected region, and will not be described in detail herein.
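A 4-adjacency labelling pass of this kind can be sketched as follows (illustrative Python using a breadth-first flood fill over a small binary image; an 8-adjacency variant would simply add the four diagonal offsets to the neighbour list):

```python
from collections import deque

def label_regions(binary):
    """Label 4-connected regions: pixels with value 1 that are adjacent
    above/below/left/right share one label; labels start at 1."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 1 and labels[y][x] == 0:
                next_label += 1                 # new connected region found
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:                    # flood-fill the region
                    cy, cx = queue.popleft()
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1
                                and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label

image = [[1, 1, 0, 0],
         [0, 1, 0, 1],
         [0, 0, 0, 1]]
labels, n = label_regions(image)   # two 4-connected regions, so N = 2
```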
Step S202, judging whether all the connected areas in the image are marked completely, if all the connected areas in the image are marked completely, executing step S203, and if the connected areas in the image are not marked completely, continuing executing step S201 until the marking result of each connected area in the image is obtained.
In step S203, if all the connected areas included in the image are marked, a marking result is obtained.
Specifically, the marking result includes the label information marked for each connected region in the image and the index information created for each piece of label information; the tag information of each connected region is used to uniquely identify the corresponding connected region. For example, if the number of connected regions included in the image is N, N being an integer greater than 1, the tag information marked for each connected region may be represented as label1, label2, … labelN. The index information created for each piece of tag information may be recorded as label_value, corresponding to the following pseudocode:
for(i=1;i<N+1;i=i+1)
label_value(i)=get_label_val(i)。
Where i denotes the i-th tag information, get_label_val(i) denotes obtaining the numeric value of the i-th tag information, and label_value(i) denotes the index information created for the i-th tag information.
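The pseudocode above can be written out as a runnable sketch (Python stand-in; modelling get_label_val(i) as simply returning the numeric value of the i-th label is an assumption):

```python
def get_label_val(i):
    # Assumed stand-in: the index information is simply the numeric
    # value of the i-th piece of tag information.
    return i

N = 4                         # number of pieces of tag information
label_value = {}
for i in range(1, N + 1):     # i = 1 .. N, as in for(i=1;i<N+1;i=i+1)
    label_value[i] = get_label_val(i)
```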
Step S204, generating a task scheduling strategy corresponding to the image processing task according to the marking result in the task information, the number of the acceleration processing engines and the running information of the acceleration processing engines.
Step S205, the image processing task is distributed to an acceleration processing engine according to the task scheduling strategy, and the acceleration processing engine is used for carrying out hardware acceleration processing on the image processing task.
The specific implementation process of step S204 and step S205 may refer to step S102 and step S103 in the embodiment corresponding to fig. 3, which are not described herein.
In the embodiment of the application, task information corresponding to an image processing task is acquired, wherein the image processing task is a calculation task associated with an image, the task information comprises marking results carried by N connected areas in the image, and N is a positive integer; according to the marking result in the task information, the number of the acceleration processing engines and the running information of the acceleration processing engines, a task scheduling strategy corresponding to the image processing task is generated, so that the characteristics of the connected region marking algorithm can be fully utilized, and the running information and the number of the acceleration processing engines are combined to quickly generate the task scheduling strategy. Distributing the image processing task to an acceleration processing engine according to a task scheduling strategy, and carrying out hardware acceleration processing on the image processing task through the acceleration processing engine; the acceleration processing engine is flexibly distributed to the image processing task through the task scheduling strategy, and the task priority is determined without adding additional computing resources, so that the image task processing can be simply and efficiently scheduled, the utilization rate of the acceleration processing engine is improved to the maximum extent, and the throughput and the performance of the integrated chip overall system are further improved.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the application. As shown in fig. 6, the computer device 1000 may be a server or a terminal device, which is not limited herein. For ease of understanding, taking a terminal device as an example, the computer device 1000 may include: an integrated chip 1001, a network interface 1004 and a memory 1005; further, the computer device 1000 may also include a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM, or may be a non-volatile memory (non-volatile memory), such as at least one magnetic disk memory. Optionally, the memory 1005 may also be at least one storage device located remotely from the integrated chip 1001. As shown in fig. 6, the memory 1005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.
In the computer device 1000 shown in fig. 6, the network interface 1004 may also provide a network communication function, and the user interface 1003 is mainly used as an interface for providing input for a user; while the integrated chip 1001 may be used to invoke the device control application stored in the memory 1005, the integrated chip 1001 may include a task schedule management engine and an acceleration processing engine:
The task scheduling management engine is used for acquiring task information corresponding to the image processing task; the image processing task is a computing task associated with the image, the task information comprises marking results carried by N connected areas in the image, and N is a positive integer;
the task scheduling management engine is also used for generating a task scheduling strategy corresponding to the image processing task according to the marking result in the task information, the quantity of the acceleration processing engines and the running information of the acceleration processing engines;
The task scheduling management engine is also used for distributing the image processing task to the acceleration processing engine according to the task scheduling strategy;
The acceleration processing engine is used for carrying out hardware acceleration processing on the image processing task.
In one possible embodiment, the integrated chip 1001 further includes a marking processing engine;
the marking processing engine is used for carrying out the connected region marking processing on the image to obtain the label information of N connected regions contained in the image; the label information of each connected region is used for uniquely identifying the corresponding connected region; and
is further used for creating an index for the label information of each connected region, determining the label information of each connected region and the index created for each label information as a marking result, and writing the marking result into a storage area; the storage area comprises any one of a host memory, an on-chip storage unit and an external storage device; and
is further used for determining the marking result and the storage position corresponding to the marked image as task information of the image processing task, and generating a configuration command carrying the task information.
In one possible embodiment, the marking processing engine is specifically configured to:
scan pixels in the image, form pixels that have the same pixel value and are adjacent in position into a connected region, and perform marking processing on the connected regions contained in the image to obtain label information corresponding to the connected regions contained in the image.
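The scan-and-group rule above — same pixel value, adjacent position — is the classic connected-component labelling scheme. As a minimal sketch only (the function name, the choice of 4-neighbour adjacency, and the label numbering are illustrative assumptions, not details fixed by this embodiment), the marking step could look like:

```python
from collections import deque

def label_connected_regions(image):
    """Scan pixels and group same-valued, 4-adjacent pixels into
    connected regions, giving each region a unique label (1..N)."""
    h, w = len(image), len(image[0])
    labels = [[0] * w for _ in range(h)]   # 0 means "not yet labelled"
    next_label = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] == 0:
                next_label += 1            # new region found
                value = image[y][x]
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:               # flood-fill the whole region
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and labels[ny][nx] == 0
                                and image[ny][nx] == value):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels, next_label              # label map and region count N
```

Hardware implementations typically use one- or two-pass raster scans with union-find label merging rather than an explicit queue; the flood fill above is chosen only for clarity.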
In a possible embodiment, the integrated chip 1001 further comprises a bus unit;
The bus unit is used for sending a configuration command carrying task information to the task scheduling management engine.
In one possible embodiment, the task scheduling management engine is specifically configured to:
Counting the number of the acceleration processing engines, acquiring performance indexes and resource utilization rates of the acceleration processing engines, and determining the performance indexes and the resource utilization rates as running information of the acceleration processing engines;
Counting the number of label information in the marking result to obtain the number of connected areas in the image; the number of connected areas in the image is N;
and generating a task scheduling strategy for the image processing task according to the number of connected areas in the image, the number of acceleration processing engines and the running information of the acceleration processing engines.
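The three steps above — gathering running information, counting label information to obtain N, and generating the policy — can be sketched in a few lines of Python. Everything concrete here (the capacity formula of performance index scaled by idle fraction, the proportional share computation, and all names) is an illustrative assumption; the embodiment does not prescribe a particular formula:

```python
def build_scheduling_policy(label_count, engines):
    """Generate a task scheduling policy from the connected-region
    count N (label_count) and per-engine running information.
    `engines` maps an engine id to its running information:
    'perf' (relative performance index) and 'util' (current resource
    utilization in [0, 1])."""
    m = len(engines)                       # number of acceleration engines
    # Effective capacity: faster and less-loaded engines should
    # receive more of the N region subtasks.
    capacity = {eid: info['perf'] * (1.0 - info['util'])
                for eid, info in engines.items()}
    total = sum(capacity.values())
    shares = {eid: int(round(label_count * c / total))
              for eid, c in capacity.items()}
    # Rounding correction so exactly N subtasks are assigned overall.
    drift = label_count - sum(shares.values())
    shares[max(capacity, key=capacity.get)] += drift
    return {'num_engines': m, 'granularity': shares}
```

With 8 regions and two equal-performance engines, one of them half-loaded, the idle engine receives the larger share.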
In a possible embodiment, the task scheduling management engine includes a task scheduling policy generating unit, a command packaging unit, and an arbitration management unit, where the number of acceleration processing engines in the integrated chip 1001 is M, and M is an integer greater than 1;
The task scheduling strategy generation unit is used for configuring an execution sequence for the connected regions in the image according to the dependency relationships among the connected regions in the image;
the task scheduling strategy generating unit is also used for carrying out load balancing on the basis of the execution sequence, the number of connected regions in the image, the number of acceleration processing engines and the running information of the acceleration processing engines, and determining a scheduling granularity for the image processing task;
The task scheduling strategy generation unit is also used for combining the execution sequence and the scheduling granularity into a task scheduling strategy corresponding to the image processing task and transmitting the task scheduling strategy to the command encapsulation unit;
The command packaging unit is used for splitting the image processing task into N subtasks according to the label information in the marking result, and packaging the N subtasks according to the task scheduling strategy to obtain a task execution command; the N subtasks are in one-to-one correspondence with the N connected regions in the image;
the arbitration management unit is used for distributing task execution commands to the M acceleration processing engines.
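The splitting and distribution performed by the command encapsulation and arbitration management units can be sketched as follows; the subtask representation and the contiguous-batch assignment are assumptions made for illustration, not a format taken from the patent:

```python
def split_and_distribute(marking_result, granularity):
    """Split the image processing task into one subtask per labelled
    connected region, then hand each engine a batch whose size follows
    the policy's per-engine scheduling granularity."""
    subtasks = [{'label': label, 'op': 'process_region'}
                for label in marking_result]        # N subtasks, one per region
    assignment, cursor = {}, 0
    for engine_id, count in granularity.items():    # arbitration step
        assignment[engine_id] = subtasks[cursor:cursor + count]
        cursor += count
    return assignment
```

Each batch would then be wrapped in a task execution command and converted into the configuration format the target engine supports.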
In a possible embodiment, the arbitration management unit is specifically configured to:
Performing format conversion on the task execution command to obtain a task execution command after format conversion, and distributing the task execution command after format conversion to M acceleration processing engines; the task execution command after format conversion is in a configuration format supported by the acceleration processing engines;
the N subtasks are distributed to M acceleration processing engines through the task execution command after format conversion, and the M acceleration processing engines are controlled to execute the respectively distributed subtasks;
and receiving subtask processing results returned by each of the M acceleration processing engines.
In one possible embodiment, the integrated chip 1001 further includes a post-processing engine;
the post-processing engine is used for merging the subtask processing results of the M acceleration processing engines to obtain a task processing result corresponding to the image processing task.
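The merging step performed by the post-processing engine can be sketched as below; keying each subtask result by its region label, so that the merged output is independent of which engine finishes first, is an illustrative assumption:

```python
def merge_subtask_results(per_engine_results):
    """Combine the subtask results returned by the M acceleration
    engines into one result for the whole image processing task."""
    merged = {}
    for results in per_engine_results:   # one result dict per engine
        merged.update(results)           # region labels are disjoint
    return [merged[label] for label in sorted(merged)]
```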
It should be understood that the computer device 1000 described in the embodiments of the present application may perform the image processing method described in the embodiments corresponding to fig. 3 and fig. 5, which will not be repeated here. In addition, the description of the beneficial effects of the same method is also omitted.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of action combinations, but those skilled in the art should understand that the present application is not limited by the described order of actions, as some steps may be performed in another order or simultaneously according to the present application. Further, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required for the present application.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored in a computer-readable storage medium which, when executed, may include the flows of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (16)

1. An image processing method, comprising:
acquiring task information corresponding to an image processing task; the image processing task is a computing task associated with an image, the task information comprises marking results carried by N connected areas in the image, and N is a positive integer;
Generating a task scheduling strategy corresponding to the image processing task according to the marking result in the task information, the number of the acceleration processing engines and the running information of the acceleration processing engines;
Distributing the image processing task to the acceleration processing engine according to the task scheduling strategy, and carrying out hardware acceleration processing on the image processing task through the acceleration processing engine;
The method is executed by calling an integrated chip, and the integrated chip comprises the acceleration processing engine and a task scheduling management engine; the generating a task scheduling policy corresponding to the image processing task according to the marking result in the task information, the number of acceleration processing engines and the operation information of the acceleration processing engines comprises the following steps:
Invoking the task scheduling management engine to count the number of the acceleration processing engines, acquiring performance indexes and resource utilization rates of the acceleration processing engines, and determining the performance indexes and the resource utilization rates as running information of the acceleration processing engines;
Invoking the task scheduling management engine to count the number of label information in the marking result to obtain the number of connected areas in the image; the number of the connected areas in the image is N;
And calling the task scheduling management engine, and generating a task scheduling strategy for the image processing task according to the number of connected areas in the image, the number of the acceleration processing engines and the running information of the acceleration processing engines.
2. The method of claim 1, wherein the integrated chip further comprises a marking processing engine;
the method further comprises the steps of:
invoking the marking processing engine to carry out connected region marking processing on the image to obtain label information of N connected regions contained in the image; the label information of each connected region is used for uniquely identifying the corresponding connected region;
Creating an index for the label information of each connected region, determining the label information of each connected region and the index created for each label information as a marking result, and writing the marking result into a storage area; the storage area comprises any one of a host memory, an on-chip storage unit and an external storage device.
3. The method according to claim 2, wherein the invoking the marking processing engine to perform the connected region marking processing on the image to obtain label information of N connected regions included in the image comprises:
invoking the marking processing engine to scan pixels in the image, and forming pixels which have the same pixel value and are adjacent in position in the image into a connected region;
And calling the marking processing engine to carry out marking processing on the connected region contained in the image to obtain label information corresponding to the connected region contained in the image.
4. The method of claim 1, wherein the integrated chip further comprises a marking processing engine and a bus unit;
the method further comprises the steps of:
invoking the marking processing engine to determine the marking result and the storage position corresponding to the marked image as task information of the image processing task, and generating a configuration command carrying the task information;
And calling the bus unit to send a configuration command carrying the task information to the task scheduling management engine.
5. The method according to claim 1, wherein the task scheduling management engine includes a task scheduling policy generation unit and a command encapsulation unit;
the step of calling the task scheduling management engine, the step of generating a task scheduling policy for the image processing task according to the number of connected areas in the image, the number of acceleration processing engines and the running information of the acceleration processing engines, comprises the following steps:
Invoking the task scheduling strategy generation unit, and configuring an execution sequence for each connected region in the image according to the dependency relationship between each connected region in the image;
invoking the task scheduling strategy generation unit, performing load balancing according to the execution sequence, the number of connected regions in the image, the number of the acceleration processing engines and the running information of the acceleration processing engines, and determining a scheduling granularity for the image processing task;
and calling the task scheduling strategy generation unit to combine the execution sequence and the scheduling granularity into a task scheduling strategy corresponding to the image processing task, and transmitting the task scheduling strategy to the command encapsulation unit.
6. The method according to claim 1, wherein the number of the acceleration processing engines is M, the task scheduling management engine includes a command encapsulation unit and an arbitration management unit, M is an integer greater than 1;
The image processing task is distributed to the acceleration processing engine according to the task scheduling strategy, and the acceleration processing engine is used for carrying out hardware acceleration processing on the image processing task, and the method comprises the following steps:
Invoking a command encapsulation unit in the task scheduling management engine, and splitting the image processing task into N subtasks according to the label information in the marking result; the N subtasks are in one-to-one correspondence with the N connected regions in the image;
Invoking the command encapsulation unit, and encapsulating the commands of the N subtasks according to the task scheduling strategy to obtain task execution commands;
Invoking an arbitration management unit in the task scheduling management engine to distribute the task execution command to M acceleration processing engines;
and distributing the N subtasks to the M acceleration processing engines through the task execution command, and controlling the M acceleration processing engines to execute the respectively distributed subtasks.
7. The method of claim 6, wherein said invoking an arbitration management unit in said task scheduling management engine to distribute said task execution command to M acceleration processing engines comprises:
Invoking an arbitration management unit in the task scheduling management engine to perform format conversion on the task execution command to obtain a task execution command after format conversion; the task execution command after format conversion is in a configuration format supported by the M acceleration processing engines;
and calling the arbitration management unit to distribute the task execution command after format conversion to the M acceleration processing engines.
8. The method of claim 6 or 7, wherein the integrated chip further comprises a post-processing engine;
the method further comprises the steps of:
Invoking the arbitration management unit to receive subtask processing results returned by the M acceleration processing engines respectively;
And calling the post-processing engine to combine the subtask processing results of the M acceleration processing engines to obtain a task processing result corresponding to the image processing task.
9. An integrated chip, comprising a task scheduling management engine and an acceleration processing engine:
The task scheduling management engine is used for acquiring task information corresponding to the image processing task; the image processing task is a computing task associated with an image, the task information comprises marking results carried by N connected areas in the image, and N is a positive integer;
The task scheduling management engine is also used for generating a task scheduling strategy corresponding to the image processing task according to the marking result in the task information, the number of the acceleration processing engines and the running information of the acceleration processing engines;
the task scheduling management engine is further used for distributing the image processing task to the acceleration processing engine according to the task scheduling strategy;
The acceleration processing engine is used for carrying out hardware acceleration processing on the image processing task;
The task scheduling management engine is specifically configured to:
counting the number of the acceleration processing engines, acquiring performance indexes and resource utilization rates of the acceleration processing engines, and determining the performance indexes and the resource utilization rates as running information of the acceleration processing engines;
counting the number of label information in the marking result to obtain the number of connected areas in the image; the number of the connected areas in the image is N;
and generating a task scheduling strategy for the image processing task according to the number of the connected areas in the image, the number of the acceleration processing engines and the running information of the acceleration processing engines.
10. The integrated chip of claim 9, further comprising a marking processing engine;
The marking processing engine is used for carrying out the connected region marking processing on the image to obtain label information of N connected regions contained in the image; the label information of each connected region is used for uniquely identifying the corresponding connected region; and
The method is also used for creating an index for the label information of each connected region, determining the label information of each connected region and the index created for each label information as a marking result, and writing the marking result into a storage area; the storage area comprises any one of a host memory, an on-chip storage unit and external storage equipment; and
And the marking result and the storage position corresponding to the marked image are determined to be the task information of the image processing task, and a configuration command carrying the task information is generated.
11. The integrated chip of claim 10, wherein the marking processing engine is specifically configured to:
scan pixels in the image, form pixels that have the same pixel value and are adjacent in position into a connected region, and perform marking processing on the connected regions contained in the image to obtain label information corresponding to the connected regions contained in the image.
12. The integrated chip of claim 10, further comprising a bus unit;
the bus unit is used for sending a configuration command carrying the task information to the task scheduling management engine.
13. The integrated chip of claim 9, wherein the task scheduling management engine comprises a task scheduling policy generation unit, a command encapsulation unit, and an arbitration management unit, the number of acceleration processing engines in the integrated chip being M, M being an integer greater than 1;
The task scheduling strategy generation unit is used for configuring an execution sequence for the connected regions in the image according to the dependency relationships among the connected regions in the image;
the task scheduling policy generating unit is further configured to perform load balancing according to the execution sequence, the number of connected regions in the image, the number of the acceleration processing engines and the running information of the acceleration processing engines, and determine a scheduling granularity for the image processing task;
the task scheduling strategy generation unit is further used for combining the execution sequence and the scheduling granularity into a task scheduling strategy corresponding to the image processing task, and transmitting the task scheduling strategy to the command encapsulation unit;
The command packaging unit is used for splitting the image processing task into N subtasks according to the label information in the marking result, and packaging the N subtasks according to the task scheduling strategy to obtain a task execution command; the N subtasks are in one-to-one correspondence with the N connected regions in the image;
the arbitration management unit is used for distributing the task execution command to M acceleration processing engines.
14. The integrated chip of claim 13, wherein the arbitration management unit is specifically configured to:
Performing format conversion on the task execution command to obtain a task execution command after format conversion, and distributing the task execution command after format conversion to the M acceleration processing engines; the task execution command after format conversion is in a configuration format supported by the acceleration processing engines;
The N subtasks are distributed to the M acceleration processing engines through the task execution command after format conversion, and the M acceleration processing engines are controlled to execute the respectively distributed subtasks;
And receiving subtask processing results returned by the M acceleration processing engines respectively.
15. The integrated chip of claim 13 or 14, further comprising a post-processing engine;
and the post-processing engine is used for merging the subtask processing results of the M acceleration processing engines to obtain a task processing result corresponding to the image processing task.
16. A computer device comprising the integrated chip of any one of claims 9 to 15, wherein the computer device is configured to perform the method of any one of claims 1 to 8.
CN202311290316.4A 2023-09-28 2023-09-28 Image processing method and related equipment Active CN117311941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311290316.4A CN117311941B (en) 2023-09-28 2023-09-28 Image processing method and related equipment

Publications (2)

Publication Number Publication Date
CN117311941A (en) 2023-12-29
CN117311941B (en) 2024-09-10

Family

ID=89280846


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984417A (en) * 2020-08-26 2020-11-24 展讯通信(天津)有限公司 Image processing method and device for mobile terminal, storage medium and terminal

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4979289B2 (en) * 2006-07-21 2012-07-18 日立オートモティブシステムズ株式会社 Image processing device
US8229251B2 (en) * 2008-02-08 2012-07-24 International Business Machines Corporation Pre-processing optimization of an image processing system
US9542127B2 (en) * 2014-05-19 2017-01-10 Canon Kabushiki Kaisha Image processing method and image processing apparatus
CN110942416B (en) * 2019-11-08 2022-11-18 东南大学 General morphological acceleration method for GPU
CN110865787B (en) * 2019-11-25 2024-11-05 京东方科技集团股份有限公司 Image processing method, server, client and image processing system
CN115185667B (en) * 2022-09-13 2022-12-20 天津市天河计算机技术有限公司 Visual application acceleration method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant