
CN117642774A - Method, device and computer storage medium for automatically generating annotated images


Info

Publication number: CN117642774A
Application number: CN202180100664.3A
Authority: CN (China)
Prior art keywords: information, projection, image, objects, generate
Legal status: Pending (the status is an assumption; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 王海峰, 费涛, 邹文超
Current Assignee: Siemens Ltd China (the listed assignees may be inaccurate)
Original Assignee: Siemens Ltd China
Application filed by Siemens Ltd China


Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00: Manipulating 3D models or images for computer graphics

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A method, a device and a computer storage medium for automatically generating annotated images. The method includes: generating a three-dimensional scene model based on one or more CAD models corresponding to one or more objects (201); generating, based on the three-dimensional scene model, a set of two-dimensional projection images of the one or more objects (202); automatically annotating each image in the set of two-dimensional projection images to generate annotation information associated with unoccluded objects (203); and fusing the annotation information with a background image related to the production line of the one or more objects to generate an annotated image (204). By using CAD models, the image generation and annotation process can be carried out automatically; moreover, the method can be executed automatically by a computer, thereby avoiding incorrect, imprecise and missing labels.

Description

Method, device and computer storage medium for automatically generating annotated images

Technical field

The present invention relates to the field of image processing, and more specifically, to a method, a device, a computing device, a computer-readable storage medium and a computer program product for automatically generating annotated images.

Background art

In recent years, artificial intelligence (AI) has received widespread attention and undergone tremendous development. Various fields, including industry, are trying to leverage the power of AI together with domain knowledge to solve different real-world problems, such as object recognition or detection in different scenarios. However, the development of AI applications always lags far behind the construction of production lines, so the value of AI is greatly limited by this delay, resulting in high time costs and low capacity utilization. One of the root causes is laborious data preparation, including data acquisition and data annotation. Data preparation plays a vital role as an important link in the AI application process, yet many factors in data preparation can hinder real-world AI applications.

One factor is the lack of scene-specific image data. This is especially evident in industrial applications, because industry cares deeply about privacy.

Another factor is that obtaining annotated image data is time-consuming and laborious. Since scene-specific image data is difficult to obtain readily, different cameras have to be used to capture it, which often consumes considerable labor and time. Moreover, after the image data has been captured, annotation is also a problem: for different vision tasks, people need to annotate the images manually to indicate where the different objects are located in an image or what the objects in an image are. In practice, annotation usually takes no less time than the image capture process.

Yet another factor is incorrect, imprecise or missing labels. Owing to the carelessness of annotators when confronted with a large number of images, annotators may make mistakes, miss annotations or annotate inaccurately during the annotation process, which affects the final performance of the trained AI model and thus hinders the practical application of AI.

Therefore, an improved solution for obtaining annotated images is urgently needed.

Summary of the invention

For specific scenarios in industrial applications, capturing images and annotating them is a time-consuming and labor-intensive process. Although a variety of manual annotation tools have emerged, none of them solves the problem of how to obtain the images, and users still need to spend a great deal of time on annotation. Moreover, because the annotation is manual, its accuracy cannot be ensured. In addition, some companies are trying to obtain image data sets for training automatically, using robots to capture images automatically, but this does not solve the image annotation problem. In view of this, the present invention proposes a solution for automatically generating annotated images.

A first embodiment of the present disclosure provides a method for automatically generating annotated images, the method comprising:

generating a three-dimensional scene model based on one or more CAD models, wherein each of the one or more CAD models corresponds to one of one or more objects;

generating, based on the three-dimensional scene model, a set of two-dimensional projection images of the one or more objects;

automatically annotating each image in the set of two-dimensional projection images to generate annotation information associated with unoccluded objects, wherein the unoccluded objects are included in the one or more objects; and

fusing the annotation information with a background image related to the production line of the one or more objects to generate an annotated image.

A second embodiment of the present disclosure provides a device for automatically generating annotated images, the device comprising:

a model unit configured to generate a three-dimensional scene model based on one or more CAD models, wherein each of the one or more CAD models corresponds to one of one or more objects;

a projection unit configured to generate, based on the three-dimensional scene model, a set of two-dimensional projection images of the one or more objects;

an annotation unit configured to automatically annotate each image in the set of two-dimensional projection images to generate annotation information associated with unoccluded objects, wherein the unoccluded objects are included in the one or more objects; and

a fusion unit configured to fuse the annotation information with a background image related to the production line of the one or more objects to generate an annotated image.

A third embodiment of the present disclosure provides a computing device comprising: a processor; and a memory for storing computer-executable instructions which, when executed, cause the processor to perform the method of the first embodiment.

A fourth embodiment of the present disclosure provides a computer-readable storage medium having computer-executable instructions stored thereon, the computer-executable instructions being used to perform the method of the first embodiment.

A fifth embodiment of the present disclosure provides a computer program product that is tangibly stored on a computer-readable storage medium and comprises computer-executable instructions which, when executed, cause at least one processor to perform the method of the first embodiment.

As can be seen from the above solutions, because the present invention can perform the image generation and image annotation process automatically by using CAD models, it significantly saves time and human resources, and a large amount of customized image data can easily be obtained, thereby preparing sufficient data for AI applications to train AI models. In addition, the image generation and annotation process can be executed automatically by a computer, thereby avoiding incorrect, imprecise and missing labels.

Brief description of the drawings

The features, advantages and other aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which several embodiments of the present disclosure are shown by way of example and not limitation. In the drawings:

Figure 1 shows an exemplary three-dimensional scene model applicable to embodiments of the present disclosure;

Figure 2 shows a flowchart of an exemplary method for automatically generating annotated images according to an embodiment of the present disclosure;

Figure 3 shows an exemplary device for automatically generating annotated images according to an embodiment of the present disclosure;

Figure 4 shows an exemplary computing device for automatically generating annotated images according to an embodiment of the present disclosure.

The reference signs in the drawings are as follows:

100 three-dimensional scene model
101 CAD model
102 CAD model
103 CAD model
200 method
201 step
202 step
203 step
204 step
300 device
301 model unit
302 projection unit
303 annotation unit
304 fusion unit
400 computing device
401 processor
402 memory

Detailed description of embodiments

Various exemplary embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. Although the exemplary methods and devices described below include software and/or firmware executed on hardware among other components, it should be noted that these examples are merely illustrative and should not be regarded as limiting. For example, it is contemplated that any or all of the hardware, software and firmware components could be implemented exclusively in hardware, exclusively in software, or in any combination of hardware and software. Therefore, although exemplary methods and devices are described below, those skilled in the art will readily appreciate that the examples provided are not intended to limit the manner in which these methods and devices may be implemented.

Furthermore, the flowcharts and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of methods and systems according to various embodiments of the present disclosure. It should be noted that the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

As used herein, the terms "include", "comprise" and similar terms are open-ended terms, i.e. "including/comprising but not limited to", indicating that other content may also be included. The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and so on.

As mentioned above, the development of current AI applications always lags far behind the construction of production lines. On the one hand, existing AI applications can usually be deployed to replace manual operations, such as automatic identification and sorting of products, only after a production line has actually gone into operation. On the other hand, unlike open image data sets on the Internet (for example, PASCAL VOC, MS COCO, ImageNet, etc.), industrial scene images are private data, which means that companies can only obtain the image resources they want to use to train their AI models from their own production lines. This often requires capturing image data with different cameras and annotating the images manually, which consumes a great deal of labor and time; moreover, manual annotation is prone to errors and missed labels, especially when processing large amounts of image data. In view of this, the present invention provides a solution for automatically generating annotated images to remedy the above-mentioned deficiencies of the prior art.

Figure 1 shows an exemplary three-dimensional scene model 100 applicable to embodiments of the present disclosure. The three-dimensional scene model 100 can be generated based on one or more CAD models, where each CAD model corresponds to one of one or more objects (for example, parts, components, subassemblies, assemblies, equipment, etc.). For example, as shown in Figure 1, a three-dimensional scene model can be generated based on three CAD models 101, 102 and 103, where each of the CAD models 101-103 corresponds to one of three objects. The three CAD models 101-103 may include at least two identical CAD models or may all be different CAD models. It should be understood that the number and shapes of the CAD models in Figure 1 are merely exemplary and not limiting; the three-dimensional scene model 100 can be generated from more or fewer identical or different CAD models. For example, the three-dimensional scene model 100 can be generated by duplicating a single CAD model, or from multiple different CAD models. In some examples, the three-dimensional scene model 100 can correspond to a product, to different stages of a production line (for example, different processing steps for parts, components, subassemblies, assemblies, equipment, etc.), and so on.

Figure 2 shows a flowchart of an exemplary method 200 for automatically generating annotated images according to an embodiment of the present disclosure.

Referring to Figure 2, the method 200 begins with step 201. In step 201, a three-dimensional scene model is generated based on one or more CAD models, where each of the one or more CAD models corresponds to one of one or more objects. In this step, for example, the three-dimensional scene model can be generated directly after the CAD model of an object (for example, a product such as a part, component, subassembly, assembly, equipment, etc.) has been designed, without waiting for the production line to actually go into operation; application scenarios for AI models can therefore easily be built throughout the entire product life cycle.

Next, the method 200 proceeds to step 202. In step 202, a set of two-dimensional projection images of the one or more objects is generated based on the three-dimensional scene model. In this step, the three-dimensional scene model, which is related to the one or more objects, is projected to obtain two-dimensional plane images, where the projections of the one or more objects onto the imaging plane may or may not intersect.

Next, the method 200 proceeds to step 203. In step 203, each image in the set of two-dimensional projection images is automatically annotated to generate annotation information associated with unoccluded objects, where the unoccluded objects are included in the one or more objects. In this step, since the projections of the one or more objects onto the imaging plane may or may not intersect, the unoccluded objects in a plane image can be completely identified and annotated. The annotation information can be of various types, such as category annotation information, bounding box annotation information, polygon annotation information (or mask information), line annotation information, cuboid annotation information, keypoint annotation information, etc.
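
To make these annotation types concrete, a minimal per-object annotation record might look as follows; this is an illustrative, COCO-like sketch, and the field names and values are assumptions rather than a format prescribed by this disclosure:

```python
# One annotation record per unoccluded object (illustrative schema, values invented).
annotation = {
    "category": "part_101",                       # category annotation information
    "bbox": [120, 80, 64, 48],                    # bounding box: x, y, width, height
    "polygon": [[120, 80], [184, 80],
                [184, 128], [120, 128]],          # polygon annotation (mask outline)
    "keypoints": [(152, 104)],                    # keypoint annotation information
}
```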

Next, the method 200 proceeds to step 204. In step 204, the annotation information is fused with a background image related to the production line of the one or more objects to generate an annotated image. In this step, the image data set can easily be expanded by fusing the annotation information with background images related to the production line of the one or more objects, thereby obtaining various sets of annotated images for training AI models. For example, the background images related to the production line of the one or more objects can be various real scene images captured in advance (for example, of a workshop, a workbench, etc.).

Unlike prior-art solutions that capture images from real scenes and then annotate them, the method 200 automates the image generation and annotation process by using CAD models. Furthermore, the method can be executed automatically by a computer, thereby avoiding incorrect, imprecise and missing labels even when the image data set is very large.

In some embodiments, step 201 can include: importing the one or more CAD models into the three-dimensional scene model, where each of the one or more CAD models includes model name information, structure information and material information. In this step, the three-dimensional scene model is constructed by importing one or more CAD models, and each CAD model of an object includes model name information (for example, used to distinguish the CAD models from one another), structure information (for example, the geometric structure, including point, line and surface information, etc.) and material information (for example, the material, so that the rendered model approximates the real object).
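
As a minimal sketch of such an import step, assuming the CAD models have been exported as mesh files (for example STL or OBJ) and that the open-source trimesh library is used; the disclosure itself does not name a specific library or file format:

```python
import trimesh

def build_scene(model_paths):
    """Import one or more CAD meshes into a single three-dimensional scene."""
    scene = trimesh.Scene()
    for path in model_paths:
        mesh = trimesh.load(path)           # structure (and material, if the file has it)
        name = path.rsplit("/", 1)[-1]      # model name information, to tell models apart
        scene.add_geometry(mesh, node_name=name)
    return scene

# Hypothetical file names; repeating one path would yield identical CAD models.
scene = build_scene(["part_101.stl", "part_102.stl", "part_103.stl"])
```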

In some embodiments, step 202 can include: loading the three-dimensional scene model including the model name information, structure information and material information; rendering the loaded three-dimensional scene model based on different rendering configuration parameters to obtain a set of three-dimensional rendered models; and projecting each model in the set of three-dimensional rendered models to generate the set of two-dimensional projection images of the one or more objects. In this step, the three-dimensional scene model can be loaded into a rendering engine, a polygon mesh of the three-dimensional scene model (for example, a triangular mesh or another polygon mesh) can be created based on the structure information and material information, and the rendering engine renders the three-dimensional scene model based on different rendering configuration parameters (i.e. rendering parameters) to obtain the set of three-dimensional rendered models. For example, each model in the set of three-dimensional rendered models can be projected from a viewpoint at different angles to generate the set of two-dimensional projection images of the one or more objects. In some examples, the transformation from a three-dimensional rendered model to a two-dimensional projection image on the imaging plane can be implemented based on a projection transformation matrix.
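
A minimal sketch of such a projection transformation, using a standard pinhole camera model; the intrinsic matrix K and the pose [R|t] below are assumed example values, since the disclosure does not fix a particular camera model:

```python
import numpy as np

def project_points(points_3d, K, R, t):
    """Project Nx3 world points onto the imaging plane: x ~ K [R|t] X."""
    cam = points_3d @ R.T + t            # world -> camera coordinates
    uv = cam @ K.T                       # camera -> homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]        # perspective division

# Assumed intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 1.5])       # one example viewpoint
print(project_points(np.zeros((1, 3)), K, R, t))  # origin -> [[320. 240.]]
```

Varying R and t over many viewpoints yields the set of two-dimensional projection images described above.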

In some embodiments, the rendering configuration parameters can include at least one of the following: lighting parameters, depth parameters, pose parameters or texture parameters. For example, the lighting parameters can include the light source type, light source position, lighting direction, lighting intensity, etc. The depth parameter refers to the distance of each pixel in the image from the viewpoint. The pose parameters can include translation parameters (for example, translation distances along the x, y and z axes) and rotation parameters (for example, rotation angles about the x, y and z axes). The texture parameters can refer to information describing the appearance of an object's surface, such as color information, normals, roughness, etc. Rendering the three-dimensional scene model based on such rendering configuration parameters can approximate the appearance of objects in real scenes under a wide variety of conditions, so that a large number of rendered models can easily be obtained.
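
One possible way to organize these parameters when sweeping over many render configurations is sketched below; the field names and value grids are illustrative assumptions, not values prescribed by the disclosure:

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class RenderConfig:
    light_intensity: float    # lighting parameter
    rotation_z_deg: float     # pose parameter: rotation angle about the z axis
    roughness: float          # texture parameter
    # Depth is produced per pixel by the renderer rather than configured up front.

# Enumerate combinations to obtain a large set of rendered models cheaply.
configs = [RenderConfig(i, r, g)
           for i, r, g in product((0.5, 1.0, 2.0),    # light intensities
                                  range(0, 360, 30),  # rotations
                                  (0.2, 0.5, 0.8))]   # roughness values
```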

In some embodiments, step 203 can include: determining one or more projection regions in each image, each of the one or more projection regions corresponding to one of the one or more CAD models and having corresponding depth values; determining, among the one or more projection regions, the projection regions associated with unoccluded objects, where the unoccluded objects are included in the one or more objects; and, according to the vision task type, automatically annotating the projection regions associated with the unoccluded objects in each image based on metadata to generate the annotation information associated with the unoccluded objects, where the metadata includes one or more of the information of the CAD models and/or the rendering configuration parameters. For example, when a three-dimensional scene model including one or more CAD models is projected from a viewpoint at different angles, some of the projection regions will be partially or completely occluded because the depths differ; typically, the projection regions of unoccluded objects can be completely identified and annotated. As for annotation requirements, different visual annotation tasks may exist (for example, recognition, detection, instance segmentation, pose estimation, etc.), and different vision tasks require obtaining various metadata (for example, from the rendering engine), including one or more of the information of the CAD models (for example, the name information of a CAD model) and/or the rendering configuration parameters (for example, depth, pose, etc.). For recognition, for instance, the metadata includes at least the name information of the CAD models; for detection/instance segmentation/pose estimation tasks, the depth information and pose information among the rendering configuration parameters are needed in addition to the name information of the CAD models and the projection of the three-dimensional scene model.
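
The task-to-metadata dependency described above can be summarized as a small lookup table; the key and field names are assumptions made for illustration:

```python
# Metadata required per vision task, following the text above (names assumed).
REQUIRED_METADATA = {
    "recognition":           ["model_name"],
    "detection":             ["model_name", "depth", "pose"],
    "instance_segmentation": ["model_name", "depth", "pose"],
    "pose_estimation":       ["model_name", "depth", "pose"],
}
```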

In some embodiments, step 203 can further include: determining whether one or more intersecting portions exist among the one or more projection regions; if one or more intersecting portions exist, then for each intersecting portion associated with multiple projection regions, keeping the projection region with the smallest depth value among those projection regions and removing the remaining projection regions, so as to generate the projection regions associated with the unoccluded objects; or, if no intersecting portion exists, determining the one or more projection regions as the projection regions associated with unoccluded objects. In this step, because the pixels of different projection regions have different depth values, the projection region with the smallest depth value is closest to the viewpoint and can be identified as unoccluded.
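
A minimal sketch of this smallest-depth-wins rule, assuming the renderer provides one per-pixel depth map per projection region, with np.inf outside the region (both assumptions made for illustration):

```python
import numpy as np

def unoccluded_masks(depth_maps):
    """depth_maps: dict {model_name: HxW depth array, np.inf outside the region}.

    Where regions intersect, only the region with the smallest depth value
    (closest to the viewpoint) is kept; the other regions are removed there.
    """
    names = list(depth_maps)
    stack = np.stack([depth_maps[n] for n in names])   # shape (K, H, W)
    winner = np.argmin(stack, axis=0)                  # nearest region per pixel
    covered = np.isfinite(stack).any(axis=0)           # pixels covered by any region
    return {name: (winner == i) & covered for i, name in enumerate(names)}
```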

In some embodiments, step 203 can further include: selecting a visual processing algorithm according to the vision task type; and determining, using the visual processing algorithm and based on the metadata, feature information of the projection regions associated with the unoccluded objects so as to generate the annotation information. In this step, different visual annotation tasks may exist for a given annotation requirement, so a suitable visual processing algorithm needs to be determined; for example, it can be selected from among the functions provided by an open-source computer vision library, which implement various computer vision algorithms efficiently, and the selected functions and the metadata are then used to determine the feature information of the projection regions associated with the unoccluded objects so as to generate the annotation information.

In some embodiments, step 203 can further include: determining contour point information of the projection regions associated with the unoccluded objects as the feature information; and generating, based on the contour point information, bounding box information and/or mask information associated with the unoccluded objects as the annotation information. In this step, for the detection and instance segmentation vision task types, the contour point information of a projection region associated with an unoccluded object can be extracted; the bounding box information can be determined, for example, from the coordinates of the top-left-most and bottom-right-most contour points in the coordinate system, and the mask information can be obtained, for example, by masking the projection region according to the contour points.
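
A minimal sketch of this contour-based step using OpenCV, an open-source computer vision library of the kind mentioned above; the binary region mask is assumed to come from the occlusion step, and a non-empty region is assumed:

```python
import cv2
import numpy as np

def annotate_region(region_mask):
    """region_mask: HxW uint8 image, 255 inside the unoccluded projection region."""
    contours, _ = cv2.findContours(region_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    pts = max(contours, key=cv2.contourArea)        # contour point information
    x, y, w, h = cv2.boundingRect(pts)              # box from the extreme contour points
    mask = np.zeros_like(region_mask)
    cv2.drawContours(mask, [pts], -1, 255, cv2.FILLED)  # filled mask information
    return (x, y, w, h), mask
```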

In some embodiments, step 204 can further include: extracting the projection regions associated with the unoccluded objects from each image in the set of two-dimensional projection images; fusing the projection regions associated with the unoccluded objects, as the foreground, with a background image related to the production line of the one or more objects to obtain a fused image; and superimposing the annotation information on the fused image to generate the annotated image. In this step, once the projection regions of the unoccluded objects have been obtained, they can easily be fused with background images related to the production line of the one or more objects (for example, various real scene images captured in advance, such as of a workshop, a workbench, etc.) to obtain an expanded set of image data.
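
A minimal sketch of such a mask-based fusion, assuming the foreground rendering, its region mask and the background image share the same resolution; drawing the bounding box stands in for superimposing the annotation information:

```python
import cv2
import numpy as np

def fuse(foreground, mask, background, bbox):
    """Paste the unoccluded projection region onto a production-line background.

    foreground/background: HxWx3 uint8 images; mask: HxW uint8, 255 inside the region.
    """
    fused = np.where(mask[..., None] > 0, foreground, background)  # foreground over background
    x, y, w, h = bbox
    cv2.rectangle(fused, (x, y), (x + w, y + h), (0, 255, 0), 2)   # superimpose annotation
    return fused
```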

In some embodiments, the annotated images are used to train an AI application model associated with the production line of the one or more objects. As mentioned above, AI application models associated with the production line of the one or more objects can include, for example, object detection, object recognition and object sorting; the large number of annotated images with annotation information obtained by fusion (for example, fusing a masked object CAD model with a background image so that the object appears to actually be present in the background image) can serve as prepared image training data for training the AI application models.

In some embodiments, the one or more CAD models can include at least two identical CAD models or can all be different CAD models. For example, different three-dimensional scene models can be constructed: in some cases a scene including multiple objects of the same category, and in other cases a scene including multiple objects of different categories.

Figure 3 shows an exemplary device 300 for automatically generating annotated images according to an embodiment of the present disclosure.

Referring to Figure 3, the device 300 includes a model unit 301, a projection unit 302, an annotation unit 303 and a fusion unit 304.

The model unit 301 is configured to generate a three-dimensional scene model based on one or more CAD models, where each of the one or more CAD models corresponds to one of one or more objects.

The projection unit 302 is configured to generate, based on the three-dimensional scene model, a set of two-dimensional projection images of the one or more objects.

The annotation unit 303 is configured to automatically annotate each image in the set of two-dimensional projection images to generate annotation information associated with unoccluded objects, where the unoccluded objects are included in the one or more objects.

The fusion unit 304 is configured to fuse the annotation information with a background image related to the production line of the one or more objects to generate an annotated image.

In some embodiments, the model unit 301 can be configured to import the one or more CAD models into the three-dimensional scene model, where each of the one or more CAD models includes model name information, structure information and material information.

In some embodiments, the projection unit 302 can be configured to: load the three-dimensional scene model including the model name information, structure information and material information; render the loaded three-dimensional scene model based on different rendering configuration parameters to obtain a set of three-dimensional rendered models; and project each model in the set of three-dimensional rendered models to generate the set of two-dimensional projection images of the one or more objects.

In some embodiments, the rendering configuration parameters can include at least one of the following: lighting parameters, depth parameters, pose parameters or texture parameters.

In some embodiments, the annotation unit 303 can be configured to: determine one or more projection regions in each image, each of the one or more projection regions corresponding to one of the one or more CAD models and having corresponding depth values; determine, among the one or more projection regions, the projection regions associated with unoccluded objects, where the unoccluded objects are included in the one or more objects; and, according to the vision task type, automatically annotate the projection regions associated with the unoccluded objects in each image based on metadata to generate the annotation information associated with the unoccluded objects, where the metadata includes one or more of the information of the CAD models and/or the rendering configuration parameters.

In some embodiments, the annotation unit 303 can be further configured to: determine whether one or more intersecting portions exist among the one or more projection regions; if one or more intersecting portions exist, then for each intersecting portion associated with multiple projection regions, keep the projection region with the smallest depth value among those projection regions and remove the remaining projection regions, so as to generate the projection regions associated with the unoccluded objects; or, if no intersecting portion exists, determine the one or more projection regions as the projection regions associated with unoccluded objects.

In some embodiments, the annotation unit 303 can be further configured to: select a visual processing algorithm according to the vision task type; and determine, using the visual processing algorithm and based on the metadata, feature information of the projection regions associated with the unoccluded objects so as to generate the annotation information.

In some embodiments, the annotation unit 303 can be further configured to: determine contour point information of the projection regions associated with the unoccluded objects as the feature information; and generate, based on the contour point information, bounding box information and/or mask information associated with the unoccluded objects as the annotation information.

In some embodiments, the fusion unit 304 can be configured to: extract the projection regions associated with the unoccluded objects from each image in the set of two-dimensional projection images; fuse the projection regions associated with the unoccluded objects, as the foreground, with a background image related to the production line of the one or more objects to obtain a fused image; and superimpose the annotation information on the fused image to generate the annotated image.

In some embodiments, the annotated images are used to train an AI application model associated with the production line of the one or more objects.

In addition, the model unit 301 can also be configured to perform the other processes described above with respect to step 201 of the method 200, the projection unit 302 can also be configured to perform the other processes described above with respect to step 202, the annotation unit 303 can also be configured to perform the other processes described above with respect to step 203, and the fusion unit 304 can also be configured to perform the other processes described above with respect to step 204.

Figure 4 shows a block diagram of an exemplary computing device 400 for automatically generating annotated images according to an embodiment of the present disclosure. The computing device 400 includes a processor 401 and a memory 402 coupled to the processor 401. The memory 402 is used to store computer-executable instructions which, when executed, cause the processor 401 to perform the methods in the above embodiments (for example, any one or more steps of the aforementioned method 200).

Compared with the prior art, the solution for automatically generating annotated images according to the embodiments of the present disclosure described above can have the following advantages. First, the image generation and annotation process can be carried out directly after product design, so a trained AI model can accompany the entire product life cycle. Second, the image generation and annotation process can be applied at every production stage, such as part-level, component-level, subassembly-level or assembly-level products. Third, the high degree of automation of the image generation and annotation process greatly saves time and human resources and can significantly accelerate the realization of mass customization. Fourth, because the image generation and annotation process is automatic, the size of the image data set can be expanded freely, which can greatly improve the performance of the AI models. Fifth, the image generation and annotation process is performed by a computer, so incorrect, imprecise and missing labels can be avoided even when the image data set is very large.

Unlike various manual annotation tools, a user who uses the solution of the present disclosure to automatically generate annotated images for training only needs to load the CAD models, configure the rendering parameters and set the desired background images; images with annotation information can then be generated automatically and used to train different AI models for different industry scenarios (for example, smart production, etc.). For example, once a product design is available, image data sets for AI model training can be generated automatically without any actual production data; such synthetic production data can accelerate the realization of various business scenarios before the production and manufacturing stages, such as product design, product verification, manufacturing processes, production plan management, supply chain management, production line planning, production line design, product maintenance, and so on.

Furthermore, the above methods can alternatively be implemented by means of a computer-readable storage medium, which carries computer-readable program instructions for executing the various embodiments of the present disclosure. A computer-readable storage medium can be a tangible device that can retain and store instructions for use by an instruction-executing device. The computer-readable storage medium can be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or another freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or another transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.

Accordingly, in another embodiment, the present disclosure provides a computer-readable storage medium having computer-executable instructions stored thereon, the computer-executable instructions being used to perform the methods in the various embodiments of the present disclosure.

In another embodiment, the present disclosure provides a computer program product that is tangibly stored on a computer-readable storage medium and comprises computer-executable instructions which, when executed, cause at least one processor to perform the methods in the various embodiments of the present disclosure.

In general, the various example embodiments of the present disclosure can be implemented in hardware or dedicated circuits, software, firmware, logic, or any combination thereof. Certain aspects can be implemented in hardware, while other aspects can be implemented in firmware or software executable by a controller, microprocessor or other computing device. While aspects of the embodiments of the present disclosure are illustrated or described as block diagrams, flowcharts or some other graphical representations, it will be understood that the blocks, devices, systems, techniques or methods described herein can be implemented, as non-limiting examples, in hardware, software, firmware, dedicated circuits or logic, general-purpose hardware or controllers or other computing devices, or some combination thereof.

The computer-readable program instructions or computer program products for executing the various embodiments of the present disclosure can also be stored in the cloud; when they need to be invoked, a user can access the computer-readable program instructions stored in the cloud for executing an embodiment of the present disclosure via the mobile Internet, a fixed network or another network, thereby implementing the technical solutions disclosed according to the various embodiments of the present disclosure.

Although the embodiments of the present disclosure have been described with reference to several specific embodiments, it should be understood that the embodiments of the present disclosure are not limited to the specific embodiments disclosed. The embodiments of the present disclosure are intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. The scope of the claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

Claims (25)

  1. A method for automatically generating an annotated image, the method comprising the steps of:
    generating a three-dimensional scene model based on one or more CAD models, wherein each of the one or more CAD models corresponds to one of one or more objects;
    generating a set of two-dimensional projection images for the one or more objects based on the three-dimensional scene model;
    automatically annotating each image in the set of two-dimensional projection images to generate annotation information associated with an unoccluded object, wherein the unoccluded object is included in the one or more objects; and
    fusing the annotation information with a background image related to the production line of the one or more objects to generate an annotated image.
  2. The method of claim 1, wherein generating the three-dimensional scene model comprises:
    importing the one or more CAD models into the three-dimensional scene model, wherein each of the one or more CAD models includes model name information, structure information, and material information.
  3. The method of claim 1 or 2, wherein generating a set of two-dimensional projection images for the one or more objects comprises:
    loading a three-dimensional scene model comprising model name information, structure information and material information;
    rendering the loaded three-dimensional scene model based on different rendering configuration parameters to obtain a set of three-dimensional rendering models;
    projecting each model in the set of three-dimensional rendering models to generate the set of two-dimensional projection images for the one or more objects.
  4. The method of claim 3, wherein the rendering configuration parameters comprise at least one of: illumination parameters, depth parameters, pose parameters, or texture parameters.
  5. The method of claim 1, wherein automatically annotating each image in the set of two-dimensional projection images to generate annotation information associated with an unoccluded object comprises:
    determining one or more projection regions in each image, each of the one or more projection regions corresponding to one of the one or more CAD models, the one or more projection regions having respective depth values;
    determining a projection region of the one or more projection regions that is associated with an unoccluded object, wherein the unoccluded object is included in the one or more objects;
    automatically annotating, according to the visual task type, the projection region associated with the unoccluded object in each image based on metadata to generate the annotation information associated with the unoccluded object, the metadata including one or more of information of the CAD models and/or rendering configuration parameters.
  6. The method of claim 5, wherein determining a projection region of the one or more projection regions that is associated with an unoccluded object comprises:
    determining whether one or more intersecting portions exist among the one or more projection regions;
    if the one or more intersecting portions exist, retaining, for each intersecting portion associated with a plurality of projection regions, the projection region of the plurality of projection regions having the smallest depth value, and removing the remaining projection regions of the plurality of projection regions, to generate the projection regions associated with the unoccluded object; or
    if the one or more intersecting portions are not present, determining the one or more projection regions to be projection regions associated with an unoccluded object.
  7. The method of claim 5 or 6, wherein automatically annotating, according to the visual task type, the projection region associated with the unoccluded object in each image based on metadata to generate annotation information associated with the unoccluded object comprises:
    selecting a visual processing algorithm according to the visual task type;
    determining, using the visual processing algorithm, feature information of the projection region associated with the unoccluded object based on the metadata to generate the annotation information.
  8. The method of claim 7, wherein determining feature information of the projection region associated with the unoccluded object to generate the annotation information comprises:
    determining contour point information of the projection region associated with the unoccluded object as the feature information;
    generating, based on the contour point information, bounding box information and/or mask information associated with the unoccluded object as the annotation information.
  9. The method of claim 7, wherein fusing the annotation information with a background image related to the production line of the one or more objects to generate an annotated image comprises:
    extracting the projection region associated with the unoccluded object from each image in the set of two-dimensional projection images;
    fusing the projection region associated with the unoccluded object, as a foreground, with a background image related to the production line of the one or more objects to obtain a fused image;
    superimposing the annotation information on the fused image to generate the annotated image.
  10. The method of claim 1 or 9, wherein the annotated image is used to train an AI application model associated with the production line of the one or more objects.
  11. The method of claim 1, wherein the one or more CAD models comprise at least two identical CAD models or comprise completely different CAD models.
  12. An apparatus for automatically generating an annotated image, the apparatus comprising:
    a model unit configured to generate a three-dimensional scene model based on one or more CAD models, wherein each of the one or more CAD models corresponds to one of one or more objects;
    a projection unit configured to generate a set of two-dimensional projection images for the one or more objects based on the three-dimensional scene model;
    an annotation unit configured to automatically annotate each image in the set of two-dimensional projection images to generate annotation information associated with an unoccluded object, wherein the unoccluded object is included in the one or more objects; and
    a fusion unit configured to fuse the annotation information with a background image related to the production line of the one or more objects to generate an annotated image.
  13. The apparatus of claim 12, wherein the model unit is configured to:
    import the one or more CAD models into the three-dimensional scene model, wherein each of the one or more CAD models includes model name information, structure information, and material information.
  14. The apparatus of claim 12 or 13, wherein the projection unit is configured to:
    load a three-dimensional scene model comprising model name information, structure information and material information;
    render the loaded three-dimensional scene model based on different rendering configuration parameters to obtain a set of three-dimensional rendering models;
    project each model in the set of three-dimensional rendering models to generate the set of two-dimensional projection images for the one or more objects.
  15. The apparatus of claim 14, wherein the rendering configuration parameters comprise at least one of: illumination parameters, depth parameters, pose parameters, or texture parameters.
  16. The apparatus of claim 12, wherein the annotating unit is configured to:
    determining one or more projection areas in each image, each of the one or more projection areas corresponding to one of the one or more CAD models, the one or more projection areas having respective depth values;
    determining a projection area of the one or more projection areas associated with a non-occluded object, wherein the non-occluded object is included in the one or more objects;
    automatically annotating, according to the visual task type, the projection area associated with the non-occluded object in each image based on metadata including information of the one or more CAD models and/or rendering configuration parameters, to generate annotation information associated with the non-occluded object.
  17. The apparatus of claim 16, wherein the annotating unit is further configured to:
    determining whether one or more intersection portions exist for the one or more projection areas;
    if the one or more intersection portions exist, retaining, for each intersection portion associated with a plurality of projection areas, the projection area having the smallest depth value among the plurality of projection areas, and removing the remaining projection areas of the plurality of projection areas, to generate the projection area associated with the non-occluded object; or
    if the one or more intersection portions do not exist, determining the one or more projection areas to be the projection areas associated with the non-occluded object.
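The smallest-depth rule of claims 16 and 17 reduces to a per-pixel comparison over one depth map per object. The synthetic depth values below stand in for per-model renderer output; this is a sketch, not the claimed implementation:

```python
# Hedged sketch of claims 16-17: wherever projection areas intersect,
# only the object with the smallest depth value remains non-occluded.
import numpy as np

H, W = 480, 640
depth = np.full((2, H, W), np.inf)     # one depth map per CAD model
depth[0, 100:300, 100:300] = 1.0       # object 0's projection area (nearer)
depth[1, 200:400, 200:400] = 2.0       # object 1's projection area (farther)

areas = np.isfinite(depth)             # boolean projection area per object
overlap = areas.sum(axis=0) > 1        # intersection portions
nearest = np.argmin(depth, axis=0)     # object index with smallest depth
visible = [areas[i] & (~overlap | (nearest == i)) for i in range(len(areas))]
# visible[0] keeps its full square; visible[1] loses the intersection portion.
```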
  18. The apparatus of claim 16 or 17, wherein the annotating unit is further configured to:
    selecting a vision processing algorithm according to the visual task type;
    determining, using the vision processing algorithm, feature information of the projection area associated with the non-occluded object based on the metadata to generate the annotation information.
  19. The apparatus of claim 18, wherein the annotating unit is further configured to:
    determining contour point information of the projection area associated with the non-occluded object as the feature information;
    and generating, based on the contour point information, bounding box information and/or mask information associated with the non-occluded object as the annotation information.
  20. The apparatus of claim 18, wherein the fusion unit is configured to:
    extracting the projection area associated with the non-occluded object from each image in the set of two-dimensional projection images;
    fusing the projection area associated with the non-occluded object, as a foreground, with the background image related to the production line of the one or more objects to obtain a fused image;
    and superimposing the annotation information on the fused image to generate the annotation image.
  21. The apparatus of claim 12 or 20, wherein the annotation image is used to train an AI application model associated with a production line of the one or more objects.
  22. The apparatus of claim 12, wherein the one or more CAD models comprise at least two identical CAD models or comprise CAD models that are different from one another.
  23. A computing device, the computing device comprising:
    a processor; and
    a memory for storing computer-executable instructions that, when executed, cause the processor to perform the method of any of claims 1-11.
  24. A computer-readable storage medium having stored thereon computer-executable instructions for performing the method according to any of claims 1-11.
  25. A computer program product tangibly stored on a computer-readable storage medium and comprising computer-executable instructions that, when executed, cause at least one processor to perform the method of any one of claims 1-11.
CN202180100664.3A: Method, device and computer storage medium for automatically generating annotated images. Priority date 2021-08-30, filing date 2021-08-30, published as CN117642774A, status Pending.

Applications Claiming Priority (1)

PCT/CN2021/115439 (published as WO2023028777A1): Method and apparatus for automatically generating labeled image, and computer storage medium. Priority date 2021-08-30, filing date 2021-08-30.

Publications (1)

CN117642774A, published 2024-03-01.

Family

ID: 85411785

Family Applications (1)

CN202180100664.3A: Method, device and computer storage medium for automatically generating annotated images. Priority date 2021-08-30, filing date 2021-08-30, status Pending.

Country Status (2)

CN: CN117642774A
WO: WO2023028777A1


Also Published As

WO2023028777A1, published 2023-03-09.


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination