KR102163573B1

KR102163573B1 - Apparatus and method on generating synthetic data for training real-time object detection system

Info

Publication number: KR102163573B1
Application number: KR1020180146693A
Authority: KR
Inventors: 이상훈; 허정우
Original assignee: 연세대학교 산학협력단
Priority date: 2018-11-23
Filing date: 2018-11-23
Publication date: 2020-10-12
Anticipated expiration: 2038-11-23
Also published as: KR20200061478A

Abstract

The present invention provides an apparatus and method for generating synthetic data. The apparatus includes: a database unit storing a plurality of background images and a plurality of object masks; a background selection unit that selects and acquires a background image from the database unit; an object mask setting unit that selects at least one object mask from the database unit and transforms the selected at least one object mask by applying at least one of blurring, scaling, and rotation angle adjustment; and a composite image generation unit that generates a composite image by compositing, onto the background image selected by the background selection unit, at least one final object mask obtained from the transformed at least one object mask.

Description

Apparatus and method for generating synthetic data for training a real-time object detection system {APPARATUS AND METHOD ON GENERATING SYNTHETIC DATA FOR TRAINING REAL-TIME OBJECT DETECTION SYSTEM}

The present invention relates to an apparatus and method for generating synthetic data, and more particularly to an apparatus and method for generating synthetic data for training a real-time object detection system.

Object detection technology for identifying objects has recently been in demand in various application fields such as augmented reality (AR) and security systems. AR technology in particular pursues a high level of user interaction in order to provide a more realistic experience, so recognizing the surrounding environment in real time is very important.

Knowing which objects are contained in the current image provides sufficient information about the surrounding environment. Recognizing the surrounding environment in real time therefore requires that object detection also be performed in real time.

Accordingly, research continues on dramatically improving the performance of real-time object detection systems by using artificial neural networks trained with deep learning techniques. For an artificial neural network to perform reliably, it must be trained with a large amount of training data acquired in various environments.

Some public databases provide many images for certain types of objects, from which training data can be obtained. However, if the object to be detected is not present in those databases, the training data must be produced directly.

Producing training data for object detection requires manually setting a border for every target object in each image, and training an artificial neural network model typically requires tens of thousands of images or more, so it is a very difficult task in practice.

In addition, there is another problem to be solved when training a real-time object detection system.

This is visual degradation caused by complex interactions between objects and the environment. Visual degradation is generally caused by motion blur and occlusion.

FIG. 1 shows an example of images in which visual degradation has occurred.

In FIG. 1, (a) shows an image in which visual degradation is caused by motion blur, and (b) shows an image in which visual degradation is caused by occlusion.

As shown in (a), motion blur refers to a phenomenon in which an object (here, a baseball bat) appears blurred in the image because of its movement, and it occurs mainly with low-quality cameras. Because motion blur deforms the contour of the object, the object detection system may fail to detect the object even if it has already been trained on each individual object.

As shown in (b), occlusion refers to a phenomenon in which part of an object (here, a knife) is hidden in the image by a distractor (here, a human hand). This also prevents the object detection system from discerning the contour of the object, so the object goes undetected regardless of whether the system was trained on it.

To train the object detection system to detect objects despite visual degradation, training data reflecting various forms of visual degradation for each object must be used, so the amount of required training data increases exponentially.

Therefore, even if the object detection system is designed efficiently, the lack of training data prevents the object detection apparatus from detecting objects reliably.

Korean Patent Laid-Open Publication No. 10-2017-0037024 (published March 23, 2017)

An object of the present invention is to provide an apparatus and method for generating synthetic data capable of generating a large amount of training data for training a real-time object detection system.

Another object of the present invention is to provide an apparatus and method for generating synthetic data that enable an object detection system to readily detect objects even under visual degradation such as motion blur and occlusion.

To achieve the above object, an apparatus for generating synthetic data according to an embodiment of the present invention includes: a database unit storing a plurality of background images and a plurality of object masks; a background selection unit that selects and acquires a background image from the database unit; an object mask setting unit that selects at least one object mask from the database unit and transforms the selected at least one object mask by applying at least one of blurring, scaling, and rotation angle adjustment; and a composite image generation unit that generates a composite image by compositing, onto the background image selected by the background selection unit, at least one final object mask obtained from the transformed at least one object mask.

The object mask setting unit may include: an object mask selection unit that selects at least one object mask from among the plurality of object masks stored in the database unit; a blurring unit that selectively blurs the selected at least one object mask according to a predetermined blurring probability value; and a trimming unit that adjusts the size and rotation angle of the at least one object mask and outputs the transformed at least one object mask.

The blurring unit may blur the object mask using a blurring kernel that blurs the object mask according to a blurring size and a blurring direction angle.

The apparatus for generating synthetic data may further include an object synthesis unit that acquires at least one obstacle mask according to a predetermined obstacle probability value, composites the acquired obstacle mask with the object mask output from the object mask setting unit, and passes the resulting final object mask to the composite image generation unit.

The object synthesis unit may transform the at least one obstacle mask by adjusting its size and rotation angle, and composite the transformed obstacle mask with the object mask.

The composite image generation unit may place at least one final object mask on the background image and composite it. When there are a plurality of final object masks, it may vary their placement positions so that the intersection over union (IoU) between the placed final object masks is less than a predetermined allowable value. It may also generate a ground truth label for each of the at least one final object mask composited into the composite image.

To achieve the above object, a method for generating synthetic data according to another embodiment of the present invention includes: selecting one background image and at least one object mask from among a plurality of background images and a plurality of object masks obtained in advance; transforming the selected at least one object mask by applying at least one of blurring, scaling, and rotation angle adjustment; and generating a composite image by compositing, onto the selected background image, at least one final object mask obtained from the transformed at least one object mask.

Accordingly, the apparatus and method for generating synthetic data according to an embodiment of the present invention acquire a plurality of backgrounds and a plurality of objects separately and composite them, thereby generating training data that enables an object detection system to detect objects in various environments. In addition, by applying visual changes such as blurring and size and orientation adjustment to an object before compositing it with the background, they enable the object detection system to detect objects in various states, and by additionally compositing obstacles, they can generate and provide training data that enables objects to be detected reliably even in images where occlusion has occurred.

FIG. 1 shows an example of images in which visual degradation has occurred.
FIG. 2 shows the schematic structure of a training data generation apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the operation of each component of the training data generation apparatus of FIG. 2.
FIG. 4 is a diagram comparing composite data generated with and without alpha channel blurring.
FIG. 5 shows a training data generation method according to an embodiment of the present invention.
FIG. 6 shows example frames for testing the training data generation method according to an embodiment of the present invention.
FIG. 7 shows experimental results on the object detection performance of an object detection system trained with training data according to an embodiment of the present invention.

To fully understand the present invention, its operational advantages, and the objects achieved by practicing it, reference should be made to the accompanying drawings, which illustrate preferred embodiments of the present invention, and to the contents described in those drawings.

Hereinafter, the present invention is described in detail by describing preferred embodiments with reference to the accompanying drawings. However, the present invention may be implemented in various different forms and is not limited to the described embodiments. Parts irrelevant to the description are omitted in order to describe the present invention clearly, and the same reference numerals in the drawings denote the same members.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit", "... device", "module", and "block" used in the specification denote a unit that processes at least one function or operation, and such a unit may be implemented in hardware, software, or a combination of hardware and software.

FIG. 2 shows the schematic structure of a training data generation apparatus according to an embodiment of the present invention, and FIG. 3 is a diagram for explaining the operation of each component of the training data generation apparatus of FIG. 2.

Referring to FIGS. 2 and 3, the training data generation apparatus according to this embodiment includes a database unit (100), a background selection unit (200), an object mask setting unit (300), an obstacle mask setting unit (400), an object synthesis unit (500), and a composite image generation unit (600).

The database unit (100) stores background images and object images, which are the source images for generating composite images for training an object detection system. The database unit (100) may include a background database (110) that stores background images and an object mask database (120) that stores object images.

The background database (110) may collect and store a variety of background images so that the synthetic data resembles complex real environments.

The object mask database (120) stores images of the various objects that may be detection targets. Here, an object image is an image cut out along the contour of the object, and it later functions as a kind of mask superimposed on the background image when a composite image is generated. That is, in this embodiment the object image is used as an object mask.

The object images stored in the object mask database (120) may be collected in various ways; as one example, they may be obtained through a separate auxiliary segmentation neural network.

An auxiliary segmentation neural network is itself a kind of artificial neural network and requires training, so it is inefficient in many cases. Here, however, images containing an object are captured against a designated simple background, and the background is removed from the captured images with a simple image masking algorithm. This yields object images for training the auxiliary segmentation neural network, and the trained auxiliary segmentation neural network can then acquire a variety of object images.
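The simple image masking step can be illustrated with a short sketch. The patent does not specify the masking algorithm, so this assumes the simple background is a near-uniform color and that a per-pixel color-distance threshold is an acceptable rule; the function name `mask_simple_background` and the `threshold` parameter are hypothetical.

```python
def mask_simple_background(pixels, bg_color, threshold=30.0):
    """Return RGBA pixels in which near-background pixels are transparent.

    pixels   : list of rows, each row a list of (r, g, b) tuples
    bg_color : (r, g, b) of the assumed uniform background
    threshold: hypothetical Euclidean color-distance cutoff
    """
    out = []
    for row in pixels:
        out_row = []
        for (r, g, b) in row:
            dist = ((r - bg_color[0]) ** 2
                    + (g - bg_color[1]) ** 2
                    + (b - bg_color[2]) ** 2) ** 0.5
            # Background-like pixels get alpha 0, object pixels alpha 255.
            alpha = 0 if dist < threshold else 255
            out_row.append((r, g, b, alpha))
        out.append(out_row)
    return out


# A 2x2 image on a white background with one red object pixel.
image = [[(255, 255, 255), (200, 30, 30)],
         [(250, 252, 255), (255, 255, 255)]]
masked = mask_simple_background(image, bg_color=(255, 255, 255))
```

A hard threshold is the simplest choice; a real pipeline would likely need soft edges or morphological cleanup, which are outside what the text describes.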

The background selection unit (200) selects, from among the plurality of background images stored in the database unit (100), the background image to be composited and passes it to the composite image generation unit (600). The background selection unit (200) may select backgrounds at random or in a predetermined order so that the composite images show objects placed in various environments.

The object mask setting unit (300) selects, from among the plurality of object masks stored in the database unit (100), N object masks (where N is a natural number) to be placed on the background image selected by the background selection unit (200) and composited into the composite image. It visually transforms the selected N object masks by applying at least one of motion blurring, scaling, and rotation, and passes them to the object synthesis unit (500). That is, by selecting and transforming the N object masks to be composited, the object mask setting unit (300) gives the selected object masks the visual variation that can arise from various interactions.

The object mask setting unit (300) may include an object mask selection unit (310), a blurring unit (320), and a trimming unit (330).

The object mask selection unit (310) selects, from among the plurality of object masks stored in the database unit (100), the N object masks to be composited.

The blurring unit (320) adds a motion blur effect to the N object masks using motion blur synthesis. The blurring unit (320) may use a motion blur kernel controlled by the blurring size (w_mb) of the object mask and the blurring direction angle (θ_mb) relative to the x axis. The motion blur kernel is linear, and the blurring size (w_mb) is the number of pixels averaged along the blurring direction angle (θ_mb) through the center of the motion blur kernel.

Using the motion blur kernel, the blurring unit (320) may add a blurring effect to an input object mask as shown in FIG. 3. The blurring unit (320) may also apply the blurring effect to the alpha channel of the object mask so that the mask blends naturally with the background image.
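The linear motion blur kernel and its application to every channel, including alpha, can be sketched as follows. This is a minimal pure-Python illustration, not the patented implementation: it assumes w_mb is an odd kernel side length, that the kernel is a normalized line through the kernel center, and that pixels outside the mask count as zero. The function names and the dict-of-channels layout are hypothetical.

```python
import math


def motion_blur_kernel(w_mb, theta_mb_deg):
    """Build a normalized w_mb x w_mb kernel with a line at angle theta_mb."""
    if w_mb < 1 or w_mb % 2 == 0:
        raise ValueError("w_mb must be a positive odd integer")
    c = w_mb // 2
    kernel = [[0.0] * w_mb for _ in range(w_mb)]
    dx = math.cos(math.radians(theta_mb_deg))
    dy = -math.sin(math.radians(theta_mb_deg))  # image rows grow downward
    for t in range(-c, c + 1):
        kernel[c + round(t * dy)][c + round(t * dx)] = 1.0
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def filter_channel(channel, kernel):
    """Filter one 2-D channel with the kernel, treating outside pixels as 0."""
    h, w = len(channel), len(channel[0])
    k = len(kernel)
    c = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + i - c, x + j - c
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * channel[yy][xx]
            out[y][x] = acc
    return out


def blur_rgba(channels, w_mb, theta_mb_deg):
    """Apply the same kernel to all channels, alpha included."""
    kernel = motion_blur_kernel(w_mb, theta_mb_deg)
    return {name: filter_channel(ch, kernel) for name, ch in channels.items()}


# A 1x5 mask with a single opaque pixel, blurred horizontally with w_mb = 3.
mask = {
    "r": [[0, 0, 90, 0, 0]],
    "g": [[0, 0, 0, 0, 0]],
    "b": [[0, 0, 0, 0, 0]],
    "a": [[0, 0, 255, 0, 0]],
}
blurred = blur_rgba(mask, w_mb=3, theta_mb_deg=0.0)
```

In the example the single opaque pixel is spread over three pixels in both the color and alpha channels, which is what lets the blurred edge fade into the background.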

The blurring unit (320) may also apply the blurring effect selectively. For example, according to a predetermined blurring probability value, the blurring unit (320) may apply the blurring effect to only some of the N input object masks and leave the remaining object masks unblurred. The blurring size (w_mb) and blurring direction angle (θ_mb) may also differ among the object masks to which the blurring effect is applied.
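The selective blurring can be sketched as one independent random draw per mask. The patent gives only a "predetermined blurring probability value"; the candidate kernel sizes, the angle range of 0 to 180 degrees, and the name `sample_blur_params` are assumptions made for illustration.

```python
import random


def sample_blur_params(n_masks, p_blur, w_choices=(3, 5, 7, 9), rng=None):
    """For each mask, decide whether to blur and, if so, draw (w_mb, theta_mb).

    Returns a list with one entry per mask: a dict of parameters, or None
    when the mask is left unblurred.
    """
    rng = rng or random.Random()
    params = []
    for _ in range(n_masks):
        if rng.random() < p_blur:
            params.append({
                "w_mb": rng.choice(w_choices),         # hypothetical size set
                "theta_mb": rng.uniform(0.0, 180.0),   # hypothetical range
            })
        else:
            params.append(None)
    return params


# Roughly half of the four masks are blurred, each with its own parameters.
plan = sample_blur_params(4, p_blur=0.5, rng=random.Random(0))
```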

Meanwhile, the trimming unit (330) receives the object masks from the blurring unit (320), scales and rotates them, and outputs them.

To portray objects placed in various situations, the trimming unit (330) adjusts the size of the object mask according to a scaling parameter (s_obj) and adjusts the rotation angle of the object mask relative to the x axis according to a rotation parameter (θ_obj). The scaling parameter (s_obj) may be set as a ratio to the size of the background image. For example, when the scaling parameter (s_obj) is set to 0.5, the object mask may be scaled to half the size of the background image. So that the object mask does not exceed the size of the background image, the scaling parameter (s_obj) may be set as a value relative to the length of the long side of the background image.
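One possible reading of the scaling rule is sketched below: the longer side of the object mask is scaled to s_obj times the long side of the background. The extra clamp that keeps both sides inside the background is an added assumption, not something the text states, and `scaled_object_size` is a hypothetical name.

```python
def scaled_object_size(obj_w, obj_h, bg_w, bg_h, s_obj):
    """Return the (width, height) of the object mask after scaling by s_obj."""
    if not 0.0 < s_obj <= 1.0:
        raise ValueError("s_obj is expected to lie in (0, 1]")
    # Target length of the object's longer side, relative to the
    # background's long side.
    target_long = s_obj * max(bg_w, bg_h)
    factor = target_long / max(obj_w, obj_h)
    # Assumption: never let either side exceed the background.
    factor = min(factor, bg_w / obj_w, bg_h / obj_h)
    return max(1, round(obj_w * factor)), max(1, round(obj_h * factor))


# A 200x100 mask on a 640x480 background with s_obj = 0.5.
size = scaled_object_size(200, 100, 640, 480, 0.5)
```

The aspect ratio of the mask is preserved, so only one scale factor is computed.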

The scaling parameter (s_obj) and the rotation parameter (θ_obj) may be set to different values for each of the input object masks.

The trimming unit (330) also trims the bounding box of the object mask to match the size and shape of the scaled and rotated object mask, and outputs the result to the object synthesis unit (500).
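Bounding-box trimming can be sketched as cropping to the smallest rectangle that contains every non-transparent pixel. The sketch assumes the mask's alpha channel is available as a 2-D list and that "trimming" means a tight crop; the function names are hypothetical.

```python
def tight_bbox(alpha):
    """Return (x0, y0, x1, y1) around all pixels with alpha > 0, or None."""
    rows = [y for y, row in enumerate(alpha) if any(a > 0 for a in row)]
    if not rows:
        return None  # fully transparent mask
    cols = [x for x in range(len(alpha[0])) if any(row[x] > 0 for row in alpha)]
    # x1 and y1 are exclusive, so the box can be used directly for slicing.
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def trim(alpha):
    """Crop a 2-D alpha map to its tight bounding box."""
    box = tight_bbox(alpha)
    if box is None:
        return []
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in alpha[y0:y1]]


# After rotation a mask usually sits inside a larger transparent canvas.
rotated_alpha = [
    [0, 0, 0, 0],
    [0, 255, 0, 0],
    [0, 0, 128, 0],
    [0, 0, 0, 0],
]
box = tight_bbox(rotated_alpha)
trimmed = trim(rotated_alpha)
```

A tight box matters here because the box of each placed mask later doubles as its ground truth annotation.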

As shown in FIG. 3, the obstacle mask setting unit (400) generates a distractor (obstacle) mask that occludes an object and passes it to the occlusion synthesis unit, that is, the object synthesis unit (500). The obstacle mask setting unit (400) does not generate and pass an obstacle mask for every object mask; it may generate obstacle masks selectively according to a predetermined obstacle probability value.

In some cases, the database unit (100) may further include an obstacle mask database for storing obstacle masks, and the obstacle mask setting unit (400) may selectively receive obstacle masks stored in the database unit (100).

Like the object mask setting unit (300), the obstacle mask setting unit (400) may adjust the size and rotation angle of the obstacle mask before outputting it. The size and rotation angle of the obstacle mask are controlled by an obstacle scaling parameter (s_dis) and an obstacle rotation parameter (θ_dis). So that the obstacle mask does not completely cover the object mask, the obstacle scaling parameter (s_dis) and obstacle rotation parameter (θ_dis) may be determined in consideration of the scaling parameter (s_obj) and rotation parameter (θ_obj).

In some cases, the obstacle mask setting unit (400) may also apply a blurring effect to the obstacle mask. That is, the obstacle mask setting unit (400) may be configured similarly to the object mask setting unit (300).

The object synthesis unit (500) generates a final object mask by compositing the transformed object mask received from the object mask setting unit (300) with the obstacle mask received from the obstacle mask setting unit (400). The obstacle mask is overlaid on the object mask but, as described above, composited so that the object mask is not completely hidden by it. As an example, FIG. 3 shows a final object mask in which a bag (the object mask) is composited with a human arm (the obstacle mask).
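One way to check that an obstacle does not hide the object completely is to measure the fraction of object pixels that remain visible. The patent does not define such a measure, so the binary treatment of alpha, the `min_visible` threshold, and the function names below are all assumptions.

```python
def visible_fraction(obj_alpha, dis_alpha, offset):
    """Fraction of object pixels not covered by the obstacle.

    obj_alpha, dis_alpha: 2-D lists of alpha values (0 means transparent)
    offset: (x, y) of the obstacle's top-left corner in object coordinates
    """
    ox, oy = offset
    total = visible = 0
    for y, row in enumerate(obj_alpha):
        for x, a in enumerate(row):
            if a == 0:
                continue  # not part of the object
            total += 1
            dy, dx = y - oy, x - ox
            covered = (0 <= dy < len(dis_alpha)
                       and 0 <= dx < len(dis_alpha[0])
                       and dis_alpha[dy][dx] > 0)
            if not covered:
                visible += 1
    return visible / total if total else 0.0


def occlusion_is_acceptable(obj_alpha, dis_alpha, offset, min_visible=0.3):
    """Accept the composite only if enough of the object stays visible."""
    return visible_fraction(obj_alpha, dis_alpha, offset) >= min_visible


obj = [[255] * 4 for _ in range(2)]   # 2x4 fully opaque object
arm = [[255, 255], [255, 255]]        # 2x2 fully opaque obstacle
half_hidden = visible_fraction(obj, arm, (0, 0))
```

If a candidate placement is rejected, the obstacle's offset, scale, or rotation could be redrawn, which mirrors the text's remark that s_dis and θ_dis are chosen with s_obj and θ_obj in mind.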

For convenience of description, the obstacle mask setting unit (400) and the object synthesis unit (500) are shown separately here, but the obstacle mask setting unit (400) may be included in the object synthesis unit (500).

The composite image generation unit (600) generates a composite image by overlaying the N final object masks received from the object synthesis unit (500) on the background image received from the background selection unit (200). That is, as shown in FIG. 3, the composite image generation unit (600) may place a plurality of final object masks on a single background image.

The composite image generation unit (600) may adjust each placement so that a final object mask does not overlap final object masks already placed on the background image. To this end, the composite image generation unit (600) places the N final object masks at random positions one after another. It measures the intersection over union (IoU) between each previously placed final object mask and the final object mask currently being placed, and if the measured IoU is greater than or equal to a predetermined allowable value (ε_iou), it may reset the position of the current final object mask. That is, it may repeatedly change the position of the final object mask until the IoU falls below the allowable value (ε_iou).

The composite image generation unit (600) can generate composite images of situations in which various objects are placed in various environments, and the generated composite images can be used as training data for training an object detection system. In particular, because the composite image generation unit (600) positions the final object masks on the background image itself, it automatically generates a ground truth label for each object when the composite image is generated.
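The placement loop and the automatic labels can be sketched together. This is an illustrative sketch under stated assumptions: boxes are axis-aligned (x1, y1, x2, y2) rectangles, every mask fits inside the background, and a mask is skipped after `max_tries` failed attempts (the text describes repeating until the IoU is below ε_iou and gives no retry limit). The names `place_masks`, `class_ids`, and the label layout are hypothetical.

```python
import random


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def place_masks(bg_size, mask_sizes, class_ids, eps_iou=0.1,
                max_tries=100, rng=None):
    """Place masks one by one at random; re-draw while IoU >= eps_iou.

    Returns one ground truth label per placed mask.
    """
    rng = rng or random.Random()
    bg_w, bg_h = bg_size
    labels = []
    for (w, h), cid in zip(mask_sizes, class_ids):
        for _ in range(max_tries):
            x = rng.randint(0, bg_w - w)
            y = rng.randint(0, bg_h - h)
            box = (x, y, x + w, y + h)
            if all(iou(box, lab["box"]) < eps_iou for lab in labels):
                # The placement itself is the annotation: no manual labeling.
                labels.append({"class_id": cid, "box": box})
                break
    return labels


labels = place_masks((100, 100), [(20, 20)] * 3, [0, 1, 2],
                     eps_iou=0.1, rng=random.Random(0))
```

Because the generator chooses each box, the label list can be written out in whatever annotation format the detector expects.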

As a result, the apparatus for generating synthetic data according to an embodiment of the present invention makes it possible to obtain a large number of images covering many environments, conditions, and objects from a small number of background images and object images. A large amount of training data for training an object detection system can therefore be obtained easily.

FIG. 4 is a diagram comparing composite data generated with and without alpha channel blurring.

In FIG. 4, (a) shows a composite image generated when the blurring unit (320) of the object mask setting unit (300) applies motion blurring to the alpha channel of the object mask, and (b) shows a composite image generated without applying motion blurring to the alpha channel.

The alpha channel is a channel for blending two colors effectively when the color of one pixel is displayed over the color of another. If motion blurring is applied to the object mask but not to its alpha channel, artifacts appear in the composite image, as shown in the enlarged view in (b). Therefore, to generate reliable training data, it is preferable to apply the motion blur effect to the alpha channel as well when applying it to the object mask.
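The artifact can be understood from the standard alpha blending ("over") rule, which is assumed here as the compositing model. The patent text does not give a formula, and the symbols C, F, B, α, and k are introduced only for illustration: F is the object color, B the background color, α the object's alpha, and k the motion blur kernel.

```latex
% Standard alpha blending of an object F over a background B:
C(x) = \alpha(x)\,F(x) + \bigl(1 - \alpha(x)\bigr)\,B(x)

% Case (a): color and alpha are both blurred with the kernel k:
C_a(x) = (k * \alpha)(x)\,(k * F)(x) + \bigl(1 - (k * \alpha)(x)\bigr)\,B(x)

% Case (b): only the color is blurred; alpha keeps its hard edge:
C_b(x) = \alpha(x)\,(k * F)(x) + \bigl(1 - \alpha(x)\bigr)\,B(x)
```

In case (b), α still jumps between 1 and 0 at the original contour, so the blurred colors are cut off abruptly there instead of fading into the background, which is consistent with the artifact described for FIG. 4(b).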

FIG. 5 shows a training data generation method according to an embodiment of the present invention.

The training data generation method of FIG. 5 is described with reference to FIGS. 2 and 3. First, the background selection unit (200) selects one background image from among the plurality of background images stored in the database unit (100), and the object mask setting unit (300) selects N object masks from among the plurality of object masks (S10).

The object mask setting unit (300) then determines, for each of the N selected object masks, whether to perform motion blurring according to a predetermined blurring probability value (S20). If motion blurring is to be performed, the object mask setting unit (300) motion-blurs the object mask according to a blurring size (w_mb) and a blurring direction angle (θ_mb) to add a motion blur effect (S30). The blurring size (w_mb) and the blurring direction angle (θ_mb) may be set differently for each object mask.
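Steps S20 and S30 can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the kernel used in the embodiment: the blurring size w_mb is taken to be the side length of a square kernel, the kernel is a normalized line through the kernel center at angle θ_mb, and the function names `motion_blur_kernel` and `maybe_blur_kernel` are hypothetical.

```python
import math
import random


def motion_blur_kernel(size, angle_deg):
    """Return a size x size kernel holding a normalized line through its center."""
    kernel = [[0.0] * size for _ in range(size)]
    center = (size - 1) / 2.0
    theta = math.radians(angle_deg)
    dx, dy = math.cos(theta), -math.sin(theta)  # image y-axis points down
    steps = 2 * size
    for k in range(steps + 1):
        t = -center + 2.0 * center * k / steps  # walk along the line
        x = int(round(center + t * dx))
        y = int(round(center + t * dy))
        kernel[y][x] = 1.0
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def maybe_blur_kernel(p_blur, sizes, angles, rng):
    """S20: decide by probability; S30: draw w_mb and theta_mb for this mask."""
    if rng.random() >= p_blur:
        return None  # this object mask is left unblurred
    return motion_blur_kernel(rng.choice(sizes), rng.choice(angles))


rng = random.Random(0)
kernel = maybe_blur_kernel(1.0, [5], [0], rng)  # 5 x 5 horizontal kernel
```

Convolving every channel of the object mask, including the alpha channel, with the returned kernel would apply the blur; calling `maybe_blur_kernel` once per mask lets each mask receive its own size and direction.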

The object mask setting unit (300) also adjusts the size of each of the N object masks according to a scaling parameter (s_obj) and adjusts the rotation angle of the object mask with respect to the x-axis according to a rotation parameter (θ_obj) (S40). The object mask setting unit (300) may adjust the size of the object mask so that it does not exceed the size of the background image. It may also trim the bounding box of the object mask to match the size and shape of the resized and rotated object mask.
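The size constraint in step S40 can be sketched with simple geometry. The sketch below assumes that s_obj multiplies the original mask dimensions and that the mask is treated as a rectangle; neither detail is specified in the text, and `rotated_bbox` and `fit_scale` are hypothetical names.

```python
import math


def rotated_bbox(width, height, angle_deg):
    """Axis-aligned bounding box of a width x height rectangle after rotation."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = abs(math.cos(theta)), abs(math.sin(theta))
    return width * cos_t + height * sin_t, width * sin_t + height * cos_t


def fit_scale(mask_size, bg_size, scale, angle_deg):
    """Reduce `scale` if the scaled, rotated mask would exceed the background."""
    w, h = rotated_bbox(mask_size[0] * scale, mask_size[1] * scale, angle_deg)
    shrink = min(1.0, bg_size[0] / w, bg_size[1] / h)
    return scale * shrink
```

For example, `fit_scale((1000, 500), (640, 480), 1.0, 0)` reduces the scale so that the mask width matches the 640-pixel background width, while a scale of 0.4 for the same mask is returned unchanged because the mask already fits.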

The obstacle mask setting unit (400) then determines whether to add an obstacle mask to the object mask according to a predetermined obstacle probability value (S50). If an obstacle mask is to be added, the obstacle mask setting unit (400) selects the obstacle to add, and the object synthesis unit (500) combines the selected obstacle mask with the object mask transformed by the object mask setting unit (300) to generate a final object mask (S60).

Like the object mask setting unit (300), the obstacle mask setting unit (400) may adjust the size and rotation angle of the obstacle mask and, in some cases, may also apply a blurring effect.
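The occlusion of steps S50 and S60 can be sketched on alpha channels alone. This is a simplified illustration under assumptions not stated in the text: masks are small 2D lists of alpha values, the obstacle is pasted at a given offset, and `apply_obstacle` and `visible_fraction` are hypothetical helpers.

```python
def apply_obstacle(object_alpha, obstacle_alpha, top, left):
    """Return the object's alpha after the obstacle is pasted at (top, left)."""
    visible = [row[:] for row in object_alpha]
    for r, obstacle_row in enumerate(obstacle_alpha):
        for c, a in enumerate(obstacle_row):
            y, x = top + r, left + c
            if 0 <= y < len(visible) and 0 <= x < len(visible[0]):
                visible[y][x] *= 1.0 - a  # the obstacle hides the object here
    return visible


def visible_fraction(object_alpha, visible):
    """Share of the original object alpha that remains visible."""
    total = sum(sum(row) for row in object_alpha)
    return sum(sum(row) for row in visible) / total if total else 0.0


object_alpha = [[1.0] * 4 for _ in range(4)]    # 4 x 4 opaque object
obstacle_alpha = [[1.0] * 2 for _ in range(2)]  # 2 x 2 opaque obstacle
visible = apply_obstacle(object_alpha, obstacle_alpha, top=0, left=0)
```

A check such as `visible_fraction(object_alpha, visible) > 0` is one possible way to confirm that the scaled and rotated obstacle covers only part of the object.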

The composite image generation unit (600) generates a composite image by overlaying the N final object masks delivered from the object synthesis unit (500) on the background image selected by the background selection unit (200) (S70). The composite image generation unit (600) may determine whether the intersection over union (IOU) between a final object mask and the final object masks already placed on the background image is equal to or greater than a predetermined allowable value (ε_iou), and adjust the placement so that the masks do not overlap to that extent. The composite image generation unit (600) also automatically generates a ground truth label for each object in the composite image.
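The placement check of step S70 can be sketched as rejection sampling. The sketch assumes axis-aligned boxes in (x, y, w, h) form, uniformly random candidate positions, and a simple dictionary as the label format; none of these details come from the text, and `place_masks` is a hypothetical name.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def place_masks(sizes, bg_size, eps_iou, rng, max_tries=100):
    """Place boxes at random; retry while IoU with a placed box is >= eps_iou."""
    labels = []
    for category, (w, h) in sizes:
        for _ in range(max_tries):
            box = (rng.uniform(0, bg_size[0] - w),
                   rng.uniform(0, bg_size[1] - h), w, h)
            if all(iou(box, placed["bbox"]) < eps_iou for placed in labels):
                # The label is known from the placement itself.
                labels.append({"category": category, "bbox": box})
                break
        # A mask with no valid position after max_tries is skipped.
    return labels


rng = random.Random(0)
labels = place_masks([("gun", (60, 40)), ("broom", (30, 120)), ("bottle", (40, 80))],
                     bg_size=(640, 480), eps_iou=0.1, rng=rng)
```

Because each box is chosen by the generator, the category and bounding box of every placed object are available without manual annotation, which corresponds to the automatic label generation described above.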

The following presents the results of testing the training performance of the training data generated by the training data generation apparatus and method of this embodiment.

As an example, 12 object mask categories were selected, as shown in Table 1; the number of object masks may differ from category to category.

When generating the training data, the blurring size (w_mb) was selected from {20, 40} and the blurring direction angle (θ_mb) from {-45, 0, 45, 90}. The scaling parameter (s_obj) and the obstacle scaling parameter (s_dis) were selected from {0.2, 0.3, 0.4}, and the rotation parameter (θ_obj) and the obstacle rotation parameter (θ_dis) from {-45, 0, 45, 90}. The allowable value (ε_iou) for the IOU was set to 0.1.
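These settings can be collected in a small configuration, sketched below. The value sets are those reported above; the dictionary layout, the uniform sampling, and the name `sample_params` are assumptions made for illustration.

```python
import random

# Value sets reported in the text; the dictionary layout is an assumption.
PARAMS = {
    "w_mb": [20, 40],                # blurring size
    "theta_mb": [-45, 0, 45, 90],    # blurring direction angle
    "s_obj": [0.2, 0.3, 0.4],        # object scaling parameter
    "s_dis": [0.2, 0.3, 0.4],        # obstacle scaling parameter
    "theta_obj": [-45, 0, 45, 90],   # object rotation parameter
    "theta_dis": [-45, 0, 45, 90],   # obstacle rotation parameter
}
EPS_IOU = 0.1  # allowable IOU between placed final object masks


def sample_params(rng):
    """Draw one value per parameter, uniformly at random (assumed distribution)."""
    return {name: rng.choice(values) for name, values in PARAMS.items()}


sample = sample_params(random.Random(0))
```

Drawing a fresh sample for each object mask would give each mask its own blur, scale, and rotation settings.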

Table 2 shows the object detection performance when the object detection system is trained using composite images according to an embodiment of the present invention as training data; performance was evaluated with the average precision (AP) metric.
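The text does not state which variant of AP was used. As one common possibility, the sketch below computes AP as the area under the precision-recall curve with all-point interpolation; the input format (score and true-positive flag per detection) and the function name `average_precision` are assumptions.

```python
def average_precision(detections, num_ground_truth):
    """AP from (score, is_true_positive) pairs using all-point interpolation."""
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions, recalls = [], []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing in recall (upper envelope from the right).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area of the resulting step function.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

Matching detections to ground truth boxes, for example by an IoU threshold, would be done before calling this function and is outside the scope of the sketch.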

To allow the effect of visual degradation to be compared when analyzing training on synthetic data, Table 2 reports results for training data sets divided into four types.

Hereinafter, SynDB_n (n is a natural number) denotes a data set in which each image contains n objects. That is, SynDB_1 in Table 2 means that each image contains one object.

Here, w/none denotes a training data set containing neither motion blur nor occlusion, w/occl a training data set containing occluded objects, w/mb a training data set containing objects to which motion blur has been applied, and w/mb and occl a training data set containing both motion blur and occlusion.

Table 2 shows that an object detection system trained using the training data generated according to this embodiment performs very well on images in which visual degradation has occurred.

FIG. 6 shows example images used to test the training data generated according to an embodiment of the present invention.

In FIG. 6, (a) and (b) show examples of training data obtained in the conventional manner, used for comparison when analyzing training on synthetic data; their ground truth labels are assumed to have been set manually in advance. (c) shows an example of training data generated according to this embodiment. Because (c) is a composite image in which a background image and object masks are combined, its ground truth labels are generated automatically.

(a) shows images (N = 1) each containing one object from the object categories listed in Table 1; the upper image contains a gun and the lower image contains a broom. (b) shows images each containing multiple objects from the object categories; the upper image contains a bag, a bat, and a chair, and the lower image contains a bottle, a knife, a gun, and a phone.

Meanwhile, (c) is a composite image in which a background image and object masks are combined, and it contains multiple objects.

Table 3 compares the results of training the object detection system with training data obtained in the conventional manner, as in FIG. 6, and with training data generated according to this embodiment.

Table 3 reports tests in which the number of objects per image was varied from 1 to 8 (n = 1 to 8). The left side corresponds to real images whose ground truth labels were obtained manually, and the right side to composite images synthesized according to this embodiment. In every case, the images included visual degradation of the objects.

As Table 3 shows, the object detection system trained on composite images generated by the training data generation method of this embodiment achieves better object detection performance regardless of the number of objects.

FIG. 7 shows experimental results for the object detection performance of an object detection system trained using training data according to an embodiment of the present invention.

In FIG. 7, (a) shows detection results for an occluded object and (b) shows detection results for a motion-blurred object. The top row shows the results of training on composite images containing neither motion blur nor occlusion, the middle row the results of training on composite images containing occlusion, and the bottom row the results of training on composite images containing both motion blur and occlusion.

In FIG. 7, the object detection system in the top row, trained on composite images without motion blur or occlusion, detects only the unoccluded region in (a) and fails to detect the object at all in (b), where motion blur is present.

The object detection system in the middle row, trained on composite images containing occlusion, detects most of the object in (a) despite the occlusion but still fails to detect the object at all in (b), where motion blur is present.

Finally, the object detection system in the bottom row, trained on composite images containing both motion blur and occlusion, detects the object well both in (a), where occlusion occurs, and in (b), where motion blur is present.

As a result, the synthetic data generation apparatus and method according to an embodiment of the present invention generate composite images by inserting various objects into backgrounds, which enables an object detection system to detect objects readily in varied environments. In addition, because visual degradation such as motion blur and occlusion is reflected in each object in the composite image, the object detection system can be trained to detect objects despite such degradation. The performance of the object detection system can therefore be greatly improved.

The method according to the present invention may be implemented as a computer program stored in a medium for execution on a computer. The computer-readable medium may be any available medium that can be accessed by a computer and may include all computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data, and may include ROM (read-only memory), RAM (random access memory), CD (compact disc)-ROM, DVD (digital video disc)-ROM, magnetic tape, floppy disks, optical data storage devices, and the like.

The present invention has been described with reference to the embodiments shown in the drawings, but these are merely exemplary, and a person of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible.

Therefore, the true scope of technical protection of the present invention should be determined by the technical spirit of the appended claims.

100: database unit 200: background selection unit
300: object mask setting unit 400: obstacle mask setting unit
500: object synthesis unit 600: composite image generation unit

Claims

A synthetic data generation apparatus comprising:
a database unit storing a plurality of background images and a plurality of object masks;
a background selection unit that selects and acquires a background image from the database unit;
an object mask setting unit that selects at least one object mask from the database unit and transforms the selected at least one object mask by applying at least one of blurring, scaling, and rotation angle adjustment; and
a composite image generation unit that generates a composite image by synthesizing, on the background image selected by the background selection unit, at least one final object mask obtained from the transformed at least one object mask,
the apparatus further comprising an object synthesis unit that obtains at least one obstacle mask according to a predetermined obstacle probability value, synthesizes the obtained obstacle mask with an object mask output from the object mask setting unit, and transmits the resulting final object mask to the composite image generation unit,
wherein the object synthesis unit
transforms the at least one obstacle mask by adjusting its size and rotation angle and synthesizes the transformed obstacle mask with the object mask, and
wherein the object synthesis unit synthesizes the obstacle mask so that it overlaps the object mask and covers a part of the object mask, and applies a scaling parameter and a rotation parameter so that the obstacle mask does not completely cover the object mask.

The apparatus of claim 1, wherein the object mask setting unit comprises:
an object mask selection unit that selects at least one object mask from among the plurality of object masks stored in the database unit;
a blurring unit that selectively blurs the selected at least one object mask according to a predetermined blurring probability value; and
a trimming unit that outputs at least one transformed object mask by adjusting the size and rotation angle of the at least one object mask.

The apparatus of claim 2, wherein the blurring unit
blurs the object mask using a blur kernel that blurs the object mask according to a blur size and a blur direction angle.

(deleted)

The apparatus of claim 1, wherein the composite image generation unit
arranges and synthesizes the at least one final object mask on the background image and, when there are a plurality of final object masks, varies the placement positions so that the intersection over union (IOU) between the placed final object masks is less than a predetermined allowable value, and
generates a ground truth label for each of the at least one final object mask synthesized in the composite image.

A synthetic data generation method comprising:
selecting one background image and at least one object mask from among a plurality of previously acquired background images and a plurality of previously acquired object masks;
transforming the selected at least one object mask by applying at least one of blurring, scaling, and rotation angle adjustment; and
generating a composite image by synthesizing, on the selected background image, at least one final object mask obtained from the transformed at least one object mask,
wherein the transforming comprises:
obtaining at least one obstacle mask according to a predetermined obstacle probability value; and
synthesizing the obtained obstacle mask with the transformed object mask so that the obstacle mask overlaps the transformed object mask and covers a part of it, thereby generating the final object mask, and
wherein the transforming further comprises
adjusting the size and rotation angle of the obtained obstacle mask by applying a scaling parameter and a rotation parameter so that the obstacle mask does not completely cover the object mask.

The method of claim 7, wherein the transforming comprises:
selecting at least one object mask from among the plurality of object masks;
selectively blurring the selected at least one object mask according to a predetermined blurring probability value; and
outputting at least one transformed object mask by adjusting the size and rotation angle of the at least one object mask.

The method of claim 8, wherein the blurring
comprises blurring the object mask using a blur kernel that blurs the object mask according to a blur size and a blur direction angle.

(deleted)

The method of claim 7, wherein generating the composite image comprises:
placing the at least one final object mask on the background image;
when there are a plurality of final object masks, adjusting the placement positions so that the intersection over union (IOU) between the placed final object masks is less than a predetermined allowable value; and
generating a ground truth label for each of the at least one final object mask synthesized in the composite image.