KR102310446B1

KR102310446B1 - Apparatus for identifying landmarks irrespective of location change based on deep learning model and method therefor

Info

Publication number: KR102310446B1
Application number: KR1020200189468A
Authority: KR
Inventors: 이정민
Original assignee: (주)트레블씨투비
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2021-10-07
Anticipated expiration: 2040-12-31

Abstract

Provided is a device for identifying a landmark. The device includes: a data processing unit that receives, from a user device, an image and location information indicating where the image was captured; and an identification unit that calculates a probability indicating whether a learned landmark exists in the image through an identification model including an identification network corresponding to the location information, identifies the landmark according to the probability, and transmits the name and description of the identified landmark to the user device.

Description

Apparatus for identifying landmarks irrespective of location change based on deep learning model and method therefor

The present invention relates to a technique for identifying a landmark in an image and, more particularly, to an apparatus for identifying a landmark irrespective of location change based on a deep learning model, and a method therefor.

The Brooklyn Bridge, one of New York City's landmarks, connects the boroughs of Manhattan and Brooklyn. However, the Manhattan Bridge, which also connects Manhattan and Brooklyn, stands near the Brooklyn Bridge and has a similar shape. As a result, a considerable number of people who upload their own photographs to the Internet label the Manhattan Bridge as the Brooklyn Bridge in the photo description.

Korean Patent Application Publication No. 2015-0026535 (published March 11, 2015)

An object of the present invention is to provide an apparatus for identifying a landmark irrespective of location change based on a deep learning model, and a method therefor.

To achieve the above object, a method for identifying a landmark irrespective of location change according to a preferred embodiment of the present invention includes: receiving, by a data processing unit, an image and location information indicating where the image was captured from a user device; calculating, by an identification unit, a probability indicating whether a learned landmark exists in the image through an identification model including an identification network corresponding to the location information; and identifying, by the identification unit, the landmark according to the probability and transmitting the name and description of the identified landmark to the user device.

Calculating the probability includes: generating, by a backbone network of the identification model, a feature map through the convolution operations of a plurality of convolutional layers on the image; deriving, by a region proposal network of the identification model, one or more regions of interest from the feature map by means of bounding boxes; loading, by the identification unit, an identification network trained to identify the landmark located at the shortest distance from the location information; and receiving, by the identification network, the feature map and the bounding box, and calculating a probability indicating whether a learned landmark exists within the bounding box through a plurality of operations on the feature map to which weights between a plurality of layers are applied.
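The step of loading the identification network for the landmark nearest to the shooting location can be illustrated with a minimal sketch. The text does not say how the distance is measured, so the great-circle (haversine) distance used here is an assumption, as are the `LANDMARKS` registry, the function names, and the approximate coordinates.

```python
import math

# Hypothetical registry: landmark name -> (latitude, longitude) in degrees.
# Coordinates are approximate and for illustration only.
LANDMARKS = {
    "Brooklyn Bridge": (40.7061, -73.9969),
    "Manhattan Bridge": (40.7075, -73.9908),
}

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_landmark(location, landmarks=LANDMARKS):
    """Return the name of the landmark closest to the shooting location.

    The returned name would then be used as the key for loading the
    identification network trained for that landmark (loading is not shown).
    """
    return min(landmarks, key=lambda name: haversine_km(location, landmarks[name]))
```

In a real deployment the registry would hold one trained identification network per landmark; here only the lookup key is computed.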

Before the step of receiving the image and the location information indicating where the image was captured, the method further includes: preparing, by a model generation unit, training image batches each including a plurality of training images; setting, by the model generation unit, the hyperparameters of a loss function [equation not reproduced in this text]; setting a label for each training image included in each training image batch; inputting, by the model generation unit, the training images into the identification model; calculating, by the identification model, an output value by performing on the training images a plurality of operations to which weights between a plurality of layers are applied; and modifying, by the model generation unit, the weights of the identification model so that the loss, which is the difference between the output value and the label as calculated through the loss function, is minimized. Here, L is the loss; two symbols [not reproduced in this text] are hyperparameters; a further relation [equation not reproduced in this text] holds; i is the index of a training image; y_i is the label for the i-th training image; and f(x_i) is the output value of the identification network for the i-th training image. The values of the two hyperparameters are set according to the proportions that the images collected by the data processing unit and the images transmitted by user devices each occupy among the plurality of training images included in the training image batch.

To achieve the above object, an apparatus for identifying a landmark according to a preferred embodiment of the present invention includes: a data processing unit that receives, from a user device, an image and location information indicating where the image was captured; and an identification unit that calculates a probability indicating whether a learned landmark exists in the image through an identification model including an identification network corresponding to the location information, identifies the landmark according to the probability, and transmits the name and description of the identified landmark to the user device.

According to the present invention, a landmark can be identified precisely regardless of the position from which the user takes the photograph.

FIG. 1 is a diagram illustrating the configuration of a system for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of an apparatus for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the detailed configuration of an apparatus for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the overall configuration of an identification model for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention.
FIGS. 5 and 6 are diagrams illustrating the detailed configuration of an identification model for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a method of preparing training image batches according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating a vector space used to prepare training image batches according to an embodiment of the present invention.
FIG. 9 is a flowchart illustrating a method of generating an identification model using training image batches according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating a method for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention.
FIGS. 11 and 12 are diagrams illustrating a method for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention.

Prior to the detailed description of the present invention, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concepts of terms in order to describe his or her own invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical idea, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing this application.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that the same components are denoted by the same reference numerals throughout the drawings wherever possible. Detailed descriptions of well-known functions and configurations that could obscure the gist of the present invention are omitted. For the same reason, some components in the accompanying drawings are exaggerated, omitted, or shown schematically, and the size of each component does not fully reflect its actual size.

First, a system for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention will be described. FIG. 1 is a diagram illustrating the configuration of such a system.

Referring to FIG. 1, the system for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention (hereinafter abbreviated as the "identification system") includes an identification server 10 and a plurality of user devices 20.

Basically, when the identification server 10 receives from a user device 20 a captured image and the location information of the place where the image was captured, it identifies the landmark in the received image and provides the user device 20 with information related to the landmark, including its name. The identification server 10 may also provide information about nearby tourist attractions, popular restaurants, and the like, based on the identified landmark.

The user device 20 is a device used by a user who has subscribed to the service provided by the identification server 10. In response to the user's operation, the user device 20 captures an image of an object that the user presumes to be a landmark, acquires location information at the time of capture (for example, from a GPS signal), and uploads the captured image and the location information to the identification server 10. It may then receive, from the identification server 10, information about the landmark included in the image and information about the landmark's surroundings, and output that information so that the user can view it. Examples of the user device 20 include a mobile phone, a cellular phone, a mobile communication terminal, a smartphone, a PDA, a tablet, and a phablet.

Next, the configuration of the identification server 10 described above will be explained in more detail. FIG. 2 is a diagram illustrating the configuration of an apparatus for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention, and FIG. 3 is a diagram illustrating the detailed configuration of that apparatus. Referring to FIG. 2, the identification server 10 according to an embodiment of the present invention includes a communication module 11, a storage module 12, and a control module 13.

The communication module 11 communicates with the user devices 20 over a network and can transmit data to and receive data from a user device 20. To do so, the communication module 11 may include a modem that modulates signals to be transmitted and demodulates received signals. The communication module 11 transmits data handed over by the control module 13 through the network, and passes data received through the network to the control module 13.

The storage module 12 stores the programs and data required for the operation of the identification server 10. For example, the storage module 12 may store training images according to an embodiment of the present invention. These training images may have been received from user devices 20 or collected from the web. The various data stored in the storage module 12 may be registered, deleted, modified, or added through operation by an administrator.

The control module 13 controls the overall operation of the identification server 10 and the signal flow between its internal blocks, and performs data processing. The control module 13 may be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), or the like.

Referring to FIG. 3, the control module 13 includes a data processing unit 100, a model generation unit 200, and an identification unit 300.

The data processing unit 100 may access various sites on the web and collect images posted together with the name of a landmark. The data processing unit 100 may also access a search site, search for images using the name of a landmark as the query, and collect a plurality of images in order of reliability or accuracy. The data processing unit 100 classifies the collected images by landmark and stores them in the storage module 12.

The model generation unit 200 generates, through deep learning, the identification model (RM), which is a deep learning model (DLM). The operation of the model generation unit 200 is described in more detail below.

The identification unit 300 identifies the landmark included in an image transmitted by a user device 20, using the identification model (RM) generated by the model generation unit 200. The operation of the identification unit 300 is described in more detail below.

Next, the identification model for identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention will be described. FIG. 4 is a diagram illustrating the overall configuration of the identification model, and FIGS. 5 and 6 are diagrams illustrating its detailed configuration.

Referring to FIGS. 5 and 6, when an image is input, the identification model (RM) outputs, as a probability, whether a learned landmark exists in the image, through a plurality of operations to which the learned weights of a plurality of layers are applied. The identification model (RM) includes a backbone network (BN), a region proposal network (RPN), and an identification network (RN: Recognition Network).
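How the three networks fit together can be sketched as a simple composition. This is only a structural illustration: the function name `identify`, the type aliases, and the idea of passing each network as a callable are assumptions made for the example, not details given in the text.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]      # (x, y, w, h), center format
FeatureMap = Sequence[Sequence[float]]       # simplified 2-D feature map

def identify(image,
             backbone: Callable[[object], FeatureMap],
             rpn: Callable[[FeatureMap], List[Box]],
             recognizer: Callable[[FeatureMap, Box], Tuple[float, float]]):
    """Run BN -> RPN -> RN and return (box, p_landmark, p_not_landmark) per ROI."""
    fm = backbone(image)                     # BN: image -> feature map (FM)
    results = []
    for box in rpn(fm):                      # RPN: FM -> bounding boxes (BB)
        p_yes, p_no = recognizer(fm, box)    # RN: (FM, BB) -> two probabilities
        results.append((box, p_yes, p_no))
    return results
```

Passing the recognizer as a parameter reflects the idea that the identification network is selected per location, while the backbone and region proposal network are shared.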

The backbone network (BN) includes a plurality of convolutional layers (CL). It generates a feature map (FM) from the input image through the convolution operations of these convolutional layers and outputs it.

The region proposal network (RPN) derives from the feature map (FM), by means of bounding boxes (BB), at least one region of interest (ROI), that is, a region in which the probability that an object exists is equal to or greater than a predetermined value. The bounding box (BB) is a rectangle expressed by its center coordinates (x, y) and by a width (w) and height (h) defined relative to those center coordinates.
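The center-based box representation and the thresholding of proposals can be sketched as follows. The helper names and the default threshold of 0.5 are illustrative assumptions; the text only states that a predetermined value is used.

```python
def box_to_corners(x, y, w, h):
    """Convert a center-format box (x, y, w, h) to (x_min, y_min, x_max, y_max)."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

def clip_to_map(corners, map_w, map_h):
    """Clip corner coordinates so the box stays inside a map_w x map_h feature map."""
    x0, y0, x1, y1 = corners
    return (max(0.0, x0), max(0.0, y0), min(float(map_w), x1), min(float(map_h), y1))

def select_rois(scored_boxes, threshold=0.5):
    """Keep boxes whose object-existence probability is at or above the threshold.

    scored_boxes is a list of (box, score) pairs; 0.5 is an assumed default.
    """
    return [box for box, score in scored_boxes if score >= threshold]
```

Corner coordinates are usually more convenient than the center format when cropping the region of the feature map that is passed on to the identification network.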

The identification network (RN) receives the feature map (FM) from the backbone network (BN) and the bounding box (BB) representing a region of interest from the region proposal network (RPN), calculates the probability that a learned landmark exists within the bounding box (BB), and outputs that probability.

A convolutional neural network (CNN) is a representative example of the identification network (RN). As shown in FIG. 5, the identification network (RN) includes, in order, an input layer (IL), a convolutional layer (CL), a pooling layer (PL), a fully connected layer (FL), and an output layer (OL). There may be two or more of each of the convolutional layer (CL), the pooling layer (PL), and the fully connected layer (FL). The convolutional layer (CL) and the pooling layer (PL) each consist of at least one feature map (FM). A feature map (FM) is obtained by receiving values produced by applying weights and a threshold to the operation result of the previous layer and performing an operation on those values. The weights are applied through a filter, or kernel, which is a weight (w) matrix of a predetermined size. In this embodiment, a first filter (W1) is used for the convolution operation of the convolutional layer (CL), and a second filter (W2) is used for the pooling operation of the pooling layer (PL).

When the feature map (FM) from the backbone network (BN) is input to the input layer (IL), the convolutional layer (CL) performs a convolution operation using the first filter (W1), followed by an activation function, on the feature map (FM) of the input layer (IL) to derive at least one first feature map (FM1). The pooling layer (PL) then performs a pooling (or sub-sampling) operation using the second filter (W2) on the at least one first feature map (FM1) of the convolutional layer (CL) to derive at least one second feature map (FM2).
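The convolution, activation, and pooling sequence can be illustrated with a small pure-Python sketch on a single-channel feature map. The choices of a stride-1 convolution without padding, ReLU as the activation, and non-overlapping max pooling are assumptions made for the example; the text does not fix these details.

```python
def conv2d_valid(fm, kernel):
    """Stride-1, no-padding 2-D convolution (cross-correlation, as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(fm) - kh + 1, len(fm[0]) - kw + 1
    return [[sum(fm[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fm):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in fm]

def max_pool(fm, size=2):
    """Non-overlapping max pooling with a size x size window."""
    return [[max(fm[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fm[0]) - size + 1, size)]
            for r in range(0, len(fm) - size + 1, size)]

# FM -> FM1 (convolution with W1 + activation) -> FM2 (pooling)
fm = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
w1 = [[1, 0], [0, 1]]               # illustrative first filter W1
fm1 = relu(conv2d_valid(fm, w1))
fm2 = max_pool(fm1, size=2)
```

A real layer would apply several filters over several input channels and add a bias term; the single-channel case is kept here to show the order of operations.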

The fully connected layer (FL) consists of a plurality of operation nodes (E1 to En). Referring to FIG. 6, the operation nodes (E1 to En) of the fully connected layer (FL) compute a plurality of operation node values E = [e1, e2, e3, ..., en] by applying an activation function to the at least one second feature map (FM2) of the pooling layer (PL).

As shown in FIG. 6, the output layer (OL) includes two output nodes (O1, O2). Each of the operation nodes (E1 to En) of the fully connected layer (FL) is connected to the output nodes (O1, O2) of the output layer (OL) through a channel (shown as a dotted line) that has a weight (w). In other words, the operation node values (e1, e2, e3, ..., en) of the operation nodes (E1 to En) are weighted and then input to each of the output nodes (O1, O2). The output nodes (O1, O2) of the output layer (OL) then apply an activation function to the weighted operation node values (e1, e2, e3, ..., en) and thereby calculate, as a probability, whether the object in the bounding box is a landmark, for the feature map (FM) input to the input layer (IL).

The two output nodes (O1, O2) of the output layer (OL) correspond, respectively, to the case in which the object in the bounding box delimiting the region of interest (ROI) is a landmark (S) and the case in which it is not a landmark (F). Accordingly, the output value of the first output node (O1) represents the probability that the object in the bounding box is a landmark, and the output value of the second output node (O2) represents the probability that the object in the bounding box is not a landmark. For example, each of the output nodes (O1, O2) applies the weights w = [w1, w2, ..., wn] to the respective operation node values (e1, e2, e3, ..., en) and then applies an activation function to the result to calculate its output value. If, for example, the output values of the first and second output nodes (O1, O2) are 0.122 and 0.878, the probability that the object is a landmark is about 12% and the probability that it is not a landmark is about 88%. The output values of the first and second output nodes (O1, O2) of the output layer (OL) are the output values of the identification network (RN) and of the identification model (RM). When the identification network (RN) outputs the probabilities (0.122, 0.878), the identification unit 300 can see from them that the probability of a landmark is about 12% and the probability of no landmark is about 88%, and can therefore determine that the object in the bounding box is not a landmark.
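The computation at the two output nodes and the resulting decision can be sketched as below. Softmax is one of the activation functions the document lists, but using it at the output layer, omitting bias terms, and deciding by comparing the two probabilities are assumptions made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def output_layer(e, w_o1, w_o2):
    """Weight the node values e for O1 and O2, then normalize with softmax.

    Returns [p_landmark, p_not_landmark]; the two values sum to 1.
    """
    z1 = sum(ei * wi for ei, wi in zip(e, w_o1))
    z2 = sum(ei * wi for ei, wi in zip(e, w_o2))
    return softmax([z1, z2])

def is_landmark(p_landmark, p_not_landmark):
    """Decide 'landmark' only when O1's probability exceeds O2's."""
    return p_landmark > p_not_landmark
```

With the document's example outputs, `is_landmark(0.122, 0.878)` evaluates to `False`, matching the conclusion that the object in the bounding box is not a landmark.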

Examples of the activation function used in the convolutional layer (CL), the fully connected layer (FL), and the output layer (OL) described above include sigmoid, hyperbolic tangent (tanh), ELU (Exponential Linear Unit), ReLU (Rectified Linear Unit), Leaky ReLU, Maxout, Minout, and softmax. Any one of these activation functions may be selected and applied to the convolutional layer (CL), the fully connected layer (FL), and the output layer (OL).

Next, a method of generating the identification model (RM) described above through deep learning will be explained. Training data must first be prepared, so the method of preparing the training data is described first. FIG. 7 is a flowchart illustrating a method of preparing training image batches according to an embodiment of the present invention, and FIG. 8 is a diagram illustrating a vector space used to prepare the training image batches.

Referring to FIG. 7, in step S110 the model generation unit 200 collects, from among the images stored in the storage module 12, a plurality of images that include the landmark to be learned. As described above, the data processing unit 100 may access various sites on the web and collect images posted together with the name of a landmark. The data processing unit 100 may also access a search site, search for images using the name of a landmark as the query, and collect a plurality of images in order of reliability or accuracy. The data processing unit 100 classifies the collected images by landmark and stores them in the storage module 12. Images uploaded by users that have been classified as containing the landmark are also stored in the storage module 12. The model generation unit 200 can therefore collect, in step S110, a plurality of images that include the landmark to be learned from among the images stored in the storage module 12. Then, in step S120, the model generation unit 200 performs a plurality of convolution operations on the collected images and embeds the resulting convolved images in a multidimensional vector space (VS) to generate a plurality of image vectors.

Next, in step S130, the model generator 200 randomly selects, from among the plurality of image vectors in the multidimensional vector space (VS), the center vectors of a predetermined number of clusters. Then, in step S140, the model generator 200 clusters the plurality of image vectors in the multidimensional vector space (VS), as shown in FIG. 8, according to their distances from the center vectors of the predetermined number of clusters, thereby generating a plurality of clusters. That is, in step S140 the model generator 200 calculates the distance from each image vector to each of the cluster center vectors in the vector space (VS) and assigns each image vector to the cluster whose center vector is nearest. Then, in step S150, the model generator 200 recalculates the center vector of each of the generated clusters. For example, in step S150 the model generator 200 may select, as the center vector, the image vector having the median value among the image vectors of each cluster.

Next, in step S160, the model generator 200 checks whether every image vector belongs to the cluster whose center vector, as calculated in step S150, is nearest to it.

If the check in step S160 shows that not every image vector belongs to the cluster whose center vector, as calculated in step S150, is nearest to it, the model generator 200 repeats steps S140 to S160 described above.

That is, the data processing unit 100 repeats steps S140 to S160 until the value of Equation 1 below is minimized.

In Equation 1, k is the index of a cluster and Xk is the center vector of the k-th cluster; n is the index of an image vector and Un is the n-th image vector. The remaining term of Equation 1 is a flag variable that is 1 if the n-th image vector belongs to the k-th cluster and 0 otherwise.
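Reading the symbol definitions above, Equation 1 appears to correspond to the standard k-means distortion measure. A sketch of that form is given below as an assumption: the objective name J, the flag symbol r_{nk}, the counts N and K, and the use of the squared Euclidean norm are notation chosen here for illustration and are not taken from the patent.

```latex
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert U_n - X_k \rVert^{2},
\qquad
r_{nk} =
\begin{cases}
1 & \text{if } U_n \text{ belongs to the } k\text{-th cluster} \\
0 & \text{otherwise}
\end{cases}
```

Under this reading, step S140 lowers J by updating the flags for fixed centers, and step S150 lowers it by updating the centers for fixed flags.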

On the other hand, if the check in step S160 shows that every image vector belongs to the cluster whose center vector, as calculated in step S150, is nearest to it, the model generator 200 determines the current clusters to be the optimized clusters.
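The loop of steps S130 to S160 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: image vectors are plain tuples of floats, distance is Euclidean, and the "median" center of step S150 is read as the cluster member closest to the per-dimension median. The function names, the `seed` and `max_iters` parameters are hypothetical.

```python
import math
import random


def nearest(vec, centers):
    """Index of the center vector closest to vec (Euclidean distance)."""
    return min(range(len(centers)), key=lambda k: math.dist(vec, centers[k]))


def recompute_center(members):
    """One reading of S150: the member closest to the per-dimension median
    (upper median when the cluster has an even number of members)."""
    median = [sorted(dim)[len(dim) // 2] for dim in zip(*members)]
    return min(members, key=lambda m: math.dist(m, median))


def cluster(vectors, num_clusters, seed=0, max_iters=100):
    rng = random.Random(seed)
    centers = rng.sample(vectors, num_clusters)        # S130: random initial centers
    assign = [nearest(v, centers) for v in vectors]    # S140: assign to nearest center
    for _ in range(max_iters):
        for k in range(num_clusters):                  # S150: recompute each center
            members = [v for v, a in zip(vectors, assign) if a == k]
            if members:
                centers[k] = recompute_center(members)
        new_assign = [nearest(v, centers) for v in vectors]
        if new_assign == assign:                       # S160: every vector is already
            break                                      # in its nearest cluster
        assign = new_assign                            # otherwise repeat S140 to S160
    return centers, assign
```

The loop stops when reassignment changes nothing, which is the condition checked in step S160; `max_iters` is only a safety bound added for the sketch.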

Then, in step S180, for each of the optimized clusters, the model generator 200 extracts the image vectors that lie within a predetermined radius (R) of the cluster center (CX) within the same cluster, as shown in FIG. 8. That is, within a given cluster the model generator 200 forms a region (FA) of the predetermined radius (R) around the cluster center (CX) and extracts the image vectors inside that region. Then, in step S190, the model generator 200 extracts the images corresponding to the extracted image vectors as training images and generates a training image batch containing those training images. Extracting as training images only the images whose image vectors lie in the region (FA) within the predetermined radius (R) of the cluster center (CX) allows outliers to be removed reliably.
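Steps S180 and S190 amount to a distance filter around each cluster center. The sketch below assumes Euclidean distance and parallel lists of images, image vectors, and cluster assignments; the function name and argument layout are hypothetical.

```python
import math


def build_training_batch(images, vectors, assign, centers, radius):
    """Keep only images whose vectors lie within `radius` of their own cluster center."""
    batch = []
    for image, vec, k in zip(images, vectors, assign):
        if math.dist(vec, centers[k]) <= radius:   # vector lies inside region FA
            batch.append(image)                    # S190: becomes a training image
    return batch
```

Vectors outside the radius are simply dropped, which is how the outliers mentioned above are excluded from the batch.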

Next, a method of training the identification network (RN) using the training image batch will be described. FIG. 9 is a flowchart illustrating a method of training the identification network (RN) using a training image batch according to an embodiment of the present invention. In the embodiment of FIG. 9, the model generator 200 performs training using the loss function of Equation 2 below.

In Equation 2, L denotes the loss. The two coefficients of Equation 2 are hyperparameters, subject to the condition stated with the equation. In particular, the values of these hyperparameters are set according to the proportions of the training images in the training image batch that are, respectively, images collected by the data processing unit 100 and images transmitted by the user device 20. In addition, i is the index of a training image, yi is the label of the i-th training image, and f(xi) is the output value of the identification network (RN) for the i-th training image.

Referring to FIG. 9, in step S210 the model generator 200 prepares a training image batch. The specific method of preparing the training image batch was described above with reference to FIGS. 7 and 8.

Next, in step S220, the model generator 200 sets the hyperparameters. The model generator 200 sets the values of the two hyperparameters of Equation 2 according to the proportions of the training images in the training image batch that are, respectively, images collected by the data processing unit 100 and images transmitted by the user device 20. The first term of the loss function is robust to outliers, and the second term is sensitive to outliers. Images collected by the data processing unit 100 have not been verified, so outliers are likely to be present among them. Therefore, the higher the proportion of training images collected by the data processing unit 100, the higher the value set for the hyperparameter of the outlier-robust first term. In contrast, images transmitted by the user device 20 have been verified once, so outliers are unlikely to be present among them. Therefore, the higher the proportion of training images transmitted by the user device 20, the higher the value set for the hyperparameter of the outlier-sensitive second term.
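The text describes Equation 2 only as a weighted sum of an outlier-robust term and an outlier-sensitive term. One loss of that shape, offered purely as an assumption, combines a mean absolute error (robust) with a mean squared error (sensitive). The names `alpha` and `beta`, the choice of these two terms, and the rule that sets each weight equal to its source's share of the batch are all hypothetical and are not confirmed by the patent.

```python
def set_hyperparameters(num_collected, num_user):
    """Assumed rule for S220: each weight equals its image source's share of the batch."""
    total = num_collected + num_user
    alpha = num_collected / total   # weight of the outlier-robust term
    beta = num_user / total         # weight of the outlier-sensitive term
    return alpha, beta


def combined_loss(labels, outputs, alpha, beta):
    """alpha * mean absolute error + beta * mean squared error (assumed form)."""
    n = len(labels)
    l1 = sum(abs(y - f) for y, f in zip(labels, outputs)) / n    # robust to outliers
    l2 = sum((y - f) ** 2 for y, f in zip(labels, outputs)) / n  # sensitive to outliers
    return alpha * l1 + beta * l2
```

With three collected images for every user-submitted one, this rule gives weights of 0.75 and 0.25, so the robust term dominates when most of the batch is unverified.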

Next, in step S230, the model generator 200 extracts the necessary information from the storage module 12 and sets a label for each training image included in the training image batch. Labels are set using one-hot encoding. For example, the label may be [1, 0] if the landmark is present in the training image and [0, 1] otherwise.

Then, when the model generator 200 inputs a training image to the identification model (RM), in step S240 the identification model (RM) calculates an output value by performing, on the input training image, a plurality of operations to which the weights between a plurality of layers are applied. More specifically, the backbone network (BN) performs a plurality of convolution operations to generate a feature map (FM). The generated feature map (FM) is input to the region detection network (RPN), which derives one or more regions of interest from the feature map (FM) by means of region boxes. The identification network (RN) receives the information on the region box defining a region of interest from the region detection network (RPN) and the feature map (FM) from the backbone network (BN). Here, the region box information includes center coordinates (x, y) and a width (w) and height (h) relative to the center coordinates. The identification network (RN) performs, on the feature map (FM), a plurality of operations to which the weights between a plurality of layers are applied, calculates a probability indicating whether the landmark to be learned is present in the region box, and outputs the calculated probability as the output value.

Then, in step S250, the model generator 200 performs optimization by correcting the weights (w) of the identification network (RN) and the backbone network (BN) through a back-propagation algorithm so that the loss, which is the difference between the output value and the label as given by the loss function of Equation 2, is minimized.

Steps S210 to S250 described above are performed for every training image in the training image batch. Accuracy is then calculated using an evaluation metric, and the steps may be repeated with a plurality of different training image batches until the target accuracy is reached.

Next, a method of identifying a landmark irrespective of location change based on the deep learning model, after training has been completed as described above, will be described. FIG. 10 is a flowchart illustrating a method of identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention. FIGS. 11 and 12 are diagrams illustrating the method of identifying a landmark irrespective of location change based on a deep learning model according to an embodiment of the present invention.

Referring to FIG. 10, the user device 20 may, in response to user operation, connect to the identification server 10 and transmit to it an image captured by the user together with location information indicating where the image was captured. For example, assume that the user is located between the Manhattan Bridge and the Brooklyn Bridge, as shown in FIG. 11.

When the data processing unit 100 receives the image and the location information through the communication module 11 in step S310, it provides the received image and location information to the identification unit 300. The identification unit 300 then inputs the image provided by the data processing unit 100 to the identification model (RM). In step S320, the backbone network (BN) of the identification model (RM) generates a feature map by performing the convolution operations of a plurality of convolutional layers on the input image.

The generated feature map (FM) is input to the region detection network (RPN), and in step S330 the region detection network (RPN) derives one or more regions of interest (ROI) from the feature map (FM) by means of region boxes (BB: x, y, w, h).

Then, the identification unit 300 loads the identification networks (RN) trained to identify landmarks in order of the landmarks' proximity to the input location information. For example, assuming that the landmark closest to the user's location (U) in FIG. 11 is the Manhattan Bridge (B), the identification unit 300 loads the identification network (RN) trained to identify the Manhattan Bridge (B) as a landmark.
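Ordering candidate landmarks by proximity to the reported location could be sketched as below. The haversine great-circle formula is a choice made for this sketch, since the patent does not say how distance is measured, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def candidates_by_distance(user, landmarks, radius_km):
    """Names of landmarks within radius_km of the user, nearest first."""
    scored = [(haversine_km(user[0], user[1], lat, lon), name)
              for name, (lat, lon) in landmarks.items()]
    return [name for dist, name in sorted(scored) if dist <= radius_km]
```

The first name in the returned list selects the identification network to load first; the remaining names give the order in which replacement networks would be tried.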

Once the identification network (RN) is loaded, the backbone network (BN) provides the feature map (FM) to the identification network (RN), and the region detection network (RPN) provides the region box defining a region of interest to the identification network (RN). Here, the region box indicates center coordinates (x, y) and a width (w) and height (h) relative to the center coordinates. In step S350, the region detection network (RPN) may provide the region-of-interest information sequentially, in descending order of region box size. For example, referring to FIG. 12, because the third region box (BB3) is larger than the fourth region box (BB4), the region detection network (RPN) first provides the center coordinates, width, and height (x, y, w, h) of the third region box (BB3) to the identification network (RN).
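A region box in the (x, y, w, h) form described above, together with the largest-first ordering of step S350, might be represented as follows. Taking "size" to mean area (w times h) is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionBox:
    x: float  # center x coordinate
    y: float  # center y coordinate
    w: float  # width, measured about the center
    h: float  # height, measured about the center

    def area(self):
        return self.w * self.h

    def corners(self):
        """(left, top, right, bottom) derived from the center form."""
        return (self.x - self.w / 2, self.y - self.h / 2,
                self.x + self.w / 2, self.y + self.h / 2)


def by_size_descending(boxes):
    """Order region boxes largest first, as in step S350 (size assumed to be area)."""
    return sorted(boxes, key=RegionBox.area, reverse=True)
```

The corner form is included only because downstream cropping of the feature map typically needs explicit bounds; the patent itself specifies the center form.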

Then, in step S360, the identification network (RN) performs, on the feature map (FM), a plurality of operations to which the weights between a plurality of layers are applied, calculates a probability indicating whether the learned landmark (for example, the Manhattan Bridge) is present in the region box (BB3), and outputs the calculated probability as the output value.

Then, in step S370, the identification unit 300 determines from the output value whether the landmark has been identified. For example, if the identification network (RN) outputs the probabilities (0.122, 0.878), the identification unit 300 can tell from these probabilities that there is a 12% probability that the landmark (for example, the Manhattan Bridge) is present in the region box, for example the third region box (BB3), and an 88% probability that it is not, and can accordingly determine that the object in the region box is not the landmark. Conversely, if the identification network (RN) outputs the probabilities (0.746, 0.254), the identification unit 300 can tell from these probabilities that there is a 75% probability that the landmark (for example, the Manhattan Bridge) is present in the region box, for example the fourth region box (BB4), and a 25% probability that it is not, and can accordingly determine that the object in the region box is the landmark.
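The decision of step S370 can be illustrated with the two outputs quoted above. The patent says only that the determination follows the probability, so the rule used here, that the landmark counts as identified when the "present" probability exceeds the "absent" probability, is an assumption, as is the function name.

```python
def is_landmark(output):
    """output is (p_present, p_absent), matching the one-hot label order [1, 0] / [0, 1]."""
    p_present, p_absent = output
    return p_present > p_absent   # assumed decision rule


print(is_landmark((0.122, 0.878)))  # False: the object is judged not to be the landmark
print(is_landmark((0.746, 0.254)))  # True: the object is judged to be the landmark
```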

If the determination in step S370 is that the landmark has been identified, in step S380 the identification unit 300 transmits the name and description of the identified landmark to the user device 20 through the communication module 11. The identification unit 300 may additionally transmit supplementary information, such as popular restaurants and tourist attractions near the landmark, to the user device 20.

On the other hand, if the determination in step S370 is that the landmark has not been identified, the identification unit 300 determines whether landmark identification has failed in all of the regions of interest. If identification has not yet failed in all of the regions of interest, the identification network (RN) receives from the region detection network (RPN) the region box defining the region of interest of the next-largest size, and steps S350 to S370 described above are repeated. For example, the fourth region box (BB4), which is the next-largest region box after the previously provided third region box (BB3), is used.

If landmark identification has failed in all of the regions of interest, the identification unit 300 proceeds to step S400 and checks whether every identification network (RN) trained to identify one of the other landmarks located within a predetermined radius of the location information (U) transmitted by the user device 20 has failed to identify a landmark in the image transmitted by the user device 20.

If the check in step S400 shows that not every identification network (RN) has failed to identify a landmark, the identification unit 300 proceeds to step S340, replaces the current identification network with the identification network (RN) trained to identify the landmark that is next-closest to the location information transmitted by the user device 20, and repeats the procedure described above. For example, referring to FIG. 11, the identification network is replaced with an identification network (RN) capable of identifying the Brooklyn Bridge instead of the Manhattan Bridge, and the first region box (BB1) and the second region box (BB2) are provided sequentially to that identification network (RN) so that the landmark (for example, the Brooklyn Bridge) can be identified.

On the other hand, if the check in step S400 shows that every identification network (RN) has failed to identify a landmark, the identification unit 300 proceeds to step S410 and notifies the user device 20, through the communication module 11, that no landmark can be identified in the image.
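The control flow of steps S340 to S410 can be summarized as two nested loops: identification networks are tried nearest landmark first, and for each network the region boxes are tried largest first. The sketch below is an illustration under assumptions, not the patented implementation: each network is modeled as a callable returning (p_present, p_absent), region boxes are (x, y, w, h) tuples, size is taken to be area, and success is taken to mean p_present > p_absent. All names are hypothetical.

```python
def identify(feature_map, region_boxes, networks_by_distance):
    """Return the name of the first landmark identified, or None if all attempts fail.

    networks_by_distance: list of (landmark_name, network) pairs, nearest landmark first.
    network(feature_map, box) -> (p_present, p_absent).
    """
    ordered = sorted(region_boxes, key=lambda b: b[2] * b[3], reverse=True)
    for name, network in networks_by_distance:                # S340: load next-nearest network
        for box in ordered:                                   # S350: largest box first
            p_present, p_absent = network(feature_map, box)   # S360: compute probability
            if p_present > p_absent:                          # S370: landmark identified?
                return name                                   # S380: report name to user
    return None                                               # S410: identification not possible
```

Returning on the first success keeps the number of network evaluations small when the nearest landmark is the one photographed, which is the common case the proximity ordering is meant to exploit.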

Meanwhile, the method according to the embodiment of the present invention described above may be implemented in the form of a program readable by various computer means and recorded on a computer-readable recording medium. The recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. Such a hardware device may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The present invention has been described above using several preferred embodiments, but these embodiments are illustrative and not limiting. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications may be made under the doctrine of equivalents without departing from the spirit of the present invention and the scope of the rights set forth in the appended claims.

10: identification server 11: communication module
12: storage module 13: control module
20: user device 100: data processing unit
200: model generator 300: identification unit

Claims

A method for identifying a landmark irrespective of location change, the method comprising:
receiving, by a data processing unit, an image and location information on where the image was captured from a user device;
calculating, by an identification unit, a probability indicating whether a learned landmark is present in the image through an identification model including an identification network corresponding to the location information; and
identifying, by the identification unit, the landmark according to the probability, and transmitting a name and description of the identified landmark to the user device,
wherein the calculating of the probability comprises:
generating, by a backbone network of the identification model, a feature map through convolution operations of a plurality of convolutional layers on the image;
determining, by a region detection network of the identification model, a region box representing a region of interest in the feature map;
loading, by the identification unit, based on the location information, the identification network trained to identify the landmark; and
receiving, by the identification network, the feature map and the region box, and calculating a probability indicating whether the learned landmark is present in the region box through a plurality of operations in which weights between a plurality of layers are applied to the feature map,
wherein the identification unit loads identification networks trained to identify landmarks in order of proximity to the location information,
wherein the identification unit loads the identification network trained to identify the landmark nearest to the location information and, based on that identification network, determines whether the nearest landmark is identified in each of a plurality of region boxes sequentially, in order of region box size,
wherein the identification unit determines that the nearest landmark is identified when the nearest landmark is identified in all of the plurality of region boxes,
wherein, when the identification unit succeeds in identifying the nearest landmark based on some of the plurality of region boxes and fails to identify the nearest landmark based on the remaining region boxes, the identification network receives from the region detection network another region box defining another region of interest of the next-largest size, and whether the nearest landmark is identified is determined again based on the other region box, and
wherein, when the identification unit fails to identify the nearest landmark based on each of the plurality of region boxes, a plurality of other region boxes are defined based on another identification network for identifying another landmark located within a predetermined radius of the location information, and identification of the other landmark is performed based on each of the plurality of other region boxes.

delete

The method according to claim 1, further comprising,
before the receiving of the image and the location information on where the image was captured,
preparing, by a model generator, a training image batch including a plurality of training images,
wherein the training image batch is determined based on:
collecting a plurality of images including the landmark;
generating a plurality of image vectors by embedding the plurality of images in a multidimensional vector space through a plurality of convolution operations;
randomly determining a center vector of a cluster in the vector space;
clustering the plurality of image vectors according to distance from the center vector;
re-determining a center vector for each cluster generated by the clustering;
determining whether all of the plurality of image vectors are included in the cluster having the minimum distance from the re-determined center vector;
when not all of the plurality of image vectors are included in the cluster having the minimum distance from the re-determined center vector, clustering the plurality of image vectors again according to distance from the center vector, and, when all of the plurality of image vectors are included in the cluster having the minimum distance from the re-determined center vector, determining the current cluster as an optimized cluster;
extracting a plurality of image vectors within a predetermined radius of the center of the optimized cluster; and
determining a plurality of images corresponding to the extracted image vectors as the training image batch.

delete