KR102149051B1

KR102149051B1 - System and method for analyzing document using self confidence based on OCR

Info

Publication number: KR102149051B1
Application number: KR1020200050181A
Authority: KR
Inventors: 정안재; 김상헌
Original assignee: 주식회사 애자일소다
Priority date: 2020-04-24
Filing date: 2020-04-24
Publication date: 2020-08-28
Anticipated expiration: 2040-04-24

Abstract

Disclosed are an OCR-based document analysis system and method using self-confidence information. By providing a confidence score for information recognized through OCR, the present invention can reduce the time an inspector spends on verification.

Description

OCR-based document analysis system and method using self-confidence information {SYSTEM AND METHOD FOR ANALYZING DOCUMENT USING SELF CONFIDENCE BASED ON OCR}

The present invention relates to an OCR-based document analysis system and method using self-confidence information and, more specifically, to such a system and method that can reduce the time an inspector spends on verification by providing a confidence score for information recognized using OCR.

Individuals and companies keep and manage the receipts they collect in the course of economic activity as supporting documents for accounting purposes, such as expense reports and expenditure resolutions, or for tax purposes, such as comprehensive income tax returns.

Because receipts kept and managed in this way are paper, the originals are inherently at risk of damage; preventing soiling, loss, and decay imposes a technical and economic burden; and storage space must grow in proportion to the number of receipts kept.

In addition, the person in charge at an individual or company manually extracts and classifies the information needed for the above accounting or tax processing from conventional receipts, and then writes it in a ledger or enters and stores it on a PC on which an accounting program is installed, which makes information extraction inconvenient.

Meanwhile, images of characters (text) in a document can be converted through machine encoding. Characters converted in this way can be electronically edited and searched, and can also be stored in a database, for example as files.

Such machine encoding is performed mainly through optical character recognition (OCR), which allows a computer or similar device to automatically detect, identify, and encode image-based text documents.

Korean Registered Patent Publication No. 10-1139801 (title of invention: System and method for automatic information collection through receipt reading) discloses a configuration that uses OCR to read and store the purchased items, purchase quantities, amounts spent, and the like printed on a conventional receipt, thereby automatically collecting and managing the purchase information of the receipt's user.

However, OCR according to the prior art suffers from reduced recognition accuracy for images printed by a low-quality printer or fax machine, captured with a low-resolution imaging device, or photographed while crumpled or tilted.

In addition, with the prior-art information collection system and method, the inspector must check every piece of information recognized through OCR for errors.

Korean Registered Patent Publication No. 10-1139801 (title of invention: System and method for automatic information collection through receipt reading)

To solve these problems, an object of the present invention is to provide an OCR-based document analysis system and method using self-confidence information that can reduce the time an inspector spends on verification by providing a confidence score for information recognized through OCR.

To achieve the above object, one embodiment of the present invention is an OCR-based document analysis system using self-confidence information, comprising a document analysis device that: uses an object detection model to detect, in an image to be recognized, the position of at least one object among form elements, letters, and numbers; draws a rectangle around each detected form element, letter, or number object and generates pixel position values for the rectangle; uses an OCR model to recognize the letter and number information within each rectangle; connects all adjacent rectangles based on the generated pixel position values; and displays the letter and number information recognized by the OCR model matched to the positions of the connected rectangles.

In addition, the document analysis device according to the embodiment calculates a confidence score for the recognized letters and numbers based on the recognition rate of the OCR model, and reflects the calculated confidence score in the displayed information so that it is shown visually.

In addition, the document analysis device according to the embodiment displays, according to the confidence score, normally recognized regions and error regions (which include wrong regions and corrected regions) as visualization information in different colors.

In addition, the confidence score according to the embodiment additionally reflects a reconstruction rate calculated when a correction model performs at least one of form, shape, and position correction.

In addition, the document analysis device according to the embodiment comprises: an input unit that receives an image to be recognized; an object detection modeling unit that uses an object detection model to detect the position of at least one of form element, letter, and number objects in the received image, draws a rectangle around each detected object, and generates pixel position values for the drawn rectangle; an OCR modeling unit that uses an OCR model to output the letter and number information recognized within each rectangle; a form construction modeling unit that, once the recognized letters have been corrected using the character information in the item DB, takes the position of any rectangle containing number information as a starting position, moves leftward and upward from it until letter information is found, connects all rectangles encountered along the way, and causes the corrected letter and number information to be displayed matched to the positions of the connected rectangles; a confidence evaluation unit that uses a correction model to calculate a confidence score for the recognized letters and numbers and reflects the calculated confidence score in the display so that it is shown visually; and a database that stores the generated rectangle pixel position values, the recognized letter and number information, the confidence information, and the forms of document data used by specific institutions.

In addition, the device according to the embodiment further comprises an item DB generation unit that generates, from an arbitrary document, item DB information on preset characters for the characters contained in the image to be recognized.

In addition, the form construction modeling unit according to the embodiment matches each detected letter object against the item DB information and, when the recognized letters are corrected as a result of the matching, ensures that the corrected letters are reflected.

In addition, the form construction modeling unit according to the embodiment displays the rectangular boxes of letters and numbers in different colors for normally recognized regions and for error regions, which include wrong regions and corrected regions.

In addition, an OCR-based document analysis method using self-confidence information according to one embodiment of the present invention comprises: a) a document analysis device using an object detection model to detect, in a received image to be recognized, the position of at least one of form element, letter, and number objects, drawing a rectangle around each detected object, and generating pixel position values for the rectangle; b) the document analysis device using an OCR model to output the letter and number information recognized within each detected rectangle; c) the document analysis device, based on the generated pixel position values and the recognized letter and number information, taking the position of any rectangle containing number information as a starting position, moving leftward and upward from it until letter information is found, connecting all rectangles encountered along the way, and causing the letter and number information recognized by the OCR model to be displayed matched to the positions of the connected rectangles; and d) the document analysis device using a correction model to calculate a confidence score for the recognized letters and numbers and reflecting the calculated confidence score in the display so that it is shown visually.

In addition, step c) according to the embodiment further comprises the document analysis device generating an item DB that defines the character (item) information used in arbitrary documents, for comparison with the character information recognized in the image to be recognized.

In addition, step c) according to the embodiment further comprises the document analysis device matching each detected letter object against the item DB information and correcting the recognized letters according to the matching result.

In addition, the correction model of step d) according to the embodiment calculates the confidence score based on the reconstruction rate resulting from performing at least one of form, shape, and position correction, and on whether letters corrected through matching against the item DB information have been reflected.
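The text names the inputs to the confidence score (recognition rate, reconstruction rate, and whether a corrected letter was applied) but gives no formula. The following is only a minimal sketch of one way such inputs could be combined; the multiplicative form, the function name `confidence_score`, and the `correction_penalty` parameter are all assumptions made for illustration, not the method claimed here.

```python
def confidence_score(recognition_rate: float,
                     reconstruction_rate: float,
                     corrected: bool,
                     correction_penalty: float = 0.9) -> float:
    """Combine the three inputs into one score in [0, 1].

    Assumption: the inputs are multiplied, and a text that had to be
    corrected against the item DB is scaled down by correction_penalty.
    The patent does not specify how the inputs are combined.
    """
    for name, value in (("recognition_rate", recognition_rate),
                        ("reconstruction_rate", reconstruction_rate),
                        ("correction_penalty", correction_penalty)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    score = recognition_rate * reconstruction_rate
    if corrected:
        score *= correction_penalty
    return score
```

With this form, a field that needed no image correction and no text correction keeps its recognition rate as its score, and every additional repair lowers it.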

The present invention has the advantage of reducing the time an inspector spends on verification by presenting, based on the confidence score, the normally recognized parts and the wrong or suspect parts of the information recognized through OCR as visualization information in different colors.

In addition, the present invention has the advantage of providing accurate and reliable usage information for receipts in the various formats used by institutions such as hospitals and insurance companies.

FIG. 1 is a block diagram of an OCR-based document analysis system using self-confidence information according to one embodiment of the present invention.
FIG. 2 is an example diagram illustrating item DB generation in the OCR-based document analysis system according to the embodiment of FIG. 1.
FIG. 3 is an example diagram showing the item DB of the OCR-based document analysis system according to the embodiment of FIG. 1.
FIG. 4 is an example diagram illustrating character position detection in the OCR-based document analysis system according to the embodiment of FIG. 1.
FIG. 5 is an example diagram showing a character position detection result according to the embodiment of FIG. 4.
FIG. 6 is an example diagram showing the process of recognizing a letter object at a character position according to the embodiment of FIG. 4.
FIG. 7 is an example diagram showing an OCR recognition result of the OCR-based document analysis system according to the embodiment of FIG. 1.
FIG. 8 is an example diagram illustrating the process of connecting object detection boxes in the OCR-based document analysis system according to the embodiment of FIG. 1.
FIG. 9 is an example diagram illustrating the connection process according to FIG. 7.
FIG. 10 is an example diagram showing a reconstructed image of the OCR-based document analysis system according to the embodiment of FIG. 1.
FIG. 11 is a flowchart of an OCR-based document analysis process using self-confidence information according to one embodiment of the present invention.

Hereinafter, the present invention is described in detail with reference to preferred embodiments and the accompanying drawings, on the premise that like reference numerals in the drawings denote like elements.

Before describing the specific details for carrying out the present invention, it should be noted that configurations not directly related to the technical gist of the invention have been omitted to the extent that doing so does not obscure that gist.

In addition, the terms and words used in this specification and the claims should be interpreted with meanings and concepts consistent with the technical idea of the invention, based on the principle that an inventor may appropriately define terms in order to describe the invention in the best way.

In this specification, the statement that a part "includes" a component means that it may further include other components, not that it excludes them.

In addition, terms such as "... unit", "... device", and "... module" denote a unit that processes at least one function or operation, which may be implemented in hardware, software, or a combination of the two.

In addition, the term "at least one" is defined to cover both the singular and the plural; even where the term is not used, it is self-evident that each component may be present in, and may denote, the singular or the plural.

In addition, whether each component is provided singly or in plurality may vary by embodiment.

Hereinafter, preferred embodiments of the OCR-based document analysis system and method using self-confidence information according to one embodiment of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an OCR-based document analysis system using self-confidence information according to one embodiment of the present invention.

Referring to FIG. 1, the OCR-based document analysis system using self-confidence information according to one embodiment of the present invention may include a document analysis device 100 that: uses an object detection model to detect, in an image to be recognized, the position of at least one object among form elements, letters, and numbers; draws a rectangle around each detected form element, letter, or number object and generates pixel position values for the rectangle; uses an OCR model to recognize the letter and number information within each rectangle; connects all adjacent rectangles based on the generated pixel position values; and displays the letter and number information recognized by the OCR model matched to the positions of the connected rectangles.

In addition, the document analysis device 100 may calculate a confidence score for the recognized letters and numbers according to the recognition rate, and reflect the calculated confidence score in the display so that it is shown visually.

In addition, the document analysis device 100 may cause the reconstructed form to show, according to the confidence score, normally recognized regions and error regions (which include wrong regions and corrected regions) as visualization information in different colors.
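The color coding described above can be sketched as a small mapping from a confidence score to a display color. The text only requires that the three kinds of region be shown in different colors; the specific colors, the `threshold` value, and the rule that a low score means "wrong" are assumptions made for this illustration.

```python
from enum import Enum


class Region(Enum):
    NORMAL = "normal"        # recognized without intervention
    CORRECTED = "corrected"  # recognized text was corrected
    WRONG = "wrong"          # likely misrecognized


# Hypothetical color assignment; only "different colors" is required.
REGION_COLORS = {
    Region.NORMAL: "green",
    Region.CORRECTED: "yellow",
    Region.WRONG: "red",
}


def classify_region(confidence: float, was_corrected: bool = False,
                    threshold: float = 0.9) -> Region:
    """Assign a region type from a confidence score (assumed rule)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence < threshold:
        return Region.WRONG
    return Region.CORRECTED if was_corrected else Region.NORMAL


def box_color(confidence: float, was_corrected: bool = False,
              threshold: float = 0.9) -> str:
    """Color in which the rectangle for one recognized field is drawn."""
    return REGION_COLORS[classify_region(confidence, was_corrected, threshold)]
```

An inspector would then only need to look closely at the yellow and red boxes, which is the time saving the invention aims for.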

In addition, the document analysis device 100 allows the confidence score to additionally reflect a reconstruction rate calculated when a correction model performs at least one of form, shape, and position correction.

To this end, the document analysis device 100 may include an input unit 110 that receives an image to be recognized.

The input unit 110 may be a data communication means that receives, for example, an image transmitted from an external terminal connected over a network or an image scanned by a scanner.

In addition, the document analysis device 100 may include an item DB generation unit 120 that generates, from an arbitrary document, item DB 300 (see FIG. 3) information on preset characters for the characters contained in the image to be recognized.

As shown in FIG. 2, the item DB generation unit 120 analyzes information on the items 210 that are fixed elements of a document 200, such as a hospital receipt, and appear as text, such as billing details and treatment details.

In addition, the item DB generation unit 120 generates an item DB 300, as shown in FIG. 3, for the items analyzed in FIG. 2 and stores it in the database 170.

For convenience of explanation, the present invention is described using an image of a hospital receipt as the image to be recognized; however, it is not limited to this and may include pharmacy receipts, tax invoices, quotations, bills, transaction statements, and various other statements and receipts.

In addition, the document analysis device 100 may include an object detection modeling unit 130 that uses an object detection model to detect the positions of form element, letter, and number objects in the image received through the input unit 110, draws a rectangle around each detected object, and generates pixel position information for the rectangle.

That is, the object detection modeling unit 130 recognizes the relative positions of the form element, letter, and number objects, draws a rectangle around each detected object so that the arrangement of the objects by position can be checked, and generates pixel position values (coordinate information) for the drawn rectangle.
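As a rough illustration of the "pixel position values (coordinate information)" produced for each detected object, the sketch below stores each rectangle as corner coordinates and derives a reading order from them. The `DetectedBox` type, its field names, and the helper functions are hypothetical; the detection model itself is not shown.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DetectedBox:
    """Axis-aligned rectangle around one detected object (hypothetical type)."""
    kind: str   # "form", "text", or "number"
    x_min: int
    y_min: int
    x_max: int
    y_max: int

    @property
    def center(self) -> Tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def bounding_box(kind: str, pixels: Iterable[Tuple[int, int]]) -> DetectedBox:
    """Smallest rectangle enclosing the (x, y) pixels of one object."""
    points = list(pixels)
    if not points:
        raise ValueError("an object needs at least one pixel")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return DetectedBox(kind, min(xs), min(ys), max(xs), max(ys))


def reading_order(boxes: Iterable[DetectedBox]) -> List[DetectedBox]:
    """Sort boxes top-to-bottom, then left-to-right, to expose their arrangement."""
    return sorted(boxes, key=lambda b: (b.y_min, b.x_min))
```

Keeping only corner coordinates is enough for the later steps, which reason about which rectangles lie to the left of or above a given rectangle.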

Here, the object detection model uses a deep learning model based on PSENet (Progressive Scale Expansion Network) to detect form element, letter, and number objects and their positions from training data that includes document images, and may be trained to improve its detection rate.

That is, the model may be trained on data derived from original receipt images, such as images in which part of the document is folded into n equal sections, images in which the document is tilted at an arbitrary angle, images whose brightness has been adjusted to an arbitrary illuminance, images whose content is unclear and whose connecting lines are broken, images in which part of the document is curved, and images in which numbers overlap connecting lines.

By training in advance on images that vary from the original in these ways, to account for the range of conditions under which images may be captured (for example lighting, shooting angle, camera shake, composition, and photo resolution), the detection or recognition rate for images received in a real environment can be improved.
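Two of the variations listed above, brightness changes and broken lines, can be sketched on a plain grayscale pixel grid. This is only an illustration using nested lists; a real training pipeline would typically use an image library and would also cover folding, tilting, and curving. All function names and parameter ranges here are assumptions.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale rows, 0 (black) to 255 (white)


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every pixel by factor, clamping to the 0..255 range."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]


def break_lines(img: Image, gap_every: int, gap_width: int,
                background: int = 255) -> Image:
    """Blank out regular vertical strips to imitate broken or faded lines."""
    out = [row[:] for row in img]  # copy so the original stays intact
    for row in out:
        for x in range(len(row)):
            if x % gap_every < gap_width:
                row[x] = background
    return out


def augment(img: Image, rng: random.Random) -> List[Image]:
    """Return the original plus one randomly varied copy per transformation."""
    return [
        img,
        adjust_brightness(img, rng.uniform(0.5, 1.5)),
        break_lines(img, gap_every=rng.randint(4, 8), gap_width=1),
    ]
```

Passing a seeded `random.Random` keeps the generated training variants reproducible.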

Meanwhile, for convenience of explanation, the present invention is described using an image of a hospital receipt as the image to be recognized; however, it is not limited to this and may include pharmacy receipts, tax invoices, quotations, bills, transaction statements, and various other statements and receipts.

In addition, the object detection model may use automatic augmentation (Auto Augmentation) to find optimal rules for performing detection on the variously altered images.

In addition, the object detection model may set a rectangular region for each detected form element, letter, and number object and generate position values for the set region.

That is, as shown in FIG. 4, a letter object 411 detected in an arbitrary detection area 410 of the input document image 400 is marked with a rectangular box along its outer boundary.

In addition, by presenting information on the recognized objects as an object detection result 420, as shown in FIG. 5, the object detection model also enables rectangle-based pattern recognition.

For convenience of explanation, this embodiment is described in terms of letter objects; however, it is not limited to this, and it will be apparent to those skilled in the art that the objects may include numbers and the form elements that make up the form of a receipt.

In addition, based on the configuration (or arrangement) pattern of the rectangles drawn around the form elements, the form construction modeling unit 150 described below may compare the pattern with the pre-stored receipt layouts of institutions (hospitals) to determine which institution issued the receipt.
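The text does not say how the rectangle arrangement is compared with the stored layouts. One possible approach, shown here purely as an assumption, is to normalize the rectangles to page size and score each stored template by the average best intersection-over-union (IoU) of its cells. The template names, the `min_similarity` threshold, and all function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in 0..1


def iou(a: Rect, b: Rect) -> float:
    """Intersection over union of two axis-aligned rectangles."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def layout_similarity(cells: List[Rect], template: List[Rect]) -> float:
    """Average, over template cells, of the best IoU with any detected cell."""
    if not cells or not template:
        return 0.0
    return sum(max(iou(t, c) for c in cells) for t in template) / len(template)


def identify_institution(cells: List[Rect], templates: Dict[str, List[Rect]],
                         min_similarity: float = 0.5) -> Optional[str]:
    """Name of the best-matching stored layout, or None if none is close enough."""
    best_name, best_score = None, 0.0
    for name, template in templates.items():
        score = layout_similarity(cells, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_similarity else None
```

Returning `None` below the threshold leaves room for treating the document as an unknown layout rather than forcing a match.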

Here, a form element is a single cell in the form that makes up the document table, and may be a rectangle.

In addition, the document analysis device 100 may include an OCR modeling unit 140 that uses an OCR model to recognize the letters and numbers in the form element, letter, and number objects detected by the object detection modeling unit 130.

Here, the OCR modeling unit 140 automatically detects and recognizes image-based text documents and may be implemented using a known OCR model.

In addition, for the OCR recognition result 430 obtained through the OCR model as shown in FIG. 7, the OCR modeling unit 140 may provide the recognized prediction information 431 together with a confidence score 432 calculated for the prediction information 431 by the form construction modeling unit described below.

Here, the prediction information 431 indicates the letters and numbers contained in the recognized object, and the confidence score 432 may be a recognition rate: the ratio of the recognized part to the whole, reflecting cases in which, during OCR recognition, the content is unclear or connected parts are broken.
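Read literally, the recognition rate above is the recognized part divided by the whole. A minimal sketch, assuming the "parts" are individual characters and that the OCR step reports for each one whether it was recognized (both assumptions, since the text does not define the unit):

```python
from typing import Iterable


def recognition_rate(recognized_flags: Iterable[bool]) -> float:
    """Fraction of units (assumed: characters) that were recognized."""
    flags = list(recognized_flags)
    if not flags:
        raise ValueError("at least one unit is required")
    return sum(1 for flag in flags if flag) / len(flags)
```

For a seven-character field in which the last character could not be read, this gives 6/7, or about 0.86.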

In addition, the document analysis device 100 may include a form construction modeling unit 150 that compares the recognized letter information with the item DB 300 and, once the recognized letters have been corrected using the character information in the item DB 300, takes the position of any rectangle containing number information as a starting position, moves leftward and upward from it until letter information is found, connects all rectangles encountered along the way, and causes the corrected letter and number information to be displayed matched to the positions of the connected rectangles.
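The leftward and upward search can be sketched as follows. Starting from a number rectangle, the sketch walks through the rectangles in the same row (to the left) and the same column (above), nearest first, and stops in each direction at the first rectangle containing text. The `Box` type, the overlap test used to decide "same row" or "same column", and the returned dictionary are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    kind: str   # "text" or "number"
    value: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def _overlaps(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> bool:
    return a_lo < b_hi and b_lo < a_hi


def walk(start: Box, boxes: List[Box], direction: str) -> List[Box]:
    """Boxes met moving left or up from start, ending at the first text box."""
    if direction == "left":
        found = [b for b in boxes if b.x_max <= start.x_min
                 and _overlaps(b.y_min, b.y_max, start.y_min, start.y_max)]
        found.sort(key=lambda b: b.x_max, reverse=True)  # nearest first
    elif direction == "up":
        found = [b for b in boxes if b.y_max <= start.y_min
                 and _overlaps(b.x_min, b.x_max, start.x_min, start.x_max)]
        found.sort(key=lambda b: b.y_max, reverse=True)  # nearest first
    else:
        raise ValueError("direction must be 'left' or 'up'")
    path = []
    for box in found:
        path.append(box)
        if box.kind == "text":
            break
    return path


def link_number(start: Box, boxes: List[Box]) -> Dict[str, object]:
    """Connect a number box to the text found to its left and above it."""
    left, up = walk(start, boxes, "left"), walk(start, boxes, "up")
    row = next((b.value for b in left if b.kind == "text"), None)
    col = next((b.value for b in up if b.kind == "text"), None)
    return {"value": start.value, "row_item": row,
            "column_item": col, "linked": left + up}
```

For a receipt table this pairs each amount with its row label (for example an item name) and its column header, which is the information needed to rebuild the form.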

폼 구성 모델링부(150)는 인식 대상 이미지에서 인식되는 글자 정보와 비교하기 위해, 임의의 문서, 예를 들면, 병원 영수증, 약국 영수증, 거래명세서, 세금 계산서 등에서 사용되는 글자(항목) 정보를 정의한 항목 DB(300)를 생성할 수 있다.The form configuration modeling unit 150 defines character (item) information used in arbitrary documents, for example, hospital receipts, pharmacy receipts, transaction statements, tax invoices, etc., in order to compare with the character information recognized in the recognition target image. Item DB 300 can be created.

또한, 폼 구성 모델링부(150)는 도 6과 같이, 인식된 글자 객체(411)에 대한 자연어 처리(Natural Language Processing, NLP)를 통해 탐지된 글자에 대하여 형태소, 또는 분절음 별로, NLP 객체(411a)를 분석하고, 분석된 결과와 항목 DB(300) 정보 사이의 비교를 기반으로 인식된 글자에 대한 신뢰 점수를 산출할 수도 있다.In addition, as shown in FIG. 6, the form configuration modeling unit 150 is an NLP object 411a for each morpheme or segmental sound for a character detected through natural language processing (NLP) for the recognized character object 411. ) May be analyzed, and a confidence score for the recognized character may be calculated based on a comparison between the analyzed result and the item DB 300 information.

예를 들어, 인식된 글자가 "MRI 진단-"인 경우, 항목 DB(300)에 저장된 항목에 대한 정보를 검색한 다음, 검색된 항목에 대응하는 항목 DB(300) 정보와의 비교를 통해 신뢰 점수를 산출한다. For example, if the recognized character is "MRI diagnosis-", after searching for information on the item stored in the item DB 300, the confidence score is compared with the item DB 300 information corresponding to the searched item. Yields
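
The lookup described above can be illustrated with a minimal sketch. The description does not specify the NLP model or the scoring formula, so this example substitutes plain character-level similarity from Python's standard `difflib` module for the morpheme-based analysis; the `ITEM_DB` contents and the function name `match_item` are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical item DB: fixed item labels as they might appear on a receipt.
ITEM_DB = ["MRI diagnosis", "CT diagnosis", "Consultation fee", "Injection fee"]


def match_item(recognized, item_db=ITEM_DB):
    """Return (best_item, score); score is a similarity ratio in [0, 1].

    Character-level similarity stands in for the NLP comparison here;
    it is an assumption, not the method given in the description.
    """
    cleaned = recognized.strip().lower()
    best_item, best_score = None, 0.0
    for item in item_db:
        score = SequenceMatcher(None, cleaned, item.lower()).ratio()
        if score > best_score:
            best_item, best_score = item, score
    return best_item, best_score


# An OCR result with a stray trailing character still maps to the DB entry,
# and the similarity ratio can serve as a rough confidence score.
item, score = match_item("MRI diagnosis-")
print(item, round(score, 3))
```

In this sketch the recognized text would be corrected to the returned item, and a score below some chosen threshold could flag the item as new or unrecognized.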

That is, even if a new item is recognized on a hospital receipt, or an OCR error produces typos or unrecognized characters, the form configuration modeling unit 150 can use NLP to handle the case appropriately.

In addition, as shown in FIG. 7, the confidence score that the form configuration modeling unit 150 calculates for the characters may be provided as the confidence score 432, alongside the prediction information 431 recognized in the OCR recognition result 430.

In addition, based on the positions of the detected characters and objects, the form configuration modeling unit 150 generates a reconstructed form by connecting the object positions of all letters and numbers adjacent in the leftward and upward directions from an arbitrary starting position.

Meanwhile, when a new item is recognized, the form configuration modeling unit 150 connects the box of a numeric object to an adjacent box if the width and height of the numeric object's box are equal to, or contained within, the width and height of that adjacent box.

Described in more detail with reference to FIG. 8, the starting object box 500, which was recognized as a number, serves as the reference. The horizontal size 600 and vertical size 610 of the starting object box 500 are compared with the horizontal size 600a and vertical size 610a of the left object box 510 and the upper object box 520. If the sizes are equal, or fall within the horizontal and vertical sizes of the left object box 510 and the upper object box 520, the boxes are connected and displayed with a left connecting line 700 and an upper connecting line 710.
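
The size comparison can be sketched as follows. This is one possible reading, in which "equal to or contained within" means that neither dimension of the starting box exceeds the corresponding dimension of the neighboring box; the `Box` type, the function name, and the tolerance parameter `tol` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x: float  # left edge in pixels
    y: float  # top edge in pixels
    w: float  # horizontal size (600 / 600a in FIG. 8)
    h: float  # vertical size (610 / 610a in FIG. 8)


def sizes_compatible(start, neighbor, tol=0.0):
    """True if start's width and height are equal to, or fit within, neighbor's.

    tol is a hypothetical pixel tolerance for noisy detections.
    """
    return start.w <= neighbor.w + tol and start.h <= neighbor.h + tol


start_box = Box(x=100, y=50, w=40, h=12)   # numeric box (500)
left_box = Box(x=10, y=50, w=80, h=12)     # wider label box to the left (510)
narrow_box = Box(x=10, y=50, w=30, h=12)   # narrower than the starting box

print(sizes_compatible(start_box, left_box))    # the boxes would be connected
print(sizes_compatible(start_box, narrow_box))  # the boxes would not be connected
```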

In this embodiment, for convenience of explanation, the numeric object located in the center is described as the starting object box 500. However, the right object box 500a located to the right of the starting object box 500, or a lower object box located below it, may also be set as the starting position.

For example, if the right object box 500a is set as the starting position, the character object in which "공단 부담금" (the insurance corporation's share) was recognized (or detected) becomes the upper object box 520a.

In addition, the form configuration modeling unit 150 repeats the above process, connecting only the nearest left and upper boxes. Starting from the number, it keeps moving and searching leftward and upward, connecting boxes along the way, until text, for example a character object (item), is found.

That is, as shown in FIG. 9, when connecting to the next object by moving and searching to the left of and above the reference object box 810 in the document image 800, the left object box 820 located to the left is connected through the left connecting line 840 if it is text.

Likewise, if moving and searching upward shows that the upper object box 830 is a character object (item), it is connected through the upper connecting line 841.
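
The leftward and upward search can be sketched as below. The data layout, the overlap test used to decide which box counts as "nearest", and all names (`Obj`, `nearest_left`, `nearest_up`, `walk`) are assumptions made for illustration; the description does not fix these details. The sketch assumes every box has a positive width and height.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Obj:
    text: str
    kind: str  # "number" or "text"
    x: float   # left edge
    y: float   # top edge
    w: float   # width  (> 0)
    h: float   # height (> 0)


def _overlap(a0, a1, b0, b1):
    """True if the intervals [a0, a1] and [b0, b1] share some length."""
    return min(a1, b1) > max(a0, b0)


def nearest_left(cur, objs):
    """Closest box that lies entirely to the left of cur on the same row."""
    cands = [o for o in objs
             if o.x + o.w <= cur.x and _overlap(o.y, o.y + o.h, cur.y, cur.y + cur.h)]
    return max(cands, key=lambda o: o.x + o.w, default=None)


def nearest_up(cur, objs):
    """Closest box that lies entirely above cur in the same column."""
    cands = [o for o in objs
             if o.y + o.h <= cur.y and _overlap(o.x, o.x + o.w, cur.x, cur.x + cur.w)]
    return max(cands, key=lambda o: o.y + o.h, default=None)


def walk(start, objs, step):
    """Follow step() from start, linking boxes until a text box is reached.

    Returns (linked_boxes, label); label is None if no text box was found.
    """
    path, cur = [], start
    while True:
        nxt = step(cur, objs)
        if nxt is None:
            return path, None
        path.append(nxt)
        if nxt.kind == "text":
            return path, nxt
        cur = nxt


objs = [
    Obj("Total", "text", 100, 0, 60, 12),
    Obj("Insurer share", "text", 200, 0, 60, 12),
    Obj("MRI diagnosis", "text", 0, 20, 80, 12),
    Obj("50000", "number", 100, 20, 60, 12),
    Obj("30000", "number", 200, 20, 60, 12),
]
start = objs[4]  # numeric starting box
left_path, row_label = walk(start, objs, nearest_left)
up_path, col_label = walk(start, objs, nearest_up)
print(row_label.text, "/", col_label.text, "=", start.text)
```

Here the leftward walk passes through the intermediate numeric box before stopping at the row label, which mirrors the rule that the search continues until a character object (item) is found.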

In addition, if the items of the reconstructed form consist only of corrected characters, that is, characters selected from the item DB 300, the form configuration modeling unit 150 may determine that there are no typos or unrecognized characters, recognize only the numbers through the OCR model of the OCR modeling unit 140, and match the recognized numbers to the items.

The form configuration modeling unit 150 matches each detected character object against the item DB 300 information and, when a recognized character is corrected according to the matching result, ensures that the corrected character is reflected.

In addition, the form configuration modeling unit 150 not only finds character objects and numeric objects through rectangle recognition but may also, when the document image is tilted, correct it to a level document image through vertex-based reconstruction.
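
The geometric step behind a vertex-based tilt correction can be sketched as follows. This is only an illustration under assumed inputs: the corner coordinates are taken as already detected, the function names are hypothetical, and resampling the actual image pixels is out of scope.

```python
import math


def skew_angle(top_left, top_right):
    """Angle in degrees between the document's top edge and the x-axis."""
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))


def rotate_point(p, center, degrees):
    """Rotate point p about center by the given angle (standard 2D rotation)."""
    rad = math.radians(degrees)
    x, y = p[0] - center[0], p[1] - center[1]
    return (center[0] + x * math.cos(rad) - y * math.sin(rad),
            center[1] + x * math.sin(rad) + y * math.cos(rad))


# Hypothetical detected corners of a tilted receipt (pixel coordinates).
top_left, top_right = (0.0, 0.0), (100.0, 10.0)
angle = skew_angle(top_left, top_right)

# Rotating every vertex by -angle about the top-left corner levels the top edge.
leveled = rotate_point(top_right, top_left, -angle)
print(round(angle, 2), leveled)
```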

In addition, when the document image is trapezoidal because of the shooting angle, the form configuration modeling unit 150 may convert it into a rectangular document image through reconstruction by trapezoid (keystone) correction.

In addition, when the document image was captured with crumpled or folded portions, the form configuration modeling unit 150 may convert it into a rectangular document image using a program for correcting shape or form.

The form configuration modeling unit 150 performs these corrections to improve recognition accuracy, so that a faithful rectangular reconstruction, and accurate recognition based on it, can be achieved. However, the corrections and reconstruction may also increase the likelihood of errors.

To address this, the document analysis device 100 may include a reliability evaluation unit 160. Using a correction model, the reliability evaluation unit 160 calculates a confidence score from quantified information on the risk of judgment errors caused by correction and reconstruction, for example the types and number of corrections and reconstructions, and from the reliability of the recognized letters and numbers. The calculated confidence score is reflected on the display through the form configuration modeling unit 150 so that the user can check it visually.

That is, the reliability evaluation unit 160 provides the form configuration modeling unit 150 with a confidence score that quantifies the risk of judgment errors caused by correction and reconstruction, for example the types and number of corrections and reconstructions.
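
The description does not give a formula for combining these quantities. As a purely hypothetical sketch, the recognition confidence could be discounted once for each correction applied, with a penalty that depends on the correction type; the penalty values, the type names, and the multiplicative form are all assumptions.

```python
# Hypothetical per-correction penalties; the description gives no values.
PENALTY = {
    "deskew": 0.02,    # tilt corrected from the vertices
    "keystone": 0.05,  # trapezoid correction
    "unfold": 0.10,    # crumpled or folded area flattened
}


def confidence_score(recognition_conf, corrections):
    """Discount the OCR recognition confidence once per applied correction.

    recognition_conf: value in [0, 1] for the recognized letters and numbers.
    corrections: names of the corrections applied to the image region.
    """
    score = recognition_conf
    for kind in corrections:
        score *= 1.0 - PENALTY[kind]
    return score


print(confidence_score(0.9, []))                      # no correction applied
print(confidence_score(0.9, ["keystone", "unfold"]))  # two corrections applied
```

With this form, both the type and the number of corrections lower the score, which matches the quantities the description names, but any monotonic combination would serve the same illustrative purpose.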

The form configuration modeling unit 150 displays the rectangular boxes of letters and numbers in different colors for normal object areas and for error object areas, the latter including incorrect object areas and corrected areas.

That is, as shown in FIG. 10, based on the confidence score provided by the reliability evaluation unit 160, the form configuration modeling unit 150 displays object areas whose confidence score is at or above a preset reference value in blue in the reconstructed image 900, so that the normal object areas 910 and 911 can be identified.

In addition, the form configuration modeling unit 150 displays object areas whose confidence score provided by the reliability evaluation unit 160 is at or below the reference value in red in the reconstructed image 900, so that the error object areas 920, 921, 922, 923, and 924 can be identified.
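
The color rule can be sketched in a few lines. As written, both conditions include the reference value itself, so a score exactly equal to it is ambiguous; this sketch assumes such a score counts as normal. The reference value 0.8 and the function name are hypothetical.

```python
def box_color(score, reference=0.8):
    """Blue for normal object areas, red for error object areas.

    Assumption: a score exactly equal to the reference counts as normal.
    """
    return "blue" if score >= reference else "red"


# Hypothetical confidence scores keyed by recognized text.
scores = {"MRI diagnosis": 0.96, "50000": 0.80, "3O000": 0.41}
colors = {text: box_color(s) for text, s in scores.items()}
print(colors)
```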

These different colors displayed by the form configuration modeling unit 150 let the user check the results quickly and accurately.

In addition, the document analysis device 100 may include a database 170 that stores the generated rectangular pixel position values, the recognized letters, number information, and confidence information, and the forms of document data used by specific institutions.

Here, the specific institutions may include hospitals, pharmacies, companies, and any other place that issues receipts and accounting-related documents in an arbitrary form.

Next, an OCR-based document analysis method using self-reliability information according to an embodiment of the present invention is described.

FIG. 11 is a flowchart illustrating an OCR-based document analysis process using self-reliability information according to an embodiment of the present invention.

Referring to FIGS. 1 to 11, the document analysis device 100 receives an image of the receipt to be recognized through an external terminal connected via a network, a fax, or the like (S100).

In addition, the document analysis device 100 performs an object detection step (S200), in which it uses an object detection model to detect letter and number objects and their positions in the received receipt image and forms a rectangular box around each detected letter and number object.

In step S200, the object detection model may use a deep learning model based on PSENet (Progressive Scale Expansion Network) to detect format, letter, and number objects and their positions from training data that includes document images, and it may be trained to improve its detection rate.

In addition, the object detection model may be trained on training data built from the following images: the original image; an image in which an arbitrary part of the document is folded; an image in which the document is tilted at an arbitrary angle; an image whose brightness has been adjusted to an arbitrary illuminance; an image in which the content of the document is unclear and its lines are broken; an image in which an arbitrary part of the document is curved; and an image in which numbers and lines overlap.

Next, the document analysis device 100 performs an OCR recognition step (S300), in which it uses the OCR model to recognize letter and number information within the rectangular pixels of the detected format, letter, and number objects.

After step S300, the document analysis device 100 generates, as item DB 300 information, the fixed text information of an arbitrary document, that is, the character information for its items, and stores the generated item DB 300 information in the database 170 (S400).

The document analysis device 100 then compares the recognized character information with the item DB 300, determines whether typos or unrecognized characters are present in order to decide whether correction is needed, and performs NLP-based correction by replacing the recognized characters with the character information of the item DB 300 (S500).

That is, in step S500, the document analysis device 100 analyzes the detected characters through natural language processing (NLP).

In step S500, the document analysis device 100 may also calculate and output a confidence score for the analyzed characters based on a comparison between the analyzed characters and the item DB 300 information.

Next, taking the position of an arbitrary rectangular pixel that holds numeric information as the starting position, the document analysis device 100 moves leftward and upward; when character information is found, it connects all rectangular pixels found during the movement and matches the letter and number information corrected through the OCR model to the connected rectangular pixel positions (S600).

That is, the document analysis device 100 takes a numeric object as the starting position and moves and connects leftward and upward from it.

At this point, only the nearest left and upper objects are connected first. All adjacent objects are then connected, and the search moves through numeric objects until a character object (item) is found.

When the search for items including character objects is complete, the document analysis device 100 matches the recognized and corrected characters with the numbers, item (attribute) by item.

In addition, the document analysis device 100 calculates a confidence score from information that quantifies the types and number of corrections and reconstructions performed to improve recognition accuracy (so that a faithful rectangular reconstruction, and accurate recognition based on it, can be achieved) and from the reliability of the recognized letters and numbers. It then reflects the calculated confidence score on the display so that the user can check it visually (S700).

In step S700, if the document image is tilted, the document analysis device 100 may correct it to a level document image through vertex-based reconstruction; if the document image is trapezoidal, it may convert it into a rectangular document image through reconstruction by trapezoid (keystone) correction.

If the document image includes crumpled or folded portions, it is converted into a rectangular document image using a program for correcting shape or form. In this way, a reconstruction rate resulting from the correction of at least one of form, shape, and position can be calculated together with the confidence score.

Also in step S700, the document analysis device 100 displays the rectangular boxes of letters and numbers in different colors according to the confidence score, distinguishing normal object areas from error object areas, which include incorrect object areas and corrected areas.

That is, as shown in FIG. 10, object areas whose confidence score is at or above a preset reference value are displayed in blue in the reconstructed image 900 so that the normal object areas 910 and 911 can be identified, and object areas whose confidence score is at or below the reference value are displayed in red in the reconstructed image 900 so that the error object areas 920, 921, 922, 923, and 924 can be identified.

In addition, the form reconstructed from the recognized characters, the corrected characters, and the corrections and reconstruction is converted into a reconstructed image; the final result is output (S800) and stored in the database 170.

Accordingly, the correctly recognized portions and the incorrect or suspect portions of the OCR-recognized information are presented as visualization information in different colors based on the confidence score. This lets the user check the results quickly and accurately and reduces the user's verification time.

In addition, accurate and reliable usage information can be provided for receipts in the various formats used by institutions such as hospitals and insurance companies.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that it can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

In addition, the reference numerals given in the claims are provided only for clarity and convenience of description and are not limiting. In describing the embodiments, the thickness of lines and the size of components shown in the drawings may be exaggerated for clarity and convenience.

In addition, the terms used above are defined in consideration of their functions in the present invention and may vary with the intention or practice of users and operators. These terms should therefore be interpreted on the basis of the content of this specification as a whole.

In addition, even where not explicitly shown or described, it is evident that a person of ordinary skill in the art to which the present invention pertains can make modifications in various forms that embody the technical idea of the invention from the matters described herein, and such modifications still fall within the scope of the present invention.

In addition, the embodiments described above with reference to the accompanying drawings are given for the purpose of explaining the present invention, and the scope of the invention is not limited to these embodiments.

100: document analysis device
110: input unit
120: item DB generation unit
130: object detection modeling unit
140: OCR modeling unit
150: form configuration modeling unit
160: reliability evaluation unit
170: database
200: document
210: item
300: item DB
400: document image
410: detection area
411: character object
411a: NLP object
420: object detection result
430: OCR recognition result
431: prediction information
432: confidence score
500: starting object box
500a: right object box
510: left object box
520: upper object box
520a: upper object box
600: horizontal size
610, 610a: vertical size
700: left connecting line
710: upper connecting line
720: lower connecting line
800: document image
810: reference object box
810a: right object box
820: left object box
830: upper object box
840: left connecting line
841: upper connecting line
900: reconstructed image
910, 911: normal object area
920, 921, 922, 923, 924: error object area

Claims

An OCR-based document analysis system using self-reliability information, comprising a document analysis device 100 that uses an object detection model to detect the position of at least one object among arbitrary format, letter, and number objects in a recognition target image, displays a rectangle along the perimeter of each detected format, letter, and number object to generate rectangular pixel position values, recognizes letter and number information within the rectangular pixels using an OCR model, connects all adjacent rectangular pixels based on the generated rectangular pixel position values, and matches the letter and number information recognized through the OCR model to the connected rectangular pixel positions for display,
wherein the document analysis device 100 calculates a confidence score for the recognized letters and numbers based on a recognition rate obtained using the OCR model, and reflects the calculated confidence score in the display information so that it is visually displayed, and
wherein the confidence score additionally reflects a reconstruction rate calculated from performing correction of at least one of form, shape, and position using a correction model.

The system of claim 1,
wherein the document analysis device 100 displays, according to the confidence score, a normal recognition area and an error area, the error area including an incorrect area and a corrected area, as visualization information in different colors.

(Deleted)

An OCR-based document analysis system using self-reliability information, comprising:
an input unit 110 that receives a recognition target image;
an object detection modeling unit 130 that uses an object detection model to detect, in the received recognition target image, the position of at least one of format, letter, and number objects, displays a rectangle around each detected format, letter, and number object, and generates the displayed rectangular pixel position values;
an OCR modeling unit 140 that uses an OCR model to output the letter and number information recognized within the rectangular pixels;
a form configuration modeling unit 150 that, as the recognized characters are corrected to the character information of an item DB 300, takes the position of an arbitrary rectangular pixel holding numeric information as a starting position and moves leftward and upward, connects all rectangular pixels found during the movement when character information is found, and matches the letter and number information corrected through the OCR model to the connected rectangular pixel positions for display;
a reliability evaluation unit 160 that calculates a confidence score for the recognized letters and numbers using a correction model and reflects the calculated confidence score on the display so that it is visually displayed; and
a database 170 that stores the generated rectangular pixel position values, the recognized letters, number information, and confidence information, and the forms of document data used by specific institutions.

The system of claim 4,
further comprising an item DB generation unit 120 that generates, from an arbitrary document, item DB 300 information on preset characters for the characters included in the recognition target image,
wherein the form configuration modeling unit 150 matches the detected character objects against the item DB 300 information and, when a recognized character is corrected according to the matching result, causes the corrected character to be reflected.

The system of claim 4 or 5,
wherein the form configuration modeling unit 150 displays the rectangular boxes of letters and numbers in different colors for a normal recognition area and for an error area, the error area including an incorrect area and a corrected area.

An OCR-based document analysis method using self-reliability information, comprising:
a) detecting, by a document analysis device 100 using an object detection model, the position of at least one of arbitrary format, letter, and number objects in a received recognition target image, and displaying a rectangle around each detected format, letter, and number object to generate rectangular pixel position values;
b) outputting, by the document analysis device 100 using an OCR model, the letter and number information recognized within the detected rectangular pixels;
c) moving, by the document analysis device 100, leftward and upward from a starting position, the starting position being the position of an arbitrary rectangular pixel that holds numeric information, based on the generated rectangular pixel position values and the recognized letter and number information; connecting, when character information is found, all rectangular pixels found during the movement; and matching the letter and number information recognized through the OCR model to the connected rectangular pixel positions for display; and
d) calculating, by the document analysis device 100 using a correction model, a confidence score for the recognized letters and numbers, and reflecting the calculated confidence score on a display so that it is visually displayed.

The method of claim 7,
wherein step c) further comprises generating, by the document analysis device 100, an item DB 300 that defines the character (item) information used in arbitrary documents, for comparison with the character information recognized in the recognition target image.

The method of claim 8,
wherein step c) further comprises matching the detected character objects against the item DB 300 information and correcting the recognized characters according to the matching result.

The method of claim 9,
wherein the correction model in step d) calculates the confidence score based on a reconstruction rate resulting from the correction of at least one of form, shape, and position, and on whether corrected characters are reflected according to the result of matching against the item DB 300 information.