KR20220106297A

KR20220106297A - verification system for achievements of faculty

Info

Publication number: KR20220106297A
Application number: KR1020210009072A
Authority: KR
Inventors: 강상길; 조명우; 이한음; 허청환
Original assignee: 인하대학교 산학협력단
Priority date: 2021-01-22
Filing date: 2021-01-22
Publication date: 2022-07-29
Anticipated expiration: 2041-01-22
Also published as: KR102550868B1

Abstract

The present invention relates to a faculty achievement verification system, and more particularly to a faculty achievement verification system that automatically performs the entire process of verifying achievement information entered by faculty members, thereby substantially reducing manpower, time, and cost, and that verifies more accurately by judging similarity rather than exact identity.
To achieve this object, the present invention is characterized by comprising an information input step in which faculty members enter achievement information, an information extraction step of extracting achievement-related information from the web using the entered achievement information, and a data verification step of comparing the achievement-related information extracted in the information extraction step with the achievement information to verify it.

Description

Faculty achievement verification system {Verification system for achievements of faculty}

The present invention relates to a faculty achievement verification system, and more particularly to a faculty achievement verification system that automatically performs the entire process of verifying achievement information entered by faculty members, thereby substantially reducing manpower, time, and cost, and that verifies more accurately by judging similarity rather than exact identity.

To verify the achievements of faculty members employed at universities and similar institutions, the papers, patents, books, and translations reported by each faculty member must be looked up and checked one by one. Typically, the items to be checked amount to roughly 3,000 or more papers, 400 or more patents, and 200 or more books or translations.

Verifying such a large volume of papers, patents, books, and translations requires many workers to visit each website and check the records by hand. The work is not finished in a single pass, because the results must also be cross-verified, so a large staff spends considerable time and the cost is substantial.

Accordingly, performing this verification automatically with a dedicated system rather than by hand reduces manpower, time, and cost.

One example of a technique for automating such verification is described in Korean Patent Application Publication No. 10-2020-0082218, illustrated in FIGS. 1 and 2. That technique is characterized by comprising the steps of: extracting, by a server that verifies the reliability of untrusted data, keywords from a post published by an individual on a social networking service (SNS); crawling news articles on the web that contain the keywords; and evaluating the reliability of the post based on the number of crawled news articles.

The technique described in Korean Patent Application Publication No. 10-2020-0082218 automatically verifies the reliability of posts uploaded to an SNS by crawling news articles published on the web. Although it has the advantage of verifying uploaded information automatically, it considers only exact keyword matches during verification, so it cannot judge accurately when a keyword has been altered for any of various reasons.

Korean Patent Application Publication No. 10-2020-0082218 (published July 8, 2020)
Korean Patent No. 10-1153138 (registered May 29, 2012)

The present invention has been devised to solve the above problems. One object of the present invention is to provide a faculty achievement verification system in which an extraction module with a crawler function extracts information from the websites that hold it, based on the achievement information entered by faculty members, and automatically compares the extracted information with that achievement information. Because the entire process runs automatically, manpower, time, and cost are substantially reduced.

Another object of the present invention is to provide a faculty achievement verification system that increases the accuracy of verification results. When the information extracted from each website by the extraction module is compared with the achievement information, a preprocessing step converts alphabetic text entirely to lowercase and decomposes Hangul text into consonants and vowels. Verification is then based on a string comparison (sequence matching) method with additional weight given to continuity, so that items are verified according to similarity.

To solve these problems, the present invention is configured as follows.

The present invention is characterized by comprising: an input achievement information DB storage step of storing achievement information entered by faculty members in a faculty DB; an information extraction step of extracting achievement-related information from the web using the input achievement information; and a data verification step of comparing the achievement-related information extracted in the information extraction step with the input achievement information to verify it.

Here, the input achievement information is information that each faculty member enters about his or her own papers, patents, books, or translations.

The information extraction step comprises: an access step of accessing, through an extraction module provided in a server, a website from which information on papers, patents, books, or translations can be obtained; a data extraction step of extracting information from the accessed website by crawling; and a data trimming step of extracting only the achievement-related information from the information extracted in the data extraction step.

The data trimming step refers to the format of each website and removes from the extracted information any irrelevant characters, including special characters, that are not part of the achievement-related information.

The data verification step comprises: a preprocessing step of converting the input achievement information stored in the faculty DB and the achievement-related information extracted in the information extraction step into a set format; and a comparison step of comparing the preprocessed achievement-related information with the preprocessed input achievement information to verify it.

In the preprocessing step, input achievement information and achievement-related information written in the alphabet are converted to lowercase, and those written in Hangul are separated into consonants and vowels.

The comparison step compares the achievement information with the achievement-related information using the edit distance (Levenshtein distance) algorithm, one of the string comparison (sequence matching) methods, with a weight added for continuity.

In addition, the comparison step judges similarity by the weighted edit distance value D(i,j) derived through the following process.

[Process]

1. If A[i] and B[j] of the strings being compared match:

D(i,j) = D(i-1,j-1);

if b == True, then w = w + 1;

if b != True, then b = True.

2. If A[i] and B[j] of the strings being compared do not match:

D(i,j) = min( D(i-1,j) + 1/w, D(i,j-1) + 1/w, D(i-1,j-1) + 1/w );

b = False.

{ D(i,j) = weighted edit distance value, with initial value D(0,0) = 0,

A[i] = the i-th character of string A,

B[j] = the j-th character of string B,

w = continuity weight (initial value 0),

b = Boolean used to judge continuity (initial value False),

i, j = 0 to n }

According to the present invention configured as above, an extraction module with a crawler function extracts information from the websites that hold it, based on the achievement information entered by faculty members, and automatically compares the extracted information with that achievement information. Because the entire process runs automatically, manpower, time, and cost are substantially reduced.

In addition, when the information extracted from each website by the extraction module is compared with the achievement information, a preprocessing step converts alphabetic text entirely to lowercase and decomposes Hangul text into consonants and vowels. Verification is then based on a string comparison (sequence matching) method with additional weight given to continuity, so that items are verified according to similarity, which increases the accuracy of the verification results.

FIG. 1 is a schematic diagram of a conventional crawling-based verification method.
FIG. 2 is an example of a web page from which data containing keywords is extracted in the conventional crawling-based verification method.
FIG. 3 is a conceptual diagram of the faculty achievement verification system according to the present invention.
FIG. 4 is a block diagram of the faculty achievement verification system according to the present invention.
FIG. 5 is a flowchart of the faculty achievement verification system according to the present invention.
FIG. 6 illustrates the general Levenshtein distance, an example of a sequence matching technique.
FIG. 7 is an example of a screen showing the result when verification is completed in the faculty achievement verification system according to the present invention.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. The same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted. It should be understood that the present invention may be implemented in many different forms and is not limited to the described embodiments.

FIG. 3 is a conceptual diagram of the faculty achievement verification system according to the present invention, FIG. 4 is a block diagram of the system, FIG. 5 is a flowchart of the system, FIG. 6 illustrates the general Levenshtein distance as an example of a sequence matching technique, and FIG. 7 is an example of a screen showing the result when verification is completed in the system.

The present invention relates to a faculty achievement verification system. As shown in FIGS. 3 to 5, it comprises a step (S100) of storing achievement information entered by faculty members in a faculty DB 110 provided in a server 100; an information extraction step (S200) of extracting achievement-related information from the web using the entered achievement information; and a data verification step (S300) of comparing the achievement-related information extracted in the information extraction step (S200) with the input achievement information to verify it.

As shown in FIGS. 3 and 4, faculty members use their own terminals 200, such as smartphones, desktop computers, or tablets, to connect to the server 100, which is provided with a communication module 120, and enter their achievement information.

The server 100 is provided with the faculty DB 110, which stores the achievement information that each faculty member enters after connecting to the server 100. This input achievement information typically includes information on papers, patents, books, or translations.

The information extraction step (S200) comprises an access step (S210) of accessing a website 300 from which information on papers, patents, books, or translations can be obtained; a data extraction step (S220) of extracting information from the accessed website 300 by crawling; and a data trimming step (S230) of extracting only the achievement-related information from the information extracted in the data extraction step (S220).

The server 100 is provided with an extraction module 130, which acts as a crawler and extracts information from the websites 300 by crawling. The extraction module 130 includes a web DB 134 that stores information on the websites 300 from which information on papers, patents, books, or translations can be obtained.

Thus, in the access step (S210), the extraction module 130 accesses the relevant website 300 using the website information stored in the web DB 134. In the data extraction step (S220), it extracts information related to the input achievement information by crawling, as described above, and stores it in a temporary storage unit 132 provided in the extraction module 130.

In the data trimming step (S230), information other than the faculty members' achievement-related information is deleted. The web DB 134 holds information on the format of each website 300, and meaningless characters are deleted by referring to it.

That is, websites 300 differ in how they format each field. For example, when displaying the title of a paper, a site may place a colon (:) or semicolon (;) before and after the title, may use quotation marks (") or other special characters, or may insert multiple spaces between a special character and the title.

Because the web DB 134 contains the format information of each website 300, the data trimming step (S230) refers to it to remove, from the information extracted by the extraction module 130, irrelevant characters, including special characters, that are not part of the achievement-related information.
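As a rough illustration of the data trimming step (S230), the following Python sketch strips wrapper characters and surplus whitespace from an extracted title field. The function name `trim_field` and the `wrapper_chars` parameter are hypothetical. The patent does not specify how the per-site format information in the web DB 134 is represented, so a single default character set stands in for it here.

```python
import re


def trim_field(raw: str, wrapper_chars: str = ":;\"'") -> str:
    """Remove format characters around an extracted field (illustrative only).

    wrapper_chars is an assumed stand-in for the per-website format
    information that the patent says is kept in the web DB.
    """
    # Strip wrapper characters and whitespace from both ends of the field.
    text = raw.strip(wrapper_chars + " \t\r\n")
    # Collapse runs of whitespace inside the field into single spaces.
    return re.sub(r"\s+", " ", text)


print(trim_field(' : "Deep  learning for  verification" ; '))
```

Only the ends of the field are stripped, so punctuation that belongs to the title itself is left in place.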

The achievement-related information processed in this way by the extraction module 130 is stored in the temporary storage unit 132 provided in the extraction module 130, so that it can be compared more accurately with the input achievement information in the data verification step (S300) described below.

That is, the present invention uses a string comparison (sequence matching) method to compare the achievement-related information extracted from the websites 300 with the achievement information entered by faculty members. String comparison checks the characters of each string in order to decide whether they match, so if the strings being compared contain special characters that belong to a website's format, an accurate comparison becomes difficult. Removing these meaningless characters raises the accuracy of the comparison.

The data verification step (S300) comprises a preprocessing step (S310) of converting the input achievement information stored in the faculty DB 110 and the achievement-related information extracted in the information extraction step (S200) into a set format, and a comparison step (S320) of comparing the preprocessed achievement-related information with the preprocessed input achievement information to verify it.

In the preprocessing step (S310), if the input achievement information stored in the faculty DB 110 and the achievement-related information extracted by the extraction module 130 are written in the alphabet, they are all converted to lowercase for uniformity. If they are written in Hangul, every syllable is separated into its consonants and vowels, and the result is stored in a separately allocated area of the faculty DB 110.

Taking Hangul as an example, when 홍길동 (Hong Gil-dong) is compared with 홍갈동 (Hong Gal-dong), the middle one of the three syllables differs, so 1/3 of the text differs and the two names look quite different. Decomposed into consonants and vowels, however, the comparison is between ㅎㅗㅇㄱㅣㄹㄷㅗㅇ and ㅎㅗㅇㄱㅏㄹㄷㅗㅇ, where only one of the nine phonemes differs (1/9), so the similarity appears high.
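The preprocessing described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. The function name `preprocess` and the three lookup tables are hypothetical, and the decomposition relies on the standard arithmetic layout of precomposed Hangul syllables in Unicode (U+AC00 to U+D7A3).

```python
# Compatibility jamo listed in Unicode syllable-composition order.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"        # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 vowels
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # none + 27 finals


def preprocess(text: str) -> str:
    """Lowercase alphabetic text and split Hangul syllables into jamo."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3:           # precomposed Hangul syllable
            index = code - 0xAC00
            out.append(INITIALS[index // 588])         # 588 = 21 * 28
            out.append(MEDIALS[(index % 588) // 28])
            out.append(FINALS[index % 28])             # "" when no final
        else:
            out.append(ch.lower())
    return "".join(out)


a = preprocess("홍길동")
b = preprocess("홍갈동")
print(a, b)
print(sum(x != y for x, y in zip(a, b)), "of", len(a), "phonemes differ")
```

Run on the two names from the example, the sketch yields nine phonemes each, with a single differing position.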

In addition, because names written in the alphabet are sometimes given as initials, a version converted to initials may also be stored in an additionally allocated area of the faculty DB 110.

In the comparison step (S330), the input achievement information is compared with the achievement-related information and verified using the edit distance (Levenshtein distance) algorithm, one of the string comparison (sequence matching) methods, with a weight added for continuity.

The general edit distance (Levenshtein distance) algorithm compares two strings and determines how many times one of three operations (insertion, deletion, or replacement) must be performed to make the strings identical.

Expressed programmatically, this process is as follows:

1. If A[i] and B[j] match:

D(i,j) = D(i-1,j-1).

2. If A[i] and B[j] do not match:

D(i,j) = min( D(i-1,j) + 1, D(i,j-1) + 1, D(i-1,j-1) + 1 ).

Here, D(i,j) is the edit distance value (the number of times the three operations above are performed), with initial value D(0,0) = 0. For two strings A and B, A[i] is the i-th character of string A and B[j] is the j-th character of string B. The variables i and j are integers in the range 0 to n, bounded by the number of characters in each string.

By carrying out this procedure, which spells out the edit distance (Levenshtein distance) algorithm, on the characters of the two strings, the edit distance value D(i,j) is obtained. As described above, the edit distance value indicates how many times one of the three operations must be performed to make the two strings identical. The larger the value, the lower the similarity between the two strings.

An example of the edit distance (Levenshtein distance) algorithm is shown in the table of FIG. 6, which compares the string ABC with the string QWC.
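For reference, the general (unweighted) Levenshtein distance can be sketched in Python as below. This is the textbook dynamic program, not code from the patent. The function name `levenshtein` is chosen for illustration, and the boundary values D(i,0) = i and D(0,j) = j are the usual convention, which the text above does not spell out.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook Levenshtein distance using two rolling rows."""
    # prev[j] holds D(i-1, j); the first row is D(0, j) = j.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]                      # D(i, 0) = i
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur.append(prev[j - 1])            # match: no extra cost
            else:
                # deletion, insertion, or replacement, each costing 1
                cur.append(1 + min(prev[j], cur[j - 1], prev[j - 1]))
        prev = cur
    return prev[-1]


print(levenshtein("ABC", "QWC"))                # 2
print(levenshtein("comparison", "conpalisen"))  # 3
print(levenshtein("comparison", "compare"))     # 4
```

The first call uses the ABC and QWC strings of FIG. 6. The other two use the word pairs that the description goes on to discuss.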

However, the general edit distance (Levenshtein distance) algorithm only counts how many characters differ, so it cannot recognize a pair of strings that differ in more characters yet are in fact more similar.

For example, "comparison" and "conpalisen" are entirely unrelated strings, but they differ at only three positions (m/n, r/l, and o/e), so their edit distance is 3. By contrast, "comparison" and "compare" are closely similar, but they differ in four characters (the trailing "ison" versus "e"), so their edit distance is 4 and they are judged to be less similar.

To solve this problem, the present invention still uses the edit distance (Levenshtein distance) algorithm, but, considering that strings with longer runs of consecutive matches are more similar, it judges similarity with a weight added for continuity, as described above.

Expressed programmatically, the edit distance (Levenshtein distance) algorithm with the added weight is as follows:

1. If A[i] and B[j] match:

D(i,j) = D(i-1,j-1);

if b == True, then w = w + 1;

if b != True, then b = True.

2. If A[i] and B[j] do not match:

D(i,j) = min( D(i-1,j) + 1/w, D(i,j-1) + 1/w, D(i-1,j-1) + 1/w );

b = False.

{ D(i,j) = weighted edit distance value, with initial value D(0,0) = 0,

A[i] = the i-th character of string A,

B[j] = the j-th character of string B,

w = continuity weight (initial value 0),

b = Boolean used to judge continuity (initial value False),

i, j = 0 to n }

Comparing the two pairs of strings with the weighted edit distance (Levenshtein distance) algorithm of the present invention gives the following.

1. comparison vs. conpalisen (3 operations required)

Edit distance value = 1 + 1/2 + 1/3 = 11/6, or about 1.83.

This value is derived as follows. The value of w increases whenever consecutive characters match (that is, when b is True at the time of comparison), and the initial edit distance value is 0.

Comparing 'c': the two characters match, so D(1,1) = D(0,0) = 0. Because b is at its initial value False, w stays 0, and b is then changed to True.

Comparing 'o': the two characters match, so D(2,2) = D(1,1) = 0. Because b is True, w increases by 1 to become 1, and b is unchanged.

Comparing 'm' with 'n': the two characters do not match, so D(3,3) = D(2,2) + 1/w = 0 + 1/1 = 1. b is changed to False, and w is unchanged at 1.

Comparing 'p': the two characters match, so D(4,4) = D(3,3) = 1. Because b was False, w is unchanged at 1, and b is changed to True.

Comparing 'a': the two characters match, so D(5,5) = D(4,4) = 1. Because b was True, w increases by 1 to become 2, and b is unchanged.

Comparing 'r' with 'l': the two characters do not match, so D(6,6) = D(5,5) + 1/w = 1 + 1/2. b is changed to False, and w is unchanged at 2.

Comparing 'i': the two characters match, so D(7,7) = D(6,6) = 1 + 1/2. Because b was False, w is unchanged at 2, and b is changed to True.

Comparing 's': the two characters match, so D(8,8) = D(7,7) = 1 + 1/2. Because b was True, w increases by 1 to become 3, and b is unchanged.

Comparing 'o' with 'e': the two characters do not match, so D(9,9) = D(8,8) + 1/w = 1 + 1/2 + 1/3. b is changed to False, and w is unchanged at 3.

Comparing 'n': the two characters match, so D(10,10) = D(9,9) = 1 + 1/2 + 1/3. Because b was False, w is unchanged at 3, and b is changed to True.

Therefore, the final weighted edit distance value D(10,10) is 1 + 1/2 + 1/3.

2. comparison vs. compare (4 operations required)

Edit distance value = 1/5 + 1/5 + 1/5 + 1/5 = 4/5, or 0.8.

Here the first six characters, "compar", match consecutively, so w reaches 5 before the first mismatch, and each of the four operations adds 1/5.

Therefore, when the two pairs of strings are compared with the weighted edit distance (Levenshtein distance) algorithm of the present invention, the second pair yields the lower edit distance value, 0.8, and is judged to be more similar, which agrees with the actual similarity.
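The two worked examples can be reproduced with the short Python sketch below. It is a deliberately simplified, position-by-position walk that mirrors the step-by-step derivation above. It is not a full dynamic program over the D(i,j) table, and it is not presented as the patent's implementation. The function name `weighted_distance_positional` is hypothetical. Treating the leftover characters of the longer string as mismatches, and using `max(w, 1)` to avoid dividing by zero when a mismatch occurs while w is still 0, are both assumptions that the description does not address.

```python
from fractions import Fraction
from itertools import zip_longest


def weighted_distance_positional(a: str, b: str) -> Fraction:
    """Continuity-weighted distance, walking both strings in lockstep."""
    w = 0                # continuity weight, initial value 0
    in_run = False       # the Boolean b from the process description
    dist = Fraction(0)
    for ca, cb in zip_longest(a, b):   # None pads the shorter string
        if ca is not None and ca == cb:
            if in_run:
                w += 1                 # consecutive match: raise the weight
            else:
                in_run = True
        else:
            # Assumption: guard against w == 0 with max(w, 1).
            dist += Fraction(1, max(w, 1))
            in_run = False
    return dist


print(weighted_distance_positional("comparison", "conpalisen"))  # 11/6
print(weighted_distance_positional("comparison", "compare"))     # 4/5
```

Exact fractions are used so that the results can be compared directly with the values 11/6 and 4/5 given in the text. Because the walk is purely positional, an insertion near the start of one string would shift every later character out of alignment. Handling that case requires the full D(i,j) recurrence.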

Accordingly, in the present invention, the comparison step (S330) of the data verification step (S300) compares the achievement information with the achievement-related information using the weighted edit distance (Levenshtein distance) algorithm, so that verification accuracy can be increased through similarity even if there are typographical errors in the teacher's input or in the entries on each website.

After the data verification step (S300), a verification result confirmation step (S400) for checking the verification result may further be performed.

Here, in the data verification step (S300), information on the parts that did not match during verification is stored in a result storage unit (142) separately provided in the verification module (140). In the verification result confirmation step (S400), the result is output together with a brief comment, as shown in FIG. 7, with reference to the information stored in the result storage unit (142).

In the verification result confirmation step (S400), if there is an error in the input data, the result information is sent to the teacher, using the information stored in the teacher DB (110), so that the teacher can correct the error.

Although preferred embodiments of the present invention have been described above, the scope of rights of the present invention is not limited thereto and extends to what is substantially equivalent to the embodiments of the present invention. Various modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the spirit of the invention.

The present invention relates to a teacher achievement verification system, and more particularly to a teacher achievement verification system that automatically performs the entire process of verifying achievement information input by teachers, thereby considerably reducing manpower, time, and cost, and that verifies more accurately by judging similarity rather than identity during verification.

100: server 110: teacher DB
120, 210: communication module 130: extraction module
140: verification module 200: terminal
220: display unit 300: website
S100: information input step
S200: information extraction step
S300: data verification step
S400: verification result confirmation step

Claims

A teacher achievement verification system, characterized in that it consists of:
an input achievement information DB storage step of storing, in a teacher DB, the achievement information input by teachers;
an information extraction step of extracting achievement-related information from the web using the input achievement information; and
a data verification step of performing verification by comparing the achievement-related information extracted in the information extraction step with the input achievement information.

The teacher achievement verification system according to claim 1,
characterized in that the input achievement information is information on a thesis, patent, book, or translation of each teacher, input by that teacher.

The teacher achievement verification system according to claim 2,
characterized in that the information extraction step consists of:
an access step of accessing, through an extraction module provided in a server, a website from which information on theses, patents, books, or translations can be obtained;
a data extraction step of extracting information from the accessed website by crawling; and
a data trimming step of extracting only the achievement-related information from the information extracted in the data extraction step.

The teacher achievement verification system according to claim 3,
characterized in that the data trimming step removes from the extracted information, with reference to the format of each website, irrelevant characters other than the achievement-related information, including special characters.

The teacher achievement verification system according to claim 3,
characterized in that the data verification step comprises:
a pre-processing step of processing the input achievement information stored in the teacher DB and the achievement-related information extracted in the information extraction step into a set format; and
a comparison step of comparing and verifying the pre-processed achievement-related information and the input achievement information.

The teacher achievement verification system according to claim 5,
characterized in that, in the pre-processing step, the input achievement information and the achievement-related information are converted to lowercase where they are alphabetic characters, and are separated into consonants and vowels where they are Hangul.

The teacher achievement verification system according to claim 5,
characterized in that the comparison step compares the achievement information with the achievement-related information by adding a weight for continuity to the edit distance (Levenshtein distance) algorithm, which is one of the string comparison (sequence matching) methods.

The teacher achievement verification system according to claim 7,
characterized in that, in the comparison step, similarity is determined by the weighted edit distance value (D(i,j)) derived through the following process.

[process]
1. If A[i] and B[j] of the strings to be compared match,
D(i,j) = D(i-1,j-1),
if b == True then w = w + 1,
if b != True then b = True.

2. If A[i] and B[j] of the strings to be compared do not match,
D(i,j) = min( D(i-1,j)+1/w, D(i,j-1)+1/w, D(i-1,j-1)+1/w ), and
b = False.

{ D(i,j) = weighted edit distance value, initial value D(0,0) = 0,
A[i] = i-th character of string A,
B[j] = j-th character of string B,
w = continuity weight (w is initialized to 0),
b = Boolean used to determine continuity (initial value of b is False),
i,j = 0 to n }