KR102468975B1 - Method and apparatus for improving accuracy of recognition of precedent based on artificial intelligence

Info

Publication number: KR102468975B1
Application number: KR1020220071534A
Authority: KR
Inventors: 황원석; 조경연
Original assignee: 주식회사 엘박스
Priority date: 2022-06-13
Filing date: 2022-06-13
Publication date: 2022-11-28
Also published as: KR20230172376A

Abstract

Provided are a method and an apparatus for improving the accuracy of precedent recognition based on artificial intelligence. The method may include: generating a plurality of optical character recognition (OCR) data by processing original data containing precedent contents; extracting errors in the precedent contents using the plurality of OCR data; and correcting the extracted errors.

Description

Artificial intelligence-based case law recognition accuracy improvement method and apparatus

The present disclosure relates to a method and apparatus for improving the accuracy of precedent recognition based on artificial intelligence.

Recently, with the rapid spread of digital storage media, documents that previously existed only on paper are being actively digitized. This trend is accelerating further with advances in optical character recognition (OCR), a technology for automatically recognizing the characters contained in a document.

OCR acquires an image of handwritten or machine-printed characters with an image scanner and converts it into machine-readable characters.

Meanwhile, court judgments are very difficult to classify and organize: a large volume already exists, new judgments are issued continuously, and the governing precedent is sometimes changed. In addition, unless a judgment is read in full, it is difficult to know whether it is important to the reader or contains the content the reader needs.

Registered Patent Publication No. 10-1516684, 2015.04.24.

The problem to be solved by the present disclosure is to provide a method and apparatus for improving the accuracy of precedent recognition based on artificial intelligence.

However, the problems to be solved by the present disclosure are not limited to the above, and other problems may exist.

A method for improving the accuracy of precedent recognition based on artificial intelligence according to one aspect of the present disclosure for solving the above problem includes: generating a plurality of OCR (Optical Character Recognition) data by processing original data containing precedent contents; extracting errors in the precedent contents using the plurality of OCR data; and correcting the extracted errors, wherein the error extraction step may include extracting a first error by analyzing the plurality of OCR data based on the bidirectional context of a character to be verified.

In addition, the OCR data generation step may repeat OCR on the original data a preset number of times to generate a number of OCR data corresponding to that number of times.

In addition, in the plurality of OCR data, a character at the same position may be expressed differently depending on the accuracy of each of the repeated OCR runs.

In addition, the accuracy may be calculated based on the ratio between the portion of the original data recognized as text and the portion recognized as a non-text image.

In addition, the first error extraction step may compare the plurality of OCR data with one another based on the bidirectional context of the character to be verified and, based on the comparison result, extract characters for which a difference exists among the plurality of OCR data.

In addition, the error correction step may set a correction direction for the first error and apply the set correction direction to the first error.

In addition, the correction direction may include a first type of deleting an erroneous character, a second type of inserting a missing character, a third type of selecting one of a plurality of characters, and a fourth type of deleting an erroneous character and inserting a new character in its place.
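
The four correction directions can be pictured as four kinds of string edit. The following is a minimal sketch under that assumption, not the disclosed implementation; `CorrectionType`, `apply_correction`, and the index-based interface are hypothetical names introduced here for illustration only.

```python
from enum import Enum


class CorrectionType(Enum):
    DELETE = 1   # first type: delete an erroneous character
    INSERT = 2   # second type: insert a missing character
    SELECT = 3   # third type: select one of several candidate characters
    REPLACE = 4  # fourth type: delete an erroneous character, insert a new one


def apply_correction(text: str, start: int, end: int,
                     kind: CorrectionType, new: str = "") -> str:
    """Apply one correction to text[start:end] (hypothetical interface)."""
    if kind is CorrectionType.DELETE:
        return text[:start] + text[end:]
    if kind is CorrectionType.INSERT:
        # Nothing is removed; `new` is placed at position `start`.
        return text[:start] + new + text[start:]
    # SELECT and REPLACE both overwrite the span. For SELECT, `new` is the
    # candidate chosen from the competing OCR readings.
    return text[:start] + new + text[end:]


print(apply_correction("징역역 6개월", 2, 3, CorrectionType.DELETE))
print(apply_correction("년간", 0, 0, CorrectionType.INSERT, "1"))
print(apply_correction("치한다", 0, 1, CorrectionType.REPLACE, "처"))
```

In this sketch the third and fourth types differ only in where the new character comes from: a selection reuses a reading that already appears in one of the OCR data, whereas a replacement supplies a character that none of them contains.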

An apparatus for improving the accuracy of precedent recognition based on artificial intelligence according to another aspect of the present disclosure for solving the above problem includes a communication unit, a memory storing at least one process for improving the accuracy of precedent recognition based on artificial intelligence, and a processor operating according to the process, wherein the processor, based on the process, generates a plurality of OCR (Optical Character Recognition) data by processing original data containing precedent contents, extracts errors in the precedent contents using the plurality of OCR data, and corrects the extracted errors, and wherein, when extracting the errors, the processor may extract a first error by analyzing the plurality of OCR data based on the bidirectional context of a character to be verified.

In addition, when generating the OCR data, the processor may repeat OCR on the original data a preset number of times to generate a number of OCR data corresponding to that number of times.

In addition, other methods and other systems for implementing the present disclosure, and a computer-readable recording medium on which a computer program for executing the method is recorded, may be further provided.

Other specific details of the present disclosure are included in the detailed description and drawings.

According to the present disclosure described above, errors that occurred during OCR recognition are extracted through comparative analysis of a plurality of OCR data obtained by performing OCR several times on a precedent in image form, and these errors are corrected, so that a more accurate precedent in text form can be provided to the user.

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a block diagram of an apparatus for improving the accuracy of precedent recognition based on artificial intelligence according to the present disclosure.
FIG. 2 is a flowchart of a method for improving the accuracy of precedent recognition based on artificial intelligence according to the present disclosure.
FIG. 3 is a flowchart of a specific method of step S120 of FIG. 2.
FIG. 4 is a diagram for explaining a first error according to the present disclosure.
FIG. 5 is a diagram for explaining error correction according to the present disclosure.
FIG. 6 is a diagram for explaining an artificial intelligence algorithm used to perform each step of FIG. 2.

Like reference numerals designate like elements throughout the present disclosure. The present disclosure does not describe every element of the embodiments, and content that is general in the technical field to which the present disclosure belongs or that overlaps between embodiments is omitted. The terms 'unit, module, member, block' used in the specification may be implemented in software or hardware, and depending on the embodiment, a plurality of 'units, modules, members, blocks' may be implemented as a single component, or a single 'unit, module, member, block' may include a plurality of components.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only a direct connection but also an indirect connection, and an indirect connection includes a connection through a wireless communication network.

In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Throughout the specification, when a member is said to be located "on" another member, this includes not only the case where the member is in contact with the other member but also the case where a further member exists between the two.

Terms such as first and second are used to distinguish one component from another, and the components are not limited by these terms.

A singular expression includes the plural unless the context clearly indicates otherwise.

The identification codes of the steps are used for convenience of description and do not describe the order of the steps; the steps may be performed in an order different from the stated order unless a specific order is clearly indicated by the context.

Hereinafter, the working principle and embodiments of the present disclosure will be described with reference to the accompanying drawings.

In this specification, the 'apparatus for improving the accuracy of precedent recognition based on artificial intelligence according to the present disclosure' includes any of various devices capable of performing computation and providing a result to a user. For example, the apparatus may include all of a computer, a server device, and a portable terminal, or may take any one of these forms.

Here, the computer may include, for example, a notebook, desktop, laptop, tablet PC, or slate PC equipped with a web browser.

The server device is a server that processes information by communicating with an external device, and may include an application server, a computing server, a database server, a file server, a game server, a mail server, a proxy server, a web server, and the like.

The portable terminal is, for example, a wireless communication device that ensures portability and mobility, and may include all kinds of handheld wireless communication devices such as PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals and smart phones, as well as wearable devices such as watches, rings, bracelets, anklets, necklaces, glasses, contact lenses, and head-mounted devices (HMDs).

When the apparatus 100 for improving the accuracy of precedent recognition based on artificial intelligence of the present disclosure (hereinafter, the accuracy improving apparatus) obtains precedent data in image form, it performs OCR to convert the image into text, and extracts and corrects the errors that the OCR data may contain, thereby improving the accuracy of precedent recognition.

Referring to FIG. 1, the accuracy improving apparatus 100 may include a communication unit 110, a memory 120, and a processor 130.

The communication unit 110 may include one or more components that enable communication with an external device (not shown), for example at least one of a wired communication module, a wireless communication module, a short-range communication module, and a location information module. Here, the external device (not shown) may be a user's terminal or a server device, but is not limited thereto.

The memory 120 may store data supporting the various functions of the accuracy improving apparatus 100 and programs for the operation of the processor 130, may store input and output data (for example, music files, still images, and videos), and may store a plurality of application programs (applications) run on the apparatus as well as data and instructions for the operation of the accuracy improving apparatus 100. At least some of these application programs may be downloaded from an external server through wireless communication.

More specifically, the memory 120 may store at least one process for improving the accuracy of precedent recognition based on artificial intelligence, so that the processor 130 can perform the method of the present disclosure based on the process.

The memory 120 may include at least one type of storage medium among a flash memory type, a hard disk type, an SSD (Solid State Disk) type, an SDD (Silicon Disk Drive) type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. In addition, the memory 120 may be a database that is separate from the apparatus 100 but connected to it by wire or wirelessly.

In addition to operations related to the application programs, the processor 130 may generally control the overall operation of the accuracy improving apparatus 100. The processor 130 may provide or process appropriate information or functions for an external device (not shown) by processing signals, data, and information input or output through the components described above, or by running an application program stored in the memory 120.

In addition, the processor 130 may control any one or a combination of the components described above in order to implement, on the accuracy improving apparatus 100, the various embodiments of the present disclosure described below with reference to FIGS. 2 to 6.

Specifically, the processor 130 may generate a plurality of OCR (Optical Character Recognition) data by processing original data containing precedent contents, extract errors in the precedent contents using the plurality of OCR data, and correct the extracted errors. When extracting the errors, the processor 130 may extract a first error by analyzing the plurality of OCR data based on word order, and may extract a second error by analyzing the plurality of OCR data based on precedent-related expressions stored in advance in a database.

At least one component may be added or deleted in accordance with the performance of the components shown in FIG. 1. In addition, those skilled in the art will readily understand that the relative positions of the components may be changed in accordance with the performance or structure of the system.

Meanwhile, each component shown in FIG. 1 means software and/or a hardware component such as a Field Programmable Gate Array (FPGA) or an Application Specific Integrated Circuit (ASIC).

Hereinafter, the method for improving the accuracy of precedent recognition based on artificial intelligence of the present disclosure will be described in detail with reference to FIGS. 2 to 6. Each operation of the method may be understood as being performed by the accuracy improving apparatus 100, but is not limited thereto.

Referring to FIG. 2, the processor 130 of the accuracy improving apparatus 100 may generate a plurality of OCR (Optical Character Recognition) data by processing original data containing precedent contents (S110).

Here, the original data may be received from an external device (not shown) through the communication unit 110. The original data is precedent data containing precedent contents and may be data in image form.

When a user who needs precedent data in text form provides original data in image form to the accuracy improving apparatus 100, the apparatus performs OCR on the original data and provides the precedent to the user in text form. To improve the accuracy of OCR recognition, the accuracy improving apparatus 100 performs OCR multiple times and, through comparative analysis of the resulting plurality of OCR data, can provide more accurate precedent contents to the user.

Specifically, the processor 130 may repeat OCR on the original data a preset number of times to generate a number of OCR data corresponding to that number of times.

For example, if the preset number of times is three, the processor 130 may perform OCR on the same original data three times to generate first OCR data, second OCR data, and third OCR data.
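
The repeated OCR step can be sketched as a loop that calls an OCR engine a preset number of times on the same image. This is an illustration only: the engine signature `(image, run_index) -> text`, the names `generate_ocr_data` and `fake_engine`, and the canned readings are all assumptions made here, since the disclosure does not name a specific OCR engine.

```python
from typing import Callable, List

# Hypothetical engine signature: (image bytes, run index) -> recognized text.
OcrEngine = Callable[[bytes, int], str]


def generate_ocr_data(image: bytes, engine: OcrEngine, repeat: int = 3) -> List[str]:
    """Run OCR `repeat` times on the same image and keep every result."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    return [engine(image, run) for run in range(repeat)]


def fake_engine(image: bytes, run: int) -> str:
    """Stand-in engine returning canned readings that differ between runs."""
    readings = ["이 판례", "0ㅣ 판례", "이 판례"]
    return readings[run % len(readings)]


ocr_data = generate_ocr_data(b"scanned-page", fake_engine, repeat=3)
print(ocr_data)
```

With a preset count of three, the list holds what the text calls the first, second, and third OCR data.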

In this case, in the plurality of OCR data, a character at the same position may be expressed differently depending on the accuracy of each of the repeated OCR runs.

Here, the accuracy may be a numerical quality score for each OCR run on the original data. That is, when OCR is performed repeatedly on the same data, the accuracy of each OCR data may differ, and as a result the text contained in each OCR data, that is, the recognized text, may differ. For example, a character at a specific position may be rendered as “이” in the first OCR run, while the character at the same position is rendered as “0ㅣ” in the second OCR run.

Depending on the embodiment, the accuracy may be calculated based on the ratio between the portion of the original data recognized as text and the portion recognized as a non-text image. However, the accuracy of each OCR data is not limited thereto and may be calculated based on various variables. The larger the proportion recognized as a non-text image, the lower the accuracy of the corresponding OCR data may be.

Depending on the embodiment, only the OCR data whose calculated accuracy is greater than or equal to a preset value may be used as the data to be analyzed. That is, data whose accuracy is too low may be discarded, and only data whose accuracy meets a certain standard may be used as analysis data for error extraction.
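
The accuracy score and the threshold filter can be illustrated as follows. The disclosure states only that the score is based on the ratio of text-recognized to image-recognized portions; the specific formula used here (text area divided by total area), the `OcrResult` fields, and the default threshold of 0.8 are assumptions chosen for the sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrResult:
    text: str
    text_area: float   # portion recognized as characters (assumed unit: pixels)
    image_area: float  # portion recognized as non-text image


def accuracy(result: OcrResult) -> float:
    """Assumed formula: share of the recognized area that was read as text."""
    total = result.text_area + result.image_area
    return result.text_area / total if total > 0 else 0.0


def select_for_analysis(results: List[OcrResult], threshold: float = 0.8) -> List[OcrResult]:
    """Keep only OCR data whose accuracy meets the preset value."""
    return [r for r in results if accuracy(r) >= threshold]


runs = [OcrResult("first", 90, 10), OcrResult("second", 50, 50)]
kept = select_for_analysis(runs)
print([r.text for r in kept])
```

Under this formula a larger non-text share lowers the score, which matches the stated tendency.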

Then, the processor 130 of the accuracy improving apparatus 100 may extract errors in the precedent contents using the plurality of OCR data (S120).

Referring to FIG. 3, the processor 130 may extract a first error by analyzing the plurality of OCR data based on the bidirectional context of the character to be verified (S121).

Here, the first error may mean an error that occurred during the OCR recognition process.

The processor 130 may compare the plurality of OCR data with one another based on the bidirectional context of the character to be verified and, based on the comparison result, extract as the first error the word order in which a difference exists among the plurality of OCR data.

Referring to FIG. 4, suppose OCR is performed twice on the sentence “피고인을 징역 6개월에 처한다. 이 판례 확정일로부터 1년간…” (“The defendant is sentenced to six months of imprisonment. For one year from the date this precedent is finalized…”) contained in the precedent contents of the original data. The first OCR data, the result of the first OCR run, may be generated as “피고인을 징역역 6개월에 처한다. 01 판례 확정이로부터 년간…”, and the second OCR data, the result of the second OCR run, may be generated as “피고인을 징역 6개월에 치한다. 이 판례 확정일로부터 1년간…”. Using the first and second OCR data, the processor 130 may determine the first error by analyzing, for the character to be verified, all characters located to its left and right and calculating a probability value. For example, when the character to be verified in the OCR data shown in FIG. 4 is “6”, the processor 130 may calculate a probability value by analyzing all characters to the left and right of that character in the first and second OCR data. When the probability value calculated from this bidirectional context is lower than a preset value, the processor 130 may determine that a first error exists for the character to be verified.
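
As a rough illustration of how disagreements between two OCR data can be located, the sketch below aligns the two example strings character by character with Python's standard `difflib.SequenceMatcher`. This is a deliberately simpler technique than the one described: it only finds where the readings differ and does not compute a bidirectional-context probability. The function name `differing_spans` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def differing_spans(first: str, second: str) -> List[Tuple[str, str, str]]:
    """Return (operation, text in first, text in second) for each disagreement."""
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    return [
        (tag, first[i1:i2], second[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


first_ocr = "피고인을 징역역 6개월에 처한다. 01 판례 확정이로부터 년간"
second_ocr = "피고인을 징역 6개월에 치한다. 이 판례 확정일로부터 1년간"

for span in differing_spans(first_ocr, second_ocr):
    print(span)
```

Each reported span is a candidate position for a first error. Deciding which reading is correct would still require the context-based scoring described above, because neither OCR data is right everywhere in this example.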

Referring again to FIG. 3, the processor 130 may extract a second error by analyzing the plurality of OCR data based on precedent-related expressions stored in advance in a database (S122).

Here, the second error may mean a misspelling that was already present in the original text of the precedent. That is, the accuracy improving apparatus 100 of the present disclosure can provide a more accurate result to the user by extracting not only errors that occurred during OCR recognition but also characters that were recognized correctly by OCR yet were misspelled in the original text to begin with.

The processor 130 may compare each of the plurality of OCR data with the precedent-related expressions and, based on the comparison result, extract phoneme-level misspellings as the second error.

Here, for terms frequently used in precedents, the database may store the correct expression matched with incorrect expressions (frequently occurring misspellings).

Depending on the embodiment, the processor 130 may analyze each of the plurality of OCR data using the database to determine whether a character string stored in the database as an incorrect expression exists in the OCR data. The processor 130 may then extract the string determined to be an incorrect expression as the second error.
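
The database lookup can be sketched with an in-memory dictionary that maps each stored incorrect expression to its correct form. The dictionary contents, the function name `find_second_errors`, and the sample sentence are hypothetical; the disclosure does not specify the database schema.

```python
from typing import Dict, List, Tuple


def find_second_errors(text: str, wrong_to_correct: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (position, incorrect expression, correct expression) for each hit."""
    found = []
    for wrong, correct in wrong_to_correct.items():
        pos = text.find(wrong)
        while pos != -1:
            found.append((pos, wrong, correct))
            pos = text.find(wrong, pos + 1)
    return sorted(found)


# Hypothetical database rows: incorrect expression -> correct expression.
misspellings = {"탄결": "판결", "판걸": "판결"}

hits = find_second_errors("이 탄결은 확정되었다", misspellings)
print(hits)
```

A plain substring search is enough for the sketch. A production system would likely need word-boundary handling so that a stored misspelling is not matched inside a longer, valid word.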

실시예에 따라, 데이터베이스에 특정 단어에 대해 올바른 표현 및 틀린 표현이 매칭되어 저장되는데, 이때 해당 단어의 음절 및 음소 위치에 따라 오기 유형을 분류하여 틀린 표현이 저장될 수 있다. Depending on the embodiment, a correct expression and an incorrect expression for a specific word are matched and stored in the database. At this time, the incorrect expression may be stored by classifying the misspelling type according to the position of the syllable and phoneme of the corresponding word.

Specifically, for a specific two-syllable word, an expression in which the first phoneme of the first syllable is wrong is stored as the first misspelling of the word, an expression in which the second phoneme of the first syllable is wrong is stored as the second misspelling, an expression in which the third phoneme of the first syllable is wrong is stored as the third misspelling, an expression in which the first phoneme of the second syllable is wrong is stored as the fourth misspelling, an expression in which the second phoneme of the second syllable is wrong is stored as the fifth misspelling, and an expression in which the third phoneme of the second syllable is wrong may be stored as the sixth misspelling. However, the classification is not limited to this, and the misspelling types of each word may be classified according to the number of syllables in the word and the number of phonemes in each syllable.

For example, "판결" ("judgment") may be stored in the database as the correct expression, and as the incorrect expressions matched to it, “탄결, 찬결” may be stored as the first misspelling (the first phoneme of the first syllable is wrong), “펀결, 핀결” as the second misspelling (the second phoneme of the first syllable is wrong), “팡결, 팜결” as the third misspelling (the third phoneme of the first syllable is wrong), “판렬, 판뎔” as the fourth misspelling (the first phoneme of the second syllable is wrong), “판걸, 판걀” as the fifth misspelling (the second phoneme of the second syllable is wrong), and “판경, 판??” as the sixth misspelling (the third phoneme of the second syllable is wrong).
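The six misspelling types in this example can be reproduced with a short sketch that splits each precomposed Hangul syllable into its initial, medial, and final phoneme using the standard Unicode composition arithmetic. The function names are hypothetical, and the sketch assumes every syllable has exactly three phoneme slots (with the final consonant possibly empty), which matches the two-syllable example but is not the only classification the disclosure allows.

```python
HANGUL_BASE = 0xAC00  # code point of "가", the first precomposed syllable
NUM_INITIALS = 19
NUM_MEDIALS = 21
NUM_FINALS = 28       # includes the "no final consonant" slot


def decompose(syllable):
    """Split one precomposed Hangul syllable into (initial, medial, final) indices."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < NUM_INITIALS * NUM_MEDIALS * NUM_FINALS:
        raise ValueError(f"not a precomposed Hangul syllable: {syllable!r}")
    initial, rest = divmod(offset, NUM_MEDIALS * NUM_FINALS)
    medial, final = divmod(rest, NUM_FINALS)
    return initial, medial, final


def misspelling_type(correct, wrong):
    """Return the misspelling number (1-based, three slots per syllable) when
    exactly one phoneme differs between the two words; otherwise None."""
    if len(correct) != len(wrong):
        return None
    diffs = []
    for s, (c, w) in enumerate(zip(correct, wrong)):
        for p, (a, b) in enumerate(zip(decompose(c), decompose(w))):
            if a != b:
                diffs.append(3 * s + p + 1)
    return diffs[0] if len(diffs) == 1 else None
```

With this numbering, `misspelling_type("판결", "탄결")` falls in the first type and `misspelling_type("판결", "판경")` in the sixth, in line with the example above.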

Referring again to FIG. 2, the processor 130 of the accuracy improving apparatus 100 may correct the extracted errors (S130).

The processor 130 may set a correction direction for the first error and the second error, and apply the set correction direction to the first error and the second error.

Depending on the embodiment, the correction direction may include a first type, a second type, a third type, a fourth type, and a fifth type.

The first type (DELETE) may be a correction that deletes an erroneous character.

The second type (INSERT) may be a correction that inserts a missing character.

The third type (SELECT) may be a correction that selects one of a plurality of characters.

The fourth type (GENERATE (replace)) may be a correction that deletes an erroneous character and inserts a new character in the deleted position.

The fifth type (NO EDIT (remain)) may be, when the plurality of OCR data are identical to one another, determining that the sequence needs no correction and keeping the existing data without performing any correction that changes the recognition data (OCR data).

FIG. 5 illustrates a correction method for the errors extracted from the first OCR data and the second OCR data shown in FIG. 4.

In this case, error correction may be performed syllable by syllable. For example, as shown in FIG. 4, for "피고인을 징역 6개월에 처한다. 이 판례 확정일로부터 1년간…” ("The defendant is sentenced to six months' imprisonment. For one year from the date this precedent becomes final…"), correction may be performed on each syllable, that is, on each of “피”, “고”, “인”, …, “1”, “년”, “간”.

Referring to FIG. 5, the fifth type (NO EDIT (remain)) of correction may be performed for “피” and “고”, where the first and second OCR data do not differ.

For “역”, where the same syllable is duplicated in the first OCR data, the first type (DELETE) of correction may be performed.

For “처” and “치”, where the first and second OCR data differ, the third type (SELECT) of correction may be performed, selecting the correct one of the two, “처” (select 1).

For “0” and “이”, where the first and second OCR data differ, the third type (SELECT) of correction may be performed, selecting the correct one of the two, “이” (select 2).

For “1”, which is an unnecessary character in the first OCR data, the first type (DELETE) of correction may be performed.

For “이” and “일”, where the first and second OCR data differ, the third type (SELECT) of correction may be performed, selecting the correct one of the two, “일” (select 2).

For “ ” (space) and “1”, where the first and second OCR data differ, the third type (SELECT) of correction may be performed, selecting the correct one of the two, “1” (select 2).
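The per-syllable corrections walked through above can be sketched as a small routine that applies one tag to each aligned position. The function `interpret`, the `GAP` marker, and the tag spellings are hypothetical, and the sample columns are illustrative only: they are loosely modeled on the cases described for FIG. 5 and are not the actual contents of the figure.

```python
GAP = ""  # hypothetical marker: this OCR run has no character at this position


def interpret(columns, tags):
    """Apply one correction tag per aligned position and join the result.

    columns: list of (char_from_first_ocr, char_from_second_ocr)
    tags:    "NO EDIT", "DELETE", "SELECT 1", "SELECT 2", or a tuple
             ("INSERT", ch) / ("GENERATE", ch) carrying the new character
    """
    out = []
    for (first, second), tag in zip(columns, tags):
        if tag == "NO EDIT":
            out.append(first)        # both runs agree; keep the data
        elif tag == "DELETE":
            continue                 # drop the erroneous character
        elif tag == "SELECT 1":
            out.append(first)        # take the first run's character
        elif tag == "SELECT 2":
            out.append(second)       # take the second run's character
        elif isinstance(tag, tuple) and tag[0] in ("INSERT", "GENERATE"):
            out.append(tag[1])       # emit the newly supplied character
        else:
            raise ValueError(f"unknown tag: {tag!r}")
    return "".join(out)


# Illustrative aligned positions, not the actual data of FIG. 4.
columns = [("피", "피"), ("고", "고"), ("역", "역"), ("역", GAP),
           ("처", "치"), ("0", "이"), (" ", "1")]
tags = ["NO EDIT", "NO EDIT", "NO EDIT", "DELETE",
        "SELECT 1", "SELECT 2", "SELECT 2"]
result = interpret(columns, tags)
```

In this sketch the tags are supplied by hand; in the disclosure they would come from the trained model described with reference to FIG. 6.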

Hereinafter, with reference to FIG. 6, the artificial intelligence algorithm used in performing each step (S110 to S130) of the artificial intelligence-based method for improving the accuracy of precedent recognition according to the present disclosure will be described.

In step S110, the processor 130 may perform OCR on the received original precedent data a plurality of times, and run the Sequence Aligner on the plurality of OCR data generated as a result.
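The disclosure does not say which algorithm the Sequence Aligner uses. As a rough sketch of what aligning two OCR results could look like, the following uses Python's standard `difflib.SequenceMatcher` as a stand-in technique; the function `align` and the `GAP` marker are hypothetical.

```python
from difflib import SequenceMatcher

GAP = ""  # hypothetical marker for "no character at this position"


def align(a, b):
    """Align two OCR strings into columns of (char_a, char_b), padding the
    shorter side of every differing segment with GAP."""
    columns = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for _op, i1, i2, j1, j2 in matcher.get_opcodes():
        seg_a, seg_b = a[i1:i2], b[j1:j2]
        for k in range(max(len(seg_a), len(seg_b))):
            char_a = seg_a[k] if k < len(seg_a) else GAP
            char_b = seg_b[k] if k < len(seg_b) else GAP
            columns.append((char_a, char_b))
    return columns


# A duplicated syllable in the first string shows up as a gap in the second.
columns = align("징역역 6개월", "징역 6개월")
```

Positions where the two characters match need no edit, while positions that differ or contain a gap are the candidates for the SELECT, DELETE, or INSERT corrections.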

In step S120, the processor 130 may perform Embedding, Self-Attention, and Feed Forward on the sequence-aligned data. At the same time, the processor 130 may perform Image Embedding on the original precedent data, apply Cross-Attention between the resulting data and the self-attended data, and then perform Feed Forward.

In step S130, the data resulting from performing Embedding, Self-Attention, and Feed Forward on the sequence-aligned data may be input to the Tag interpreter via the Character Generator. The data resulting from performing Image Embedding on the original precedent data, applying Cross-Attention between that result and the self-attended data, and performing Feed Forward may be input to the Tag interpreter via the Character Tagger. The Tag interpreter may then output text data.

Unlike conventional models, the Transformer used in the embodiments of the present disclosure may use Attention instead of an RNN or a CNN.

By using a Transformer, the accuracy improving apparatus 100 according to an embodiment of the present disclosure allows more parallelization than conventional models and also significantly reduces training time.
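For readers unfamiliar with the Attention operation mentioned above, the following is a minimal pure-Python sketch of scaled dot-product attention, the standard Transformer building block. It is not taken from the disclosure: the function names are hypothetical, learned projection matrices and multiple heads are omitted, and treating the text features as queries and the image features as keys and values for Cross-Attention is an assumption.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of float vectors.

    Each query is compared with every key; the resulting weights mix the
    value vectors. No recurrence or convolution is involved, so every
    query can be processed independently of the others.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Self-attention:  attention(text, text, text)
# Cross-attention: attention(text, image, image)  (assumed direction)
mixed = attention([[1.0, 0.0]],
                  [[0.0, 0.0], [0.0, 0.0]],
                  [[2.0, 0.0], [4.0, 2.0]])
```

Because both keys in the example are identical, the weights are uniform and the output is simply the average of the two value vectors.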

Although FIGS. 2 and 4 describe steps S110 to S130 as being executed sequentially, this is merely an illustrative description of the technical idea of the present embodiment. A person of ordinary skill in the art to which the present embodiment belongs could apply various modifications and variations, such as changing the order described in FIGS. 2 and 4 or executing steps S110 to S130 in parallel, without departing from the essential characteristics of the present embodiment. FIGS. 2 and 4 are therefore not limited to a time-series order.

Meanwhile, in the above description, steps S110 to S130 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present disclosure. In addition, some steps may be omitted as needed, and the order of the steps may be changed.

Meanwhile, the disclosed embodiments may be implemented in the form of a recording medium storing instructions executable by a computer. The instructions may be stored in the form of program code and, when executed by a processor, may generate a program module to perform the operations of the disclosed embodiments. The recording medium may be implemented as a computer-readable recording medium.

Computer-readable recording media include all types of recording media that store instructions readable by a computer. Examples include read only memory (ROM), random access memory (RAM), magnetic tape, magnetic disks, flash memory, and optical data storage devices.

The disclosed embodiments have been described above with reference to the accompanying drawings. A person of ordinary skill in the art to which the present disclosure belongs will understand that the present disclosure may be practiced in forms other than the disclosed embodiments without changing its technical spirit or essential features. The disclosed embodiments are illustrative and should not be construed as limiting.

100: accuracy improving apparatus
110: communication unit
120: memory
130: processor

Claims

In the method performed by the device,
Generating a plurality of Optical Character Recognition (OCR) data by processing the original data including the contents of the precedent;
extracting errors in the content of the precedent using the plurality of OCR data; and
Correcting the extracted error based on a preset correction direction; Including,
The error extraction step,
extracting a first error by analyzing the plurality of OCR data based on the bi-directional context of the character to be verified; and
Extracting a second error by analyzing the plurality of OCR data based on an expression related to a judgment pre-stored in a database; including,
The OCR data generation step,
OCR is repeated for the original data a preset number of times to generate OCR data corresponding to the number of times, but only data whose accuracy is greater than a preset value among the plurality of OCR data is used as analysis data for error extraction and correction,
The accuracy is calculated based on a ratio of a part recognized as a character and a part recognized as an image other than text in the original data,
The first error is an error generated when recognizing the OCR,
The second error is an error described in the precedent content itself included in the original data,
The error correction step is
Setting the correction direction for the first error and the second error, and applying the set correction direction to the first error and the second error;
The correction direction for the first error is set based on a result of comparing each of the plurality of OCR data based on the bidirectional context of the character to be verified and extracting a character having a difference between the plurality of OCR data, A first type of deleting an erroneous character at the same position in each of the plurality of OCR data, a second type of inserting a missing character at the same position in each of the plurality of OCR data, and the same position in each of the plurality of OCR data A third type of selecting a correct character from among a plurality of characters for and a fourth type of deleting an incorrect character for the same position in each of the plurality of OCR data and inserting a new character in the deleted position,
A method for improving the accuracy of case precedent recognition based on artificial intelligence.

According to claim 1,
The second error extraction step,
Comparing each of the plurality of OCR data with the expression related to the judgment, and extracting different misspellings for each phoneme unit based on the compared result,
A method for improving the accuracy of case precedent recognition based on artificial intelligence.

According to claim 2,
The misspellings include a first misspelling in which the first phoneme of the first syllable of a two-syllable word is incorrect, a second misspelling in which the second phoneme of the first syllable is incorrect, a third misspelling in which the third phoneme of the first syllable is incorrect, a fourth misspelling in which the first phoneme of the second syllable of the word is incorrect, a fifth misspelling in which the second phoneme of the second syllable is incorrect, and a sixth misspelling in which the third phoneme of the second syllable is incorrect,
A method for improving the accuracy of case precedent recognition based on artificial intelligence.

(Deleted)

A program stored in a computer readable recording medium to be combined with a computer to execute the method of any one of claims 1, 2 and 3.

A communication unit;
A memory storing at least one process for improving the accuracy of artificial intelligence-based precedent recognition; and
A processor operating according to the process; includes,
The processor, based on the process,
A plurality of OCR (Optical Character Recognition) data is generated by processing original data including precedent content, errors in the precedent content are extracted using the plurality of OCR data, and the extracted errors are corrected based on a preset correction direction,
When the processor extracts the error,
Extracting a first error by analyzing the plurality of OCR data based on the bidirectional context of the character to be verified,
Extracting a second error by analyzing the plurality of OCR data based on an expression related to the judgment pre-stored in the database;
When the processor generates OCR data,
OCR is repeated for the original data a preset number of times to generate OCR data corresponding to the number of times, but only data whose accuracy is greater than a preset value among the plurality of OCR data is used as analysis data for error extraction and correction,
The accuracy is calculated based on a ratio of a part recognized as a character and a part recognized as an image other than text in the original data,
The first error is an error generated when recognizing the OCR,
The second error is an error described in the precedent content itself included in the original data,
When correcting errors, the processor
Setting the correction direction for the first error and the second error, and applying the set correction direction to the first error and the second error;
The correction direction for the first error is set based on a result of comparing each of the plurality of OCR data based on the bidirectional context of the character to be verified and extracting a character having a difference between the plurality of OCR data, A first type of deleting an erroneous character at the same position in each of the plurality of OCR data, a second type of inserting a missing character at the same position in each of the plurality of OCR data, and the same position in each of the plurality of OCR data A third type of selecting a correct character from among a plurality of characters for and a fourth type of deleting an incorrect character for the same position in each of the plurality of OCR data and inserting a new character in the deleted position,
AI-based precedent recognition accuracy improvement device.

According to claim 9,
When the processor extracts the second error,
Comparing each of the plurality of OCR data with the expression related to the judgment, and extracting different misspellings for each phoneme unit based on the compared result,
The misspellings include a first misspelling in which the first phoneme of the first syllable of a two-syllable word is incorrect, a second misspelling in which the second phoneme of the first syllable is incorrect, a third misspelling in which the third phoneme of the first syllable is incorrect, a fourth misspelling in which the first phoneme of the second syllable of the word is incorrect, a fifth misspelling in which the second phoneme of the second syllable is incorrect, and a sixth misspelling in which the third phoneme of the second syllable is incorrect,
AI-based precedent recognition accuracy improvement device.