KR102625312B1

KR102625312B1 - Layout-Independent License Plate Detection and Recognition System Based on Attention Method

Info

Publication number: KR102625312B1
Application number: KR1020220084028A
Authority: KR
Inventors: 강동중; 서태문
Original assignee: 주식회사 파시디엘
Priority date: 2022-07-07
Filing date: 2022-07-07
Publication date: 2024-01-16

Abstract

The present invention relates to a license plate recognition system that uses deep learning to recognize license plates, including vehicle license plates. The system comprises: an input unit that receives a captured license plate image; a license plate detection unit that is trained to detect the license plate using a rectangular bounding box and the coordinates of the four corner vertices, so that distortions such as tilt and rotation of the plate can be corrected; and a character recognition unit that segments the image detected by the license plate detection unit and recognizes the characters inside the plate using an attention-based algorithm.
Accordingly, the present invention is robust to noise in the license plate image, including images captured in snowy or overcast weather, images with poor saturation or brightness, and images captured at an oblique angle, and it can be applied to various kinds of plates and attached markings, such as those of cars and motorcycles and stickers on vehicles.

Description

Vehicle license plate recognition system independent of license plate layout type based on attention method {Layout-Independent License Plate Detection and Recognition System Based on Attention Method}

The present invention relates to a vehicle license plate recognition system that is based on an attention method and is independent of the license plate layout type. More specifically, it relates to such a system in which the deep learning structure for detecting and recognizing license plates is improved so that the system is robust to noise in the captured license plate image and can be applied to license plates with various kinds of layouts.

License plate recognition systems are used in a wide range of fields, such as vehicle management, digital surveillance systems, and intelligent transportation systems.

A conventional license plate recognition system based on CCTV video analysis consists of three steps: (1) vehicle location detection, (2) character region detection, and (3) character recognition. Recently, with the development of deep learning, deep neural networks that include convolution operations have achieved high recognition performance.

Despite these advances, such systems still have drawbacks: they show low recognition rates on image data obtained in diverse environments, or they work for only a single layout. Here, layout refers to the shape of the license plate, the arrangement of its characters, the plate color, and the like.

Accordingly, there is growing demand for a recognition system that can cope with the varied conditions under which license plate images are obtained, such as resolution, background, position, lighting, rotation, and distortion, and that can be applied to various license plates independently of their layout.

[Related Art Documents]

Registered Patent Publication No. 10-2272279 (announced July 2, 2021)

Laid-Open Patent Publication No. 10-2021-0080291 (published August 30, 2021)

An object of the present invention is to provide a vehicle license plate recognition system, based on an attention method and independent of the license plate layout type, that is robust to noise in the license plate image, including images captured in snowy or overcast weather, images with poor saturation or brightness, and images captured at an oblique angle.

Another object of the present invention is to provide a vehicle license plate recognition system, based on an attention method and independent of the license plate layout type, that can be applied to various kinds of license plates.

Yet another object of the present invention is to provide a vehicle license plate recognition system, based on an attention method and independent of the license plate layout type, that can improve license plate detection and recognition performance.

The above objects are achieved by a license plate recognition system that uses deep learning to recognize license plates, including vehicle license plates, the system comprising: an input unit that receives a captured license plate image; a license plate detection unit that is trained to detect the position of the license plate using a rectangular bounding box and the coordinates of the four corner vertices, so that distortions such as tilt and rotation of the plate can be corrected; and a character recognition unit that segments the image detected by the license plate detection unit and recognizes the characters inside the plate using an attention-based algorithm. The character recognition unit includes a recognition feature extraction module that includes residual blocks, and a recognition head module that includes a deformable attention stage having residual deformable blocks (ResDeformable Block) for the characters input from the recognition feature extraction module. The recognition feature extraction module arranges convolution operations, batch normalization (BN), max pooling, deformable attention stages, and residual blocks according to set parameters, so that feature maps consisting of numbers, including matrices, and the parameters of the neural network, which are expressed as various operations on numbers, are computed with one another. The recognition head module includes relation attention, which splits the information of the feature map input from the recognition feature extraction module and identifies the relationships among its parts, parallel attention, and a character decoding process.

In addition, the license plate detection unit preferably includes: a detection feature extraction module that performs convolution operations, max pooling, and residual blocks a set number of times and then performs a further convolution operation; and a detection head module that includes a first branch, which detects the bounding box of the license plate using the center point, width, and height of the rectangle, and a second branch, which detects the license plate using the coordinates of the four corner vertices.

In addition, the convolution operation step of the detection feature extraction module preferably includes a convolution operation, batch normalization (BN), and a ReLU function.

In addition, the first branch preferably includes a process of generating a heatmap of the center point from the image input from the detection feature extraction module, a process of setting the width and height of the rectangle, and a process of calculating an offset.

In addition, the second branch preferably includes a process of generating a heatmap of the rectangle from the image input to the detection feature extraction module, a process of sequentially detecting the coordinates of the four vertices in a set direction, and a process of calculating an offset.

(Deleted)

Accordingly, the present invention can provide a vehicle license plate recognition system, based on an attention method and independent of the license plate layout type, that is robust to noise in the license plate image, including images captured in snowy or overcast weather, images with poor saturation or brightness, and images captured at an oblique angle.

In addition, it can provide such a system that can be applied to various kinds of plates and attached markings, such as those of cars and motorcycles and stickers on vehicles.

In addition, it can provide such a system that can improve license plate detection and recognition performance.

FIG. 1 is a flowchart schematically illustrating a license plate recognition system according to an embodiment of the present invention;
FIG. 2 is an overview diagram showing the flow of FIG. 1 together with pictures;
FIG. 3 is a diagram for explaining the license plate detection unit;
FIG. 4 is a flowchart and main-part structure diagram for explaining the recognition feature extraction module of the character recognition unit;
FIG. 5 is a flowchart and main-part structure diagram for explaining the head module of the character recognition unit;
FIG. 6 shows photographs of the datasets used in the experiments; and
FIG. 7 shows photographs of input images and the results detected and recognized according to the present invention.

A vehicle license plate recognition system that is based on an attention method and is independent of the license plate layout type (1000, hereinafter the 'license plate recognition system') according to an embodiment of the present invention is described in detail below with reference to FIGS. 1 to 7.

FIG. 1 is a flowchart schematically illustrating a license plate recognition system according to an embodiment of the present invention, FIG. 2 is an overview diagram showing the flow of FIG. 1 together with pictures, FIG. 3 is a diagram for explaining the license plate detection unit, FIG. 4 is a flowchart and main-part structure diagram for explaining the recognition feature extraction module of the character recognition unit, FIG. 5 is a flowchart and main-part structure diagram for explaining the head module of the character recognition unit, FIG. 6 shows photographs of the datasets used in the experiments, and FIG. 7 shows photographs of the results detected and recognized according to the present invention.

Before describing the present invention in more detail, it should be noted that the invention may be modified in various ways and may take various forms, and embodiments (aspects) are therefore described in detail in the text. This is not intended to limit the invention to the specific forms disclosed, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

In the drawings, identical reference numerals, in particular reference numerals whose tens and units digits, or whose tens digit, units digit, and letter, are identical, denote members having the same or similar functions. Unless otherwise stated, the member denoted by each reference numeral in the drawings is to be understood according to this convention.

In addition, components in the drawings may be drawn exaggeratedly large (or thick) or small (or thin), or in simplified form, for ease of understanding, but the scope of protection of the present invention should not be construed as limited thereby.

The terms used in this specification are used only to describe particular embodiments (aspects) and are not intended to limit the invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "consist of" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art and, unless expressly defined in this application, are not to be interpreted in an idealized or excessively formal sense.

As shown in FIGS. 1 to 5, the license plate recognition system (1000) according to an embodiment of the present invention uses deep learning and preferably includes: an input unit (not shown) that receives a captured license plate image; a license plate detection unit (1100) that is trained to detect the license plate using a rectangular bounding box and the coordinates of the four corner vertices, so that distortions such as tilt and rotation of the plate can be corrected; and a character recognition unit (1300) that segments (splits) the image detected by the license plate detection unit (1100) and recognizes the characters inside the plate using an attention-based algorithm.

The input unit (not shown) processes multiple input videos simultaneously through a parallel processing module so that license plate recognition can be performed efficiently on a single computing device.

The parallel processing module uses the CPU cores of the computing device in parallel so that multiple input videos, such as CCTV feeds, can be delivered to the license plate detection unit (1100) without delay.

In the prior art, multiple input videos are usually handled by processing one video at a time in sequence. To remedy the inefficiency of that approach, the present invention applies a multiprocessing algorithm to construct the parallel processing module.
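The parallel input path described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `preprocess_frame`, `process_streams`, and the normalization step are hypothetical placeholders, and Python's standard `concurrent.futures` pool stands in for whatever multiprocessing mechanism the system actually uses.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def preprocess_frame(frame):
    """Hypothetical per-stream work done before detection (placeholder)."""
    stream_id, pixels = frame
    # Scale 8-bit pixel values to [0, 1]; a stand-in for real preprocessing.
    return stream_id, [p / 255.0 for p in pixels]


def process_streams(frames, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Process one frame per input stream in parallel, preserving input order.

    executor_cls defaults to a process pool (one worker per CPU core when
    max_workers is None); ThreadPoolExecutor can be passed instead.
    """
    with executor_cls(max_workers=max_workers) as pool:
        return list(pool.map(preprocess_frame, frames))
```

Called with a list such as `[("cam0", [0, 255]), ("cam1", [51, 102])]`, `process_streams` returns the processed frames in the same order as the input streams, so each result can be handed to the detection unit without re-sorting.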

The license plate detection unit (1100) preferably includes a detection feature extraction module (1130) and a detection head module (1150), and the character recognition unit (1300) preferably includes a recognition feature extraction module (1330) and a recognition head module (1350).

As embodiments of the license plate recognition system of the present invention, only the extraction and recognition of characters (English letters, Chinese characters, Korean characters, and numerals) on car license plates, motorcycle license plates, stickers attached to the front glass of a vehicle, and the like are described. However, the present invention can be extended, within a range that does not depart from its technical idea, to recognizing ship markings and characters and marks in videos and application programs.

In addition, the way letters and/or numerals are represented on vehicle license plates differs from country to country. The present invention naturally applies to such different and diverse license plates as well.

<License Plate Detection>

Deep learning based object detection methods generally use a deep neural network to extract features and regress location parameters. These methods fall into three categories.

The first is the anchor-box-based two-stage method, the second is the anchor-box-based one-stage method, and the third is the anchor-free method. Anchor boxes are a set of predefined bounding boxes of specific heights and widths placed at the positions where a target object may be located in the image. The neural network predicts bounding boxes by refining the anchor boxes through regression.

In two-stage methods, typified by the R-CNN (Region-based Convolutional Neural Network) series, an RPN (Region Proposal Network) generates ROIs (Regions of Interest). These are projected onto the feature map and passed through a pooling step, after which a fully connected head refines the bounding boxes.

One-stage methods such as the YOLO series remove the steps of projecting and pooling ROIs and perform detection with a single neural network.

Using anchor boxes can improve detection performance, especially for small objects. However, it has three drawbacks.

First, anchor-box methods require a very large set of predefined bounding boxes; RetinaNet, for example, uses more than 100K. This causes an imbalance between boxes belonging to the positive and negative categories.

Second, the parameters of the predefined boxes must be chosen when the anchor boxes are generated. This is a drawback for automated systems, in which user intervention should be minimized.

Finally, anchor-based methods predict many overlapping bounding boxes, which must be reduced by NMS (Non-Maximum Suppression). To address these drawbacks, new object detection networks using anchor-free methods have recently been proposed. The present invention adopts a general anchor-free object detection network, using the concept of CenterNet, for license plate detection. This not only extends license plate applications but also allows use on other displayed-character datasets through a sequential detection-recognition approach.

This is described in detail below.
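For reference, the NMS post-processing that anchor-based detectors rely on, and that the anchor-free design avoids, can be sketched as below. This is a generic greedy NMS in plain Python, given only as background; the box format `(x1, y1, x2, y2)` and the default threshold of 0.5 are assumptions for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Every candidate is compared against the boxes already kept, so the cost grows with the number of overlapping predictions, which is the overhead the paragraph above refers to.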

- Detection feature extraction module (1130) for vehicle license plate detection

Because the detection head module (1150) according to the present invention must compute the loss function at the pixel level, a simple, lightweight backbone network that does not shrink the spatial information is used, as shown on the left side of FIG. 3. The detection feature extraction module (1130) adopts some layers of ResNet18 as the feature extractor and uses convolution operations, batch normalization (BN), max pooling, dropout, and ReLU as the activation function. The architecture of this network is shown in <Table 1>.

Input           | Operator             | Output
512 x 512 x 3   | Conv 3x3 + BN + ReLU | 512 x 512 x 64
512 x 512 x 64  | Conv 3x3 + BN + ReLU | 512 x 512 x 128
512 x 512 x 128 | MaxPool              | 256 x 256 x 128
256 x 256 x 128 | Residual Block       | 256 x 256 x 128
256 x 256 x 128 | MaxPool              | 128 x 128 x 128
128 x 128 x 128 | Residual Block       | 128 x 128 x 128
128 x 128 x 128 | Conv 3x3 + BN + ReLU | 128 x 128 x 256
128 x 128 x 256 | Conv 1x1             | 128 x 128 x 1024

In license plate detection, the features of the license plate region are distinct within the input vehicle image, so a simple single-scale network is sufficient to extract them. As shown in <Table 1>, a given input RGB image is resized to a resolution of 512 x 512 and passed through the backbone network. To represent nonlinear features richly, the output feature map is downsampled by a factor of 4 and its channel count is set to 1024.
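The shape bookkeeping of <Table 1> can be checked with a short trace. This sketch only propagates tensor shapes and does not implement the layers themselves; the `DETECTION_BACKBONE` list and `trace_shapes` helper are illustrative names, and the second convolution is assumed to output 128 channels so that it matches the input of the MaxPool row that follows it.

```python
# Each entry: (operator name, spatial stride, output channels or None to keep).
DETECTION_BACKBONE = [
    ("Conv 3x3 + BN + ReLU", 1, 64),
    ("Conv 3x3 + BN + ReLU", 1, 128),  # assumed 128 to match the next row
    ("MaxPool", 2, None),
    ("Residual Block", 1, 128),
    ("MaxPool", 2, None),
    ("Residual Block", 1, 128),
    ("Conv 3x3 + BN + ReLU", 1, 256),
    ("Conv 1x1", 1, 1024),
]


def trace_shapes(layers, height, width, channels):
    """Return the (height, width, channels) shape after every layer."""
    shapes = [(height, width, channels)]
    for _name, stride, out_channels in layers:
        height, width = height // stride, width // stride
        channels = channels if out_channels is None else out_channels
        shapes.append((height, width, channels))
    return shapes
```

Tracing a 512 x 512 x 3 input through this list ends at 128 x 128 x 1024: the two MaxPool layers account for the overall downsampling factor of 4, and the final 1x1 convolution sets the channel count to 1024.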

- Detection head module (1150)

As described above, the present invention includes a detection head module (1150) for the anchor-free approach. As shown in FIG. 2, the detection head module (1150) takes the output of the last stage of the detection feature extraction module (1130) as input and predicts its outputs through two branches.

For an input image $I \in \mathbb{R}^{W \times H \times 3}$ (hereinafter, unless otherwise noted, $W$ denotes the width and $H$ the height of the rectangle), the first branch (the dashed box at the upper right of FIG. 3) predicts three outputs: a heatmap, the box width and height, and an offset.

These are, first, the center-point heatmap $Y_{\text{heatmap}} \in \mathbb{R}^{\frac{W}{4} \times \frac{H}{4} \times 1}$; second, the box width and height $Y_{w,h} \in \mathbb{R}^{\frac{W}{4} \times \frac{H}{4} \times 2}$; and third, the offset $Y_{\text{offsets}} \in \mathbb{R}^{\frac{W}{4} \times \frac{H}{4} \times 2}$. The outputs are then decoded to find the bounding box.
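One way the three outputs of the first branch could be decoded into a bounding box is sketched below in the style of CenterNet. The source does not specify the decoding rule, so several things here are assumptions: the arrays are stored as (rows, columns, channels), the width and height are expressed in feature-map cells, the offset is ordered (dx, dy), and only the single strongest center is decoded. NumPy is used for brevity.

```python
import numpy as np


def decode_center_branch(heatmap, wh, offsets, stride=4):
    """Decode the strongest center into (x1, y1, x2, y2, score) in input pixels.

    heatmap: (H/4, W/4) center scores.
    wh:      (H/4, W/4, 2) box size as (width, height), in feature-map cells.
    offsets: (H/4, W/4, 2) sub-cell correction as (dx, dy).
    The layouts and units above are assumptions made for illustration.
    """
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    dx, dy = offsets[row, col]
    # Refine the integer peak with the offset, then map back to input pixels.
    cx, cy = (col + dx) * stride, (row + dy) * stride
    w, h = wh[row, col] * stride
    score = float(heatmap[row, col])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score)
```

The offset compensates for the precision lost when center coordinates are quantized to the 4x-downsampled grid, which is why it is added before multiplying by the stride.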

The second branch (the dashed box at the lower right of FIG. 3) likewise predicts three outputs: the corner-point heatmap $Y_{\text{corner}} \in \mathbb{R}^{\frac{W}{4} \times \frac{H}{4} \times 4}$, the corner-point coordinates $Y_{\text{coords}} \in \mathbb{R}^{\frac{W}{4} \times \frac{H}{4} \times 8}$, and the coordinate offset $Y_{\text{offsets}} \in \mathbb{R}^{\frac{W}{4} \times \frac{H}{4} \times 2}$. The outputs of the second branch are also decoded, in this case to find the four corner points. The detected corner points are used to obtain a rectified license plate patch (LP patch).
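Rectifying the plate from four detected corners is commonly done with a perspective (homography) transform. The sketch below is an assumption about how this step could be realized, not the method stated in the source: it solves for the 3x3 matrix that maps the four corners onto an upright rectangle, with hypothetical helper names `homography_from_corners` and `warp_point`. The corner order (top-left, top-right, bottom-right, bottom-left) is also assumed.

```python
import numpy as np


def homography_from_corners(src, dst):
    """Solve the 3x3 perspective matrix mapping four src points to four dst points."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), and likewise for v.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def warp_point(matrix, point):
    """Apply the perspective matrix to one (x, y) point."""
    x, y, w = matrix @ np.array([point[0], point[1], 1.0])
    return x / w, y / w
```

With the target rectangle set to width 100 and height 32, the size the specification uses for the rectified patch, the detected corners map onto `(0, 0)`, `(100, 0)`, `(100, 32)`, and `(0, 32)`. A full implementation would additionally resample the image pixels through the inverse of this matrix.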

<License Plate Recognition>

It has been shown experimentally that CNNs (Convolutional Neural Networks) can learn rich feature spaces from data, and various CNN-based optical character recognition networks have been proposed. Existing methods treat license plate recognition as a sequence labeling problem and approach it with the CTC loss (Connectionist Temporal Classification loss). Sliding-window single-class detectors are also used for license plate recognition. The drawbacks of the CTC loss are that the feature map of the network must be rearranged to follow the actual character sequence, and that performance is poor on the noisy data that can occur in practice.
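As background on the CTC-based prior approach, the standard best-path decoding rule is shown below: per-frame labels are read in a single left-to-right order, repeated labels are collapsed, and blanks are dropped. The function name and the use of label 0 as the blank are illustrative assumptions.

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Collapse repeated labels, then drop blanks (CTC best-path rule)."""
    decoded, previous = [], None
    for label in frame_labels:
        # Emit a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

Because the frames form a one-dimensional sequence, the feature map has to be arranged so that its columns follow the reading order of the characters, which is the rearrangement constraint noted above for multi-line or irregular layouts.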

Crucially, applying the above methods to diverse layouts requires a classification module or modification of the network. Because license plate recognition demands high accuracy in the real world, transfer learning on the target dataset is essential, so black-box systems that cannot be modified in this way are limited. To solve these problems, the present invention modifies and applies a previously proposed attention-based encoder-decoder method to design a noise-robust, layout-independent recognition network with residual deformable blocks.

- Recognition feature extraction module (1330)

The license plate patch (LP patch) is rectified using the license plate location output by the detection network. The rectified image is $I_{\text{rectified}} \in \mathbb{R}^{W \times H \times 3}$ and, given the characteristics of license plates, is set to a rectangle with $W = 100$ and $H = 32$. To focus on character positions in the feature map and extract high-level character features, the new attention-based backbone network mentioned above is designed. The proposed backbone network includes two deformable attention stages (when the set number is two), each having residual deformable blocks.

At this stage, as shown in FIG. 4, the convolution filters have a reasonable receptive field thanks to deformable convolution, which makes them robust to spatial distortion, and repeated downsampling and upsampling allow them to cover multi-level features.

또한, 저수준 특징(두번째/중앙 부분의 사각형 박스 중 우측 영역 - 입력된 부분에서 좌측으로 분기 되는 영역 - 으로 입력된 이미지에 근접된 특징을 갖는 영역)과 고수준 특징(두 번째/중앙 부분의 사각형 박스 중 좌측 영역 - 입력된 부분에서 좌측으로 분기 되는 영역 - 으로 높은 수준의 입력인 이미지로부터 더 멀어진 특징을 갖는 영역)의 합성곱을 통해 어텐션을 수행함으로써, 문자 특징을 효율적으로 추출할 수 있다. In addition, low-level features (the right area of the square box in the second/center part - the area branching to the left from the input part) and high-level features (areas with features close to the input image) and high-level features (the square box in the second/center part) Character features can be extracted efficiently by performing attention through convolution of the central left region (the region branching to the left from the input portion) and the region with features further away from the high-level input image.

아울러 디폼어블 어텐션 단계에서 잔여 디폼어블 블록 (residual deformable attention)이 복수로 포함되어 있다. 이러한 잔여 디폼어블 블록은 도 4의 우측 영역에 보는 바와 같이 ‘offset’라는 과정을 거치므로 정형화된 직사각형 형태뿐만 아니라 정형화된 정사각형과 먼 영역도 포함시킬 수 있다. In addition, a plurality of residual deformable blocks (residual deformable attention) are included in the deformable attention step. These remaining deformable blocks go through a process called ‘offset’, as shown in the right area of Figure 4, so they can include not only the standardized rectangular shape but also areas that are far from the standardized square.
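The effect of the 'offset' step can be illustrated with a small sketch. It is a minimal, single-channel illustration of a 3×3 deformable convolution at one location, assuming bilinear interpolation and zero padding; the function names are hypothetical, and the offsets are supplied by hand here, whereas in a deformable convolution they are predicted by a learned layer.

```python
import math


def bilinear(image, x, y):
    """Sample a 2D list at a fractional (x, y); out-of-range pixels count as 0."""
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < len(image) and 0 <= xx < len(image[0]):
                weight = (1 - abs(x - xx)) * (1 - abs(y - yy))
                value += weight * image[yy][xx]
    return value


def deformable_conv_at(image, kernel, cx, cy, offsets):
    """3x3 convolution response at (cx, cy) with one (dx, dy) offset per tap.

    `kernel` is a flat list of 9 weights and `offsets` a list of 9 (ox, oy)
    pairs, both in row-major order over the 3x3 grid.
    """
    taps = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    total = 0.0
    for (dx, dy), (ox, oy), w in zip(taps, offsets, kernel):
        total += w * bilinear(image, cx + dx + ox, cy + dy + oy)
    return total


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
kernel = [1.0] * 9
regular = deformable_conv_at(image, kernel, 1, 1, [(0.0, 0.0)] * 9)
shifted = deformable_conv_at(image, kernel, 1, 1, [(0.5, 0.0)] * 9)
print(regular, shifted)
```

With all offsets at zero the function reduces to an ordinary 3×3 convolution; non-zero offsets move each sampling point off the regular grid, which is what lets the receptive field follow tilted or curved text.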

That is, because the residual deformable blocks are built into an attention-based network, the character recognition unit (1300) according to the present invention can efficiently extract features of irregular character and/or mark data, such as multiple lines, curves, and words of different sizes. The proposed network architecture is shown in <Table 2>.

| Input | Operator | Output |
|---|---|---|
| 32 x 100 x 3 | Conv 3x3 + BN + ReLU | 32 x 100 x 32 |
| 32 x 100 x 32 | MaxPool | 16 x 50 x 32 |
| 16 x 50 x 32 | Residual Block | 16 x 50 x 64 |
| 16 x 50 x 64 | Deformable Attention Stage | 16 x 50 x 64 |
| 16 x 50 x 64 | Residual Block (S = 2) | 8 x 25 x 128 |
| 8 x 25 x 128 | Deformable Attention Stage | 8 x 25 x 128 |
| 8 x 25 x 128 | Residual Block | 8 x 25 x 256 |
| 8 x 25 x 256 | Residual Block | 8 x 25 x 1024 |

As shown in <Table 2>, the recognition feature extraction module (1330) performs a convolution operation, max pooling, and a residual block, then repeats a deformable attention stage followed by a residual block a set number of times (twice in the present invention), so that the network ends with two consecutive residual blocks.

Here, the convolution operation preferably comprises a 3×3 convolution (Conv 3×3), batch normalization (BN), and ReLU.
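The layer sequence of <Table 2> can be checked with a small shape trace. The sketch below is only an illustration, assuming each row changes the spatial size by its stride and sets the output channel count; the strides are inferred from the input and output sizes in the table, and `LAYERS` and `trace_shapes` are hypothetical names.

```python
# Each entry: (operator name, stride, output channels), read off <Table 2>.
# Strides are inferred from the input/output sizes listed in the table.
LAYERS = [
    ("Conv 3x3 + BN + ReLU", 1, 32),
    ("MaxPool", 2, 32),
    ("Residual Block", 1, 64),
    ("Deformable Attention Stage", 1, 64),
    ("Residual Block (S = 2)", 2, 128),
    ("Deformable Attention Stage", 1, 128),
    ("Residual Block", 1, 256),
    ("Residual Block", 1, 1024),
]


def trace_shapes(height, width, channels, layers=LAYERS):
    """Return the (H, W, C) shape after every layer."""
    shapes = []
    for _name, stride, out_channels in layers:
        height, width, channels = height // stride, width // stride, out_channels
        shapes.append((height, width, channels))
    return shapes


shapes = trace_shapes(32, 100, 3)
for (name, _stride, _channels), shape in zip(LAYERS, shapes):
    print(f"{name:28s} -> {shape[0]} x {shape[1]} x {shape[2]}")
```

The two stride-2 steps (MaxPool and the Residual Block with S = 2) account for the overall 4× reduction from 32 × 100 to 8 × 25.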

- Character Recognition Head Module (1350)

Because the recognition feature extraction module (1330) contains no fully connected layer, the generated feature map retains spatial information.

Here, c = 1024 is the number of channels of the feature map, and k = W/4 × H/4 is the number of sequential feature tokens. The final decoded characters are obtained after the feature map passes through the recognition head module (1350), which comprises three modules as shown in FIG. 5.
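For the rectified patch size given earlier (W = 100, H = 32), the token count works out to k = 25 × 8 = 200. The sketch below shows one way the feature map could be flattened into k tokens of c channels each; the row-major ordering and the `to_tokens` helper are assumptions for illustration, since the text does not state the flattening order.

```python
def to_tokens(feature_map):
    """Flatten an H x W x C nested list into a list of H*W token vectors (row-major)."""
    return [pixel for row in feature_map for pixel in row]


W, H, c = 100, 32, 1024
k = (W // 4) * (H // 4)  # number of sequential feature tokens

# Dummy 8 x 25 x 1024 feature map standing in for the backbone output.
feature_map = [[[0.0] * c for _ in range(W // 4)] for _ in range(H // 4)]
tokens = to_tokens(feature_map)
print(k, len(tokens), len(tokens[0]))
```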

The three modules are the spatial relation attention module, the parallel attention module, and the character decoding module. At this stage, the model uses an attention-based mechanism to compute similarities over the positional encodings and predict the output characters.

This approach has the advantage of recognizing license plates independently of their layout, as well as other displayed characters.

Spatial relation attention module (Attn_r): In equations (1) to (5), the feature map undergoes self-attention, which produces a new feature map that takes spatial similarity into account. Given a feature map I ∈ R^{k×c} as the input to the spatial relation attention module, the equations use PE (Position Embedding Vector) for the sequential tokens together with MLP (Multi-Layer Perceptron), MSA (Multi-head Self-Attention), PWFF (Position-Wise Feed-Forward), and LN (Layer Normalization). PWFF is another type of MLP. Here, L is set to 2. The output nodes are independent and can be optimized in parallel.

Equation (1)

Equation (2)

Equation (3)

Equation (4)

Equation (5)

The MSA in equation (2) is computed as follows. SA denotes standard qkv (query, key, value) self-attention. Let X ∈ R^{k×c} be an arbitrary sequential feature map, with learnable weight matrices W_q, W_k, W_v ∈ R^{c×c} and W_o ∈ R^{ch×c}, where h is the number of heads and ch denotes c×h.

Equation (6)

Equation (7)

Equation (8)
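Since SA is described as standard qkv self-attention, it can be illustrated with the usual scaled dot-product form. The sketch below is a single-head, pure-Python illustration under that assumption; the scaling by the square root of the channel count and the helper names are not taken from the text, and a multi-head version would concatenate h such outputs and multiply by W_o.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention on a k x c token matrix."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # one distribution per query token
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # k = 3 tokens, c = 2 channels
out, weights = self_attention(x, identity, identity, identity)
print(len(out), len(out[0]))
```

The output keeps the k × c shape of the input, so the module can be stacked, and each output token is a weighted mix of all tokens, which is how spatial similarity enters the new feature map.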

Parallel attention module (Attn_p): This module operates according to equation (9). The output of the spatial relation attention module is multiplied by learnable weights, and attention is then performed with the output feature map of the recognition feature extraction module (1330). This process uses both the features that account for spatial relation attention and the earlier features. Here W_1 ∈ R^{c×c}, W_2 ∈ R^{n×c}, and n is the maximum recognizable character length.

Equation (9)

Character decoding module (CDM): The character decoding module decodes the preceding result into output characters. As in the encoder, two decoder layers are stacked to apply relational attention among the output nodes. The CDM then predicts the output characters through a softmax operation. This process is given by equation (10).

Equation (10)

<Optimization>

- License plate detection stage:

To locate the bounding box, the network needs the center position, the width and height, and the offset of the license plate for the input image I ∈ R^{W×H×3}.

The center point is given by the heat map Y_xyc ∈ [0, 1]^{W/4×H/4×1}, which is generated by applying a Gaussian kernel at the object's center point coordinates (P_x, P_y), where σ_p is a standard deviation that depends on the object size.
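The heat map construction can be sketched as follows. This is a minimal illustration assuming the usual unnormalized Gaussian exp(-d²/(2σ²)) centered on the downsampled center point; the fixed `sigma` passed in the example is a placeholder, since the text only says that σ_p depends on the object size, and `center_heatmap` is a hypothetical helper name.

```python
import math


def center_heatmap(width, height, center_x, center_y, sigma, stride=4):
    """Render a (height/stride) x (width/stride) heat map with a Gaussian peak."""
    out_w, out_h = width // stride, height // stride
    cx, cy = int(center_x / stride), int(center_y / stride)  # low-resolution center
    heatmap = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            dist_sq = (x - cx) ** 2 + (y - cy) ** 2
            row.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
        heatmap.append(row)
    return heatmap


# Example values only: a 64 x 32 input with a plate centered at (30, 17).
heatmap = center_heatmap(width=64, height=32, center_x=30, center_y=17, sigma=1.5)
print(len(heatmap), len(heatmap[0]), heatmap[4][7])
```

The peak value is exactly 1 at the center cell and decays smoothly around it, which is what allows the loss to penalize near-center predictions less than distant ones.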

The center point loss function is a penalty-reduced pixel-wise logistic regression with focal loss, as shown in equation (11). It assigns small weights to the many easy negatives (background) and large weights to the few hard positives (keypoints), which prevents the loss from being dominated by the large number of negatives during training.

Equation (11)

In this formula, N is the number of center points and is set to 1, and α and β are the hyperparameters of the focal loss; in the present invention, α = 2 and β = 4.
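A penalty-reduced focal loss of this kind is commonly written with a (1 - p)^α log(p) term at center pixels and a (1 - y)^β p^α log(1 - p) term elsewhere. The sketch below assumes that common form with α = 2 and β = 4 as stated above; it is an illustration rather than a transcription of equation (11), and `center_focal_loss` is a hypothetical name.

```python
import math


def center_focal_loss(pred, target, alpha=2.0, beta=4.0, eps=1e-12):
    """Penalty-reduced pixel-wise focal loss over two equally sized heat maps."""
    loss = 0.0
    num_centers = 0
    for pred_row, target_row in zip(pred, target):
        for p, y in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            if y == 1.0:
                num_centers += 1
                loss -= (1.0 - p) ** alpha * math.log(p)
            else:
                # (1 - y)^beta reduces the penalty near the Gaussian peak.
                loss -= (1.0 - y) ** beta * p ** alpha * math.log(1.0 - p)
    return loss / max(num_centers, 1)  # N is the number of center points


target = [[0.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 0.0]]
good = [[0.01, 0.4, 0.01], [0.4, 0.95, 0.4], [0.01, 0.4, 0.01]]
bad = [[0.6, 0.6, 0.6], [0.6, 0.1, 0.6], [0.6, 0.6, 0.6]]
loss_good = center_focal_loss(good, target)
loss_bad = center_focal_loss(bad, target)
print(loss_good < loss_bad)
```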

Because the input image resolution is downsampled by a factor of 4, spatial information is lost. The offset term compensates for this. The offset loss is optimized with an L1 loss, as shown in equation (12).

Equation (12)

Equation (13)

Finally, the width and height of the bounding box are optimized, also with an L1 loss, as shown in equation (13). The offset loss and the size loss are computed only at the center point location. P_xy is the point location before downsampling, and S_k is the width and height of the bounding box.
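The offset and size terms can be illustrated with a short sketch. It assumes the offset target is the fractional part P/4 - floor(P/4) lost by the 4× downsampling and that both losses are plain sums of absolute differences; the helper names and example values are hypothetical.

```python
import math


def offset_target(point, stride=4):
    """Fractional part lost when a coordinate is mapped to the low-resolution grid."""
    return tuple(c / stride - math.floor(c / stride) for c in point)


def l1_loss(pred, target):
    """Sum of absolute differences between two equally long sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target))


center = (123, 58)                  # P_xy in input-image pixels (example values)
target_off = offset_target(center)  # what the offset head should predict
offset_loss = l1_loss((0.7, 0.4), target_off)
size_loss = l1_loss((98.0, 30.0), (100.0, 32.0))  # predicted vs. true (width, height)
print(target_off, offset_loss, size_loss)
```

Adding the predicted offset back to the low-resolution center and multiplying by the stride recovers the original pixel position, which is how the offset term compensates for the downsampling.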

The benefit of using an L1 loss for the offset and size losses is that they are not strongly affected by large errors. Likewise, for the corner points, the loss for each corner point is computed using equation (11), and an L1 loss is used to find the coordinates and offsets of the corner points. Finally, a joint loss function is constructed by combining the six loss functions with weights. In the experiments, the weighting factors were set to λ_size = 0.05, λ_{off,c} = 0.05, and λ_coord = 1.

Equation (14)

- License plate recognition stage:

As the loss function for recognition, the cross-entropy loss of equation (15) was adopted. Here y_j denotes the ground-truth value and P_j the predicted value.

Equation (15)
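With a one-hot ground truth y_j, the cross-entropy -Σ y_j log P_j reduces to the negative log-probability of the correct character. The sketch below illustrates this for a sequence of character slots; averaging over the slots is an assumption, as the text does not state how the per-character terms are combined, and the function names are hypothetical.

```python
import math


def cross_entropy(probabilities, true_index, eps=1e-12):
    """Cross-entropy for one character slot with a one-hot ground truth."""
    return -math.log(max(probabilities[true_index], eps))


def sequence_loss(predictions, labels):
    """Average the per-character cross-entropy over the plate."""
    losses = [cross_entropy(p, y) for p, y in zip(predictions, labels)]
    return sum(losses) / len(losses)


# Two character slots over a toy 2-symbol alphabet (example values only).
confident = sequence_loss([[0.9, 0.1], [0.2, 0.8]], [0, 1])
uncertain = sequence_loss([[0.5, 0.5], [0.5, 0.5]], [0, 1])
print(confident < uncertain)
```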

<Experiments>

- Datasets and Settings

CCPD (Chinese City Parking Dataset): A large-scale training dataset is essential for improving detection and recognition performance, and collecting such data manually is time-consuming and expensive. The recently released CCPD contains 250k license plate images in total. Collected in China, it comprises seven subsets: ccpd-base, ccpd-weather (images taken in abnormal weather such as snow), ccpd-tilt (images in which the plate is tilted), ccpd-rotate (images in which the plate is rotated), ccpd-fn, ccpd-db (images taken in dark places, with very poor saturation and brightness), and ccpd-challenge (images that are hard to recognize). Half of ccpd-base was used for training, and the remaining 100k images and the other subsets were used for testing.

AOLP (Application-Oriented License Plate): AOLP consists of 2,049 images of Taiwanese license plates, divided into three subsets: AC (access control, 681 images), LE (law enforcement, 757 images), and RP (road patrol, 611 images). AC covers vehicles passing through a fixed passage at reduced speed or at a stop, LE covers vehicles captured by a roadside camera, and RP covers vehicles captured from another moving vehicle. This dataset is used to show that the proposed network works robustly on relatively small datasets as well as large ones. Half of the 2k images were used for training and the rest for testing.

VBLPD (Vietnam Bicycle License Plate Dataset): To demonstrate the layout independence of the proposed network, a dataset of two-line Vietnamese motorcycle license plates was used. Because only the ground-truth location of the number region is provided, text labels were added manually. Of the 2,000 motorcycle license plate images collected in parking lots, half were used for training and the other half for testing.

KHPC (Korea Handicap Parking Card): This dataset is used to verify that the proposed network, with its sequential detection-then-recognition scheme, can be applied to displayed-character problems other than license plates. The handicap card is a parking card for people with disabilities in Korea; it comes in white and yellow and has a four-digit number, with digits from 0 to 9, in its center. The task is to recognize this four-digit number. Because real data are difficult to collect, the network was trained on 20k synthetic images produced by data synthesis together with 0.5k collected real images, and only 0.5k of the collected real images were used for testing.

All experiments were performed on an Intel i9-9900K CPU and an NVIDIA QUADRO 8000 GPU. The neural network was trained with the Adam optimizer; the learning rate was initially set to 0.001 and then decayed exponentially. Data augmentation was used to prevent overfitting: transition, rotation (-20° to +20°), color jitter, and blur were applied (see <Table 3>).

| Sub-Dataset | Quantity | Description |
|---|---|---|
| CCPD-base1 (training) | 100k | Images of cars in common scenes |
| CCPD-base2 (testing) | 100k | Images of cars in common scenes |
| CCPD-weather | 10k | Images taken on a rainy day, snow day or fog day |
| CCPD-tilt | 30k | Images at horizontal tilt and vertical tilt |
| CCPD-rotate | 10k | Images at horizontal rotation |
| CCPD-fn | 20k | Images obtained from a relatively far or near distance |
| CCPD-db | 10k | Images obtained in dark or extremely bright places |
| CCPD-challenge | 50k | The most challenging images |
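The learning-rate schedule mentioned in the training settings (an initial rate of 0.001 followed by exponential decay) can be sketched as below. Only the initial rate comes from the text; `decay_rate` and `decay_steps` are hypothetical values chosen for illustration.

```python
def exponential_lr(step, initial_lr=0.001, decay_rate=0.95, decay_steps=1000):
    """Learning rate after `step` updates under exponential decay.

    decay_rate and decay_steps are illustrative assumptions; the source
    only states the initial rate and that the decay is exponential.
    """
    return initial_lr * decay_rate ** (step / decay_steps)


for step in (0, 1000, 5000, 20000):
    print(step, exponential_lr(step))
```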

- Evaluation Criterion

To measure detection accuracy (DA), IoU (intersection over union) was used to compute the overlap between the ground-truth bounding box and the bounding box predicted by the model. If the overlap is greater than or equal to the threshold, the result is defined as a TD (True Detection); otherwise it is an FD (False Detection). Detection accuracy was evaluated with a threshold of λ = 0.7.

Equation (16)
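The True Detection test can be sketched with the standard IoU of two axis-aligned boxes. The (x1, y1, x2, y2) box format and the function names are assumptions for illustration; the threshold of 0.7 is the one stated above.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_detection(pred_box, gt_box, threshold=0.7):
    """A prediction counts as a True Detection when IoU >= threshold."""
    return iou(pred_box, gt_box) >= threshold


gt = (0, 0, 10, 10)
print(is_true_detection((1, 0, 11, 10), gt), is_true_detection((5, 0, 15, 10), gt))
```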

Recognition accuracy (RA) was evaluated only on True Detection patches and computed, as a percentage, according to equation (17). A matched character means that the character was correctly predicted by the model.

Equation (17)
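The two metrics can be sketched as simple percentages. The sketch assumes that recognition accuracy counts a patch as correct only when the whole predicted string matches the ground truth; the text could also be read as a character-level measure, so this choice, the function names, and the example strings are all assumptions.

```python
def detection_accuracy(ious, threshold=0.7):
    """Percentage of images whose predicted box is a True Detection."""
    true_detections = sum(1 for value in ious if value >= threshold)
    return 100.0 * true_detections / len(ious)


def recognition_accuracy(predicted, ground_truth):
    """Percentage of True Detection patches whose whole string matches."""
    matches = sum(1 for p, g in zip(predicted, ground_truth) if p == g)
    return 100.0 * matches / len(ground_truth)


da = detection_accuracy([0.9, 0.7, 0.5, 0.8])
ra = recognition_accuracy(["AB123", "CD456"], ["AB123", "CD459"])
print(da, ra)
```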

<Results>

- Results on CCPD

The detection and recognition results on the CCPD data are shown in <Table 4> and <Table 5>, respectively. For comparison, experiments were also carried out with conventional methods. The conventional edge detection algorithm detects the license plate location and crops it. The characters are then segmented using the peaks and valleys of the histogram, and two SVM classifiers are trained to recognize the province character and the other characters of the plate.

MTCNN + LPRnet is a lightweight, open-source ALPR framework: license plate regions are detected with MTCNN, and character recognition is performed with LPRnet.

WPOD + OCR is a very recent neural network approach that performs recognition through OCRnet.

RPnet is a high-performing end-to-end detection and recognition neural network.

SLPnet is a recently proposed shuffle-block-based LPDR methodology.

License plate detection is performed end to end in an anchor-free manner, after which the region of interest is cropped and recognized.

The methods above report their performance on the base subset of CCPD.

The license plate recognition system (1000) according to the present invention (referred to as the 'network' where appropriate) achieved a detection accuracy of 98.56% and a recognition accuracy of 87.36%.

The license plate recognition system (1000) according to the present invention generates a single predicted bounding box per image and thereby demonstrated high performance with a small number of false positives.

As these experimental results show, given sufficient data, attention-based character inference achieves high performance even on noisy or distorted characters, unlike conventional character inference methods that use CTC loss. Even higher performance is expected if a small amount of data from the sub-datasets is added and transfer learning is used during training.

In addition, the license plate recognition system (1000) according to the present invention could be trained end to end and extended for use in more diverse scenarios, such as video or other applications.

| Method | Base (%) | weather (%) | tilt (%) | rotate (%) | fn (%) | db (%) | challenge (%) | Avg (%) |
|---|---|---|---|---|---|---|---|---|
| Edge-based | 91.64 | 91.53 | 90.29 | 90.29 | 90.51 | 90.38 | 89.68 | 90.62 |
| MTCNN | 99.69 | 97.16 | 96.47 | 95.14 | 97.33 | 96.35 | 83.27 | 95.06 |
| WPOD | 99.2 | 98.2 | 96.3 | 94.6 | 94.3 | 95.1 | 93.4 | 95.87 |
| RPnet | 99.3 | 83.6 | 93.2 | 94.7 | 85.3 | 89.5 | 92.8 | 91.2 |
| Ours | 99.94 | 99.49 | 99.2 | 99.20 | 98.02 | 99.27 | 94.8 | 98.56 |

| Method | Base (%) | weather (%) | tilt (%) | rotate (%) | fn (%) | db (%) | challenge (%) | Avg (%) |
|---|---|---|---|---|---|---|---|---|
| Edge-based+SVM | 81.70 | 81.40 | 57.83 | 53.76 | 71.53 | 62.08 | 61.61 | 67.13 |
| MTCNN+LPRnet | 90.30 | 91.55 | 79.95 | 56.31 | 90.11 | 86.89 | 60.62 | 79.39 |
| WPOD+OCR | 90.76 | 90.88 | 91.06 | 92.21 | 64.88 | 82.86 | 64.40 | 82.43 |
| RPnet | 92.36 | 89.53 | 87.83 | 86.51 | 65.16 | 84.43 | 62.25 | 81.15 |
| SLPnet | 88.14 | 88.51 | 83.07 | 84.06 | 63.22 | 75.10 | 62.97 | 77.86 |
| Ours | 99.83 | 97.48 | 88.26 | 94.11 | 83.13 | 73.32 | 75.38 | 87.36 |
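The Avg (%) column in the detection and recognition tables appears to be the unweighted mean of the seven per-subset scores. The short check below recomputes it for the "Ours" rows; the list names are hypothetical, and the values are copied from the tables.

```python
# Per-subset scores of the "Ours" rows: Base, weather, tilt, rotate, fn, db, challenge.
OURS_DETECTION = [99.94, 99.49, 99.2, 99.20, 98.02, 99.27, 94.8]
OURS_RECOGNITION = [99.83, 97.48, 88.26, 94.11, 83.13, 73.32, 75.38]


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


avg_detection = round(mean(OURS_DETECTION), 2)
avg_recognition = round(mean(OURS_RECOGNITION), 2)
print(avg_detection, avg_recognition)
```

Because the mean is unweighted, the small subsets (10k images) count as much as the 100k base test split, so the average emphasizes robustness across conditions rather than overall image-level accuracy.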

| Method | AOLP (%) | VBLPD (%) | KHPC (%) |
|---|---|---|---|
| MTCNN + LPRnet | 91.35 | - | - |
| WPOD + OCR | 94.20 | - | - |
| RPnet | 91.85 | - | - |
| Ours | 92.30 | 88.00 | 99.99 |

Accordingly, the present invention provides an attention-based vehicle license plate recognition system that is independent of the license plate layout type and robust to noise in the license plate image, including images taken in snowy or cloudy weather, images with poor saturation or brightness, and images taken at an angle.

Although several embodiments of the present invention have been shown and described, those of ordinary skill in the art to which the present invention pertains will appreciate that these embodiments may be modified without departing from the principles or spirit of the invention. The scope of the invention is defined by the appended claims and their equivalents.

1000: License plate recognition system
1100: License plate detection unit 1130: Detection feature extraction module
1150: Detection head module
1300: Character recognition unit 1330: Recognition feature extraction module
1350: Recognition head module

Claims

A license plate recognition system for recognizing license plates, including vehicle license plates, through deep learning, comprising:
an input unit for inputting a photographed license plate image;
a license plate detection unit that detects the license plate position by learning, from the input license plate image, to correct distortion of the license plate, including tilt and rotation, using a rectangular bounding box and the vertex coordinates of the four corners; and
a character recognition unit that divides the image detected by the license plate detection unit and recognizes the characters inside the license plate through an attention-based algorithm,
wherein the character recognition unit includes a recognition feature extraction module, which performs a deformable attention stage having a residual block and a residual deformable block (ResDformable Block), and a recognition head module, which processes the characters input from the recognition feature extraction module,
wherein the recognition feature extraction module arranges the convolution operation, batch normalization (BN), max pooling, deformable attention stage, and residual block a set number of times, and the parameters of the neural network, represented by numbers and by feature maps composed of numbers including matrices, are computed with one another through various operations, and
wherein the recognition head module splits the information of the feature map input from the recognition feature extraction module and includes relation attention, parallel attention, and character decoding processes.

The license plate recognition system according to claim 1,
wherein the license plate detection unit comprises:
a detection feature extraction module comprising a combination of deep learning neural network operations including convolution, max pooling, batch normalization (BN), and activation functions including ReLU; and
a detection head module including a first branch that detects the bounding box of the license plate using the center point, width, and height of a rectangle, and a second branch that detects the license plate using the vertex coordinates of the four corners.

delete

The license plate recognition system according to claim 2,
wherein the first branch includes a process of generating a heat map of the center point from the image input from the detection feature extraction module, a process of setting the width and height of the rectangle, and a process of calculating an offset.

The license plate recognition system according to claim 2,
wherein the second branch includes a process of generating a heat map of the rectangle from the image input from the detection feature extraction module, a process of sequentially detecting the coordinates of the four vertices in a set direction, and a process of calculating an offset.

delete