CN107301414B - Chinese positioning, segmenting and identifying method in natural scene image - Google Patents
- Publication number
- CN107301414B CN107301414B CN201710484646.5A CN201710484646A CN107301414B CN 107301414 B CN107301414 B CN 107301414B CN 201710484646 A CN201710484646 A CN 201710484646A CN 107301414 B CN107301414 B CN 107301414B
- Authority
- CN
- China
- Prior art keywords
- character
- single character
- candidate
- region
- segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/245—Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Character Discrimination (AREA)
Abstract
The invention provides a method for locating, segmenting and recognizing Chinese text in natural scene images. The method first performs initial character localization on the original picture with the FASText model and extracts candidate text regions; it then pre-segments the candidate regions, directly recognizes the single-character parts of the pre-segmented regions, and performs further single-character segmentation and recognition on the field parts. By combining the accurate extraction of character stroke features, the strong character recognition capability of a deep residual neural network, and a path-tree method, the method achieves Chinese text localization and recognition simply and effectively, and can be applied to a variety of natural scenes without supervised training.
Description
Technical Field
The invention belongs to the technical field of image processing, and in particular relates to a method for locating, segmenting and recognizing Chinese text in natural scene images.
Background
Text recognition in natural scenes is an important visual detection task: text in images carries much useful information and is essential for understanding and retrieving visual content. Related applications include reading road signs, license plates, tickets, and the like.
Conventional OCR technology is easily disturbed by the complex backgrounds of natural scenes and often cannot complete such tasks correctly. In general, the task can be divided into two stages: text localization and text recognition. Text localization means precisely locating the position of text in the image, chiefly by extracting character features, such as MSERs, to distinguish text from background. Besides such traditional feature-based detection methods, approaches that train deep neural networks to localize text have also appeared. However, these approaches usually require a large amount of manually labeled data for training, and the trained models are difficult to extend directly to other application scenarios.
Disclosure of Invention
The invention aims to provide a simple and effective method for locating, segmenting and recognizing Chinese text in natural scene images, one that can be extended to a wide range of scenes.
In order to achieve the above purpose, the invention adopts the technical scheme that: a Chinese positioning, segmenting and identifying method in natural scene images comprises the following steps:
1) performing initial character localization on the original picture with the FASText model, and extracting candidate text regions;
2) pre-segmenting the candidate text regions;
3) directly recognizing the single-character parts of the pre-segmented text regions, and performing further single-character segmentation and recognition on the field parts.
Further, the candidate text region is extracted by the getCharSegmentation function of FASText.
Further, the pre-segmentation in step 2) proceeds as follows: label the connected regions of the candidate text region, remove small connected regions (noise), cut out directly as single characters those regions whose aspect ratio matches a Chinese character (close to 1:1), and set aside the remaining connected regions as fields.
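The pre-segmentation step above (connected-region labeling, noise removal, aspect-ratio filtering) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the patent's implementation: the input is assumed to be an already binarized 0/1 mask, and the function names (`label_regions`, `pre_segment`) and thresholds (`min_area`, the 0.5–2.0 aspect-ratio band) are illustrative choices.

```python
# Sketch of the pre-segmentation step: label 4-connected regions in a binary
# mask, drop small (noise) regions, and cut out regions whose bounding box is
# roughly square (Chinese-character-like); the rest are treated as fields.
from collections import deque

def label_regions(mask):
    """4-connected component labeling on a 2D 0/1 grid; returns pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                q, comp = deque([(y, x)]), []
                seen[y][x] = True
                while q:
                    cy, cx = q.popleft()
                    comp.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                regions.append(comp)
    return regions

def pre_segment(mask, min_area=4, ratio_band=(0.5, 2.0)):
    """Split regions into single-character boxes (near-1:1) and field boxes."""
    singles, fields = [], []
    for comp in label_regions(mask):
        if len(comp) < min_area:                  # remove small connected regions (noise)
            continue
        ys = [p[0] for p in comp]
        xs = [p[1] for p in comp]
        hgt = max(ys) - min(ys) + 1
        wid = max(xs) - min(xs) + 1
        box = (min(xs), min(ys), wid, hgt)        # (x, y, width, height)
        if ratio_band[0] <= wid / hgt <= ratio_band[1]:
            singles.append(box)                   # aspect ratio near 1:1 -> single character
        else:
            fields.append(box)                    # wide region -> field, segmented later
    return singles, fields
```

In practice this labeling would be done with an optimized routine (e.g. OpenCV's connected-components functions) on the FASText output rather than a Python loop; the sketch only shows the filtering logic.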
Further, the further single-character segmentation of the field parts after pre-segmentation in step 3) proceeds as follows:
(1) training a deep residual neural network to obtain a single-character recognizer, ResNet;
(2) directly recognizing the single-character results of the pre-segmentation with the single-character recognizer;
(3) further segmenting the field results of the pre-segmentation into single characters: using FASText to obtain the region ranges of candidate characters in the field picture, and collecting the vertical edges of all these ranges as the set of candidate segmentation lines for single-character segmentation; then generating all candidate single-character segmentation schemes with a path-tree method, where each path corresponds to one segmentation scheme;
(4) recognizing every candidate single-character segmentation scheme with the trained single-character recognizer ResNet, recording each recognized character and its recognition confidence, and taking the average of the confidences;
(5) selecting the single-character segmentation scheme with the highest average confidence as the optimal segmentation scheme;
(6) taking the single-character recognition results corresponding to the optimal segmentation scheme as the optimal field recognition scheme, and outputting the corresponding field recognition result.
Further, the rectangular boxes corresponding to the candidate single characters on each path in step (3) do not overlap one another and together cover all character strokes detected by FASText.
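The path-tree generation of candidate segmentation schemes can be illustrated with a simplified sketch. It models each candidate segmentation line as a vertical x-position and enumerates every root-to-leaf path, each path yielding a list of non-overlapping x-intervals that covers the whole field width (the interval analogue of the non-overlap and full-coverage constraint above). The name `segmentation_schemes` and the interval model are illustrative assumptions, not the patent's exact data structure.

```python
# Sketch of the path-tree idea: the vertical edges of the candidate character
# boxes give a set of candidate cut lines; each root-to-leaf path through the
# tree of cuts yields one single-character segmentation scheme.
def segmentation_schemes(cuts, start, end):
    """Enumerate all ways to split [start, end] at a subset of cut positions."""
    interior = sorted(c for c in set(cuts) if start < c < end)
    schemes = []

    def grow(left, path):
        # Either close the path by running to the field end...
        schemes.append(path + [(left, end)])
        # ...or extend it to any cut line strictly to the right.
        for c in interior:
            if c > left:
                grow(c, path + [(left, c)])

    grow(start, [])
    return schemes
```

With n interior cut lines this enumerates 2^n schemes (every subset of cuts), so in practice the candidate line set produced by FASText is expected to be small for a single field.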
The invention uses FASText, a character stroke detector based on stroke features, to extract candidate single characters and field regions, and then proposes a path-tree method over the candidate single-character rectangles to generate candidate single-character segmentation schemes. For each scheme, a single-character recognizer ResNet, trained as a deep residual neural network, recognizes all of the scheme's characters and records each single-character recognition confidence; the field recognition confidence of each scheme is then computed, and the scheme with the highest field recognition confidence is selected as the final segmentation and recognition scheme. By combining the accurate extraction of stroke features, the strong recognition capability of the deep residual network, and the path-tree method, the invention achieves Chinese localization and recognition simply and effectively, and can be applied to a variety of natural scenes without supervised training.
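The scheme-selection rule described above (score every candidate segmentation by the mean of its per-character recognition confidences, keep the best) can be sketched as follows. Here `recognize` is a stand-in for the trained ResNet single-character recognizer, and `best_scheme` is a hypothetical helper name.

```python
# Sketch of confidence-based scheme selection: run a character recognizer on
# every segment of every candidate scheme, average the per-character
# confidences, and keep the scheme with the highest mean.
def best_scheme(schemes, recognize):
    """Return (best_scheme, its_text, its_mean_confidence).

    `recognize(segment)` must return a (character, confidence) pair."""
    best = None
    for scheme in schemes:
        results = [recognize(seg) for seg in scheme]            # (char, conf) pairs
        mean_conf = sum(c for _, c in results) / len(results)   # field confidence
        if best is None or mean_conf > best[2]:
            best = (scheme, "".join(ch for ch, _ in results), mean_conf)
    return best
```

The recognizer thus arbitrates between segmentations: an over- or under-segmented scheme produces crops that no character class matches confidently, dragging its mean down, while the correct scheme scores highest on average.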
Compared with the prior art, the invention has the following advantages. First, exploiting the distinctive stroke structure of Chinese characters, the FASText model detects the stroke parts of characters to achieve initial localization of text regions, effectively eliminating the influence of background clutter. Second, the acquired candidate regions contain both single-character and field parts. For the field parts, the invention uses FASText to further split the detected candidate field regions into single characters, and recognizes each separated character with a deep residual neural network. This integrates field segmentation with single-character recognition and finds the optimal scheme after trying all candidate segmentations, giving higher robustness and accuracy.
Drawings
FIG. 1 is a flow chart of the present invention.
FIG. 2 is a diagram illustrating the effect of locating and identifying the Chinese embodiment according to the present invention.
Detailed Description
As shown in FIG. 1, this embodiment provides a method for locating, segmenting and recognizing Chinese text in natural scene images; the process can be divided into the following steps:
1) performing initial character localization on the original picture with the FASText model, and extracting candidate text regions;
2) pre-segmenting the candidate text regions;
3) directly recognizing the single-character parts of the pre-segmented text regions, and performing further single-character segmentation and recognition on the field parts.
As shown in FIG. 2, picture (a) is the original picture. In step 1, candidate text regions are extracted with the getCharSegmentation function of FASText; the result is shown in picture (b). The pre-segmentation of step 2 labels the connected regions extracted in step 1, removes small connected regions (noise), cuts out directly as single characters the regions whose aspect ratio is close to 1:1, and sets aside the remaining connected regions, as shown in picture (c). In step 3, the single-character results of the pre-segmentation can be recognized directly by the single-character recognizer — e.g. in picture (c) the characters glossed "kou", "mao", "yi" and the four characters machine-rendered as "there", "limited", "official" and "department" (the Chinese "Co., Ltd." suffix) — while a field result, such as the field machine-rendered as "Shanghai real in and out" in picture (c), must be further segmented into single characters. For this, the region boxes of the candidate characters (label candidates) in the field picture are first obtained with FASText, as shown in picture (d); the vertical edges of all these region boxes are then collected as the set of candidate segmentation lines, as shown in picture (e). Next, a path-tree method generates all candidate single-character segmentation schemes; for each scheme, the confidence of every segmented region is computed with the single-character recognizer ResNet and the confidences are averaged. The scheme with the highest average confidence is selected as the final segmentation scheme, and its corresponding single-character recognition results are output as the final field recognition result.
The invention has been described in detail above with reference to a specific embodiment. The embodiment helps those skilled in the art to understand the invention further, but does not limit it in any way. It should be noted that persons skilled in the art may make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.
Claims (2)
1. A Chinese positioning, segmenting and identifying method in natural scene images is characterized by comprising the following steps:
1) performing primary character positioning on an original picture through a FASText model, and extracting a candidate character region, wherein the candidate character region is extracted through a getCharSegmentation function of the FASText;
2) pre-segmenting the candidate text region, wherein the pre-segmentation proceeds as follows: labeling the connected regions of the candidate text region, removing small connected regions, cutting out directly as single characters the regions whose aspect ratio matches a Chinese character, and setting aside the remaining connected regions;
3) directly recognizing the single-character parts of the pre-segmented text regions, and further segmenting and recognizing the single characters in the field parts; this step proceeds as follows:
(1) training a deep residual error neural network to obtain a single word recognizer ResNet;
(2) directly identifying the single character result obtained after the pre-segmentation through a single character identifier;
(3) further segmenting the field part into single characters: using FASText to obtain the region ranges of the candidate characters in the field picture, and collecting the vertical edges of all the region ranges as the set of candidate segmentation lines for single-character segmentation; generating all candidate single-character segmentation schemes with a path-tree method, wherein each path corresponds to one segmentation scheme;
(4) recognizing any single character segmentation scheme by using a trained single character recognizer ResNet, recording each recognized single character and a corresponding recognition confidence coefficient, and then averaging;
(5) selecting the single character segmentation scheme with the highest average confidence coefficient as the optimal single character segmentation scheme;
(6) taking the single-character recognition results corresponding to the optimal segmentation scheme as the optimal field recognition scheme, and outputting the corresponding field recognition result.
2. The method of claim 1, characterized in that the rectangular boxes corresponding to the candidate single characters on each path in step (3) do not overlap one another and together cover all character strokes detected by FASText.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710484646.5A CN107301414B (en) | 2017-06-23 | 2017-06-23 | Chinese positioning, segmenting and identifying method in natural scene image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107301414A (en) | 2017-10-27
CN107301414B (en) | 2020-07-07
Family
ID=60135051
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710484646.5A Active CN107301414B (en) | 2017-06-23 | 2017-06-23 | Chinese positioning, segmenting and identifying method in natural scene image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107301414B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108458716B (en) * | 2018-02-02 | 2023-04-18 | 北京交通大学 | Electric vehicle charging navigation method based on charging pile dynamic occupancy rate prediction |
CN108229463A (en) * | 2018-02-07 | 2018-06-29 | 众安信息技术服务有限公司 | Character recognition method based on image |
CN110321760A (en) * | 2018-03-29 | 2019-10-11 | 北京和缓医疗科技有限公司 | A kind of medical document recognition methods and device |
CN112789623B (en) * | 2018-11-16 | 2024-08-16 | 北京比特大陆科技有限公司 | Text detection method, device and storage medium |
CN109615719B (en) * | 2019-01-04 | 2021-04-20 | 天地协同科技有限公司 | Freight vehicle non-stop charging system and method based on road safety monitoring system |
CN109977878B (en) * | 2019-03-28 | 2021-01-22 | 华南理工大学 | Vehicle detection method based on heavily weighted Anchor |
CN111062393B (en) * | 2019-11-08 | 2021-12-17 | 西安理工大学 | Natural scene Chinese character segmentation method based on spectral clustering |
CN111488873B (en) * | 2020-04-03 | 2023-10-24 | 中国科学院深圳先进技术研究院 | Character level scene text detection method and device based on weak supervision learning |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101777124A (en) * | 2010-01-29 | 2010-07-14 | 北京新岸线网络技术有限公司 | Method for extracting video text message and device thereof |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101777124A (en) * | 2010-01-29 | 2010-07-14 | 北京新岸线网络技术有限公司 | Method for extracting video text message and device thereof |
Non-Patent Citations (1)
Title |
---|
FASText: Efficient Unconstrained Scene Text Detector; Michal Bušta, Lukáš Neumann, Jiří Matas; 2015 IEEE International Conference on Computer Vision (ICCV); 2015-12-13; pp. 1206-1213 *
Also Published As
Publication number | Publication date |
---|---|
CN107301414A (en) | 2017-10-27 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| CB02 | Change of applicant information | Address (before and after): Room 3008, Xuan Ye Lou, Pioneer Park, Xiamen Torch High-tech Zone, Xiamen, Fujian 361000. Applicant after: Xiamen Shang Ji Network Technology Co., Ltd. Applicant before: Xiamen Business Consulting Co., Ltd. |
| GR01 | Patent grant | |