CN109919037A

CN109919037A - Text positioning method and device, and text recognition method and device

Info

Publication number: CN109919037A
Application number: CN201910105748.0A
Authority: CN
Inventors: 刘正珍; 黄威
Original assignee: Hanwang Technology Co Ltd
Current assignee: Hanwang Technology Co Ltd
Priority date: 2019-02-01
Filing date: 2019-02-01
Publication date: 2019-06-21
Anticipated expiration: 2039-02-01
Also published as: CN109919037B

Abstract

This application provides a text recognition method, which belongs to the field of text recognition technology and addresses the low accuracy of text recognition in the prior art. The method includes: obtaining a text line image to be recognized; inputting the text line image to be recognized into a pre-trained text line recognition model and determining the text line recognition result corresponding to the text line image to be recognized, where the text line recognition result indicates the text line attribute at each corresponding position of the text line image to be recognized; and determining, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes. The text positioning method disclosed in this application uses the trained text line recognition model to determine where text with different text line attributes is distributed in a text line image with a complex layout. This makes it possible to recognize each text region with the text image recognition engine that matches the text line attribute of that region, thereby improving the accuracy of text recognition.

Description

Text positioning method and device, and text recognition method and device

Technical field

This application relates to the field of text recognition technology, and in particular to a text positioning method and device and a text recognition method and device.

Background art

Document image recognition usually inputs a row text image or a column text image into a pre-trained text image recognition engine to obtain the corresponding text codes. Column text can be obtained by rotating row text by 90 degrees; therefore, row text and column text are usually referred to collectively as row text.

The text image recognition engines in the prior art are trained on images of single-row text or images of single-column text. Therefore, when single-row text and multi-row text are mixed in the input text image, the text image recognition engine recognizes the content as single-row text.

For example, the most common case in ancient-book documents is a text line image made up of single-column body text and double-column annotation text. An existing text image recognition engine recognizes the text line of double-column annotation text as single-column body text, even though single-column text lines and multi-column text lines are clearly different. As a result, the text line of double-column annotation text is easily misjudged as single-column body text, which lowers the recognition accuracy of the text image recognition engine for the image of that column text.

In summary, the prior art at least has the problem of low recognition accuracy when recognizing text images with complex layouts.

Summary of the invention

The embodiments of the present application provide a text positioning method that helps to solve the low accuracy of text recognition methods in the prior art.

In a first aspect, an embodiment of the present application provides a text positioning method, including:

obtaining a text line image to be recognized;

inputting the text line image to be recognized into a pre-trained text line recognition model, and determining the text line recognition result corresponding to the text line image to be recognized, where the text line recognition result indicates the text line attribute at each corresponding position of the text line image to be recognized;

determining, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes.

Optionally, before the step of inputting the text line image to be recognized into the pre-trained text line recognition model and determining the text line recognition result corresponding to the text line image to be recognized, the method includes:

obtaining training samples for the text line recognition model, where the sample data of each training sample is a text line image of a preset height and a preset width, and the sample label of the training sample indicates the text line attribute at each corresponding position in that text line image;

training the text line recognition model by taking the sample data of the training samples as the input of the text line recognition model, with the goal of minimizing the error between the output of the text line recognition model and the sample labels of the training samples, where the text line recognition model is built on a convolutional neural network.

Optionally, the step of obtaining training samples for the text line recognition model includes:

obtaining several text line images of the preset height and the preset width as sample data, and constructing a sample data set;

for the text line image corresponding to each sample data item in the sample data set, scanning the text line image along the image width direction by moving a specified sliding window with a preset step size, so as to label, according to the scanning result, the text line attribute of the text line image at each position the specified sliding window passes through in sequence;

determining the sample label of the corresponding sample data according to the labeled text line attributes of the text line image at the positions the specified sliding window passes through in sequence; where the height of the specified sliding window is a first preset ratio of the preset height of the text line image, and the width of the specified sliding window is a second preset ratio of the preset width of the text line image.

Optionally, the step of obtaining the text line image to be recognized includes:

stretching or compressing the text line image to be recognized along the width and/or height direction, so as to adjust it to a text line image to be recognized that has the preset height and the preset width.

Optionally, the text line recognition result includes classification results that identify, in sequence, the text line attribute at each corresponding position of the text line image to be recognized, and the step of determining, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes includes:

determining, according to the width of the specified sliding window, the text image positions distributed in sequence in the text line image to be recognized, where these text image positions correspond, in order, to the classification results of the text line attributes;

aggregating, according to the classification results of the text line attributes, adjacent text image positions that have the same classification result, and determining the image regions in the text line image to be recognized that correspond to the different text line attributes.

In a second aspect, an embodiment of the present application further provides a text recognition method, including: determining, by the text positioning method disclosed in the first aspect, the image regions in the text line image to be recognized that correspond to different text line attributes;

recognizing, with the text image recognition model that matches each text line attribute, the part of the text line image to be recognized that lies in the image region corresponding to that text line attribute, and determining the recognition result of the text line image to be recognized in each image region;

fusing, according to the positions of the image regions, the recognition results of the text line image to be recognized in the image regions, and determining the text corresponding to the text line image to be recognized.
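The three steps of the second aspect can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the region tuple layout, the function names, and the per-attribute recognizer callables are all hypothetical, and fusion is assumed here to be plain left-to-right concatenation, which the text does not specify.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical region format: (text line attribute, x_start, x_end) in pixels.
Region = Tuple[str, int, int]
# Hypothetical recognizer: takes a cropped image (rows of pixels), returns text.
Recognizer = Callable[[List[List[int]]], str]


def crop_columns(image: List[List[int]], x_start: int, x_end: int) -> List[List[int]]:
    """Cut the columns [x_start, x_end) out of every pixel row."""
    return [row[x_start:x_end] for row in image]


def recognize_text_line(
    image: List[List[int]],
    regions: List[Region],
    recognizers: Dict[str, Recognizer],
) -> str:
    """Recognize each region with the recognizer matching its attribute,
    then fuse the partial results in order of horizontal position."""
    pieces = []
    for attribute, x_start, x_end in sorted(regions, key=lambda r: r[1]):
        crop = crop_columns(image, x_start, x_end)
        pieces.append(recognizers[attribute](crop))
    # Assumption: fusion is simple concatenation from left to right.
    return "".join(pieces)
```

In this sketch the dictionary lookup `recognizers[attribute]` plays the role of choosing the text image recognition model that matches each text line attribute.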

In a third aspect, an embodiment of the present application further provides a text positioning device, including:

a to-be-recognized text line image obtaining module, configured to obtain a text line image to be recognized;

a text line recognition result determining module, configured to input the text line image to be recognized into a pre-trained text line recognition model and determine the text line recognition result corresponding to the text line image to be recognized, where the text line recognition result indicates the text line attribute at each corresponding position of the text line image to be recognized;

an image region determining module, configured to determine, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes.

Optionally, for use before the text line image to be recognized is input into the pre-trained text line recognition model to determine the corresponding text line recognition result, the device further includes:

a sample collection module, configured to obtain training samples for the text line recognition model, where the sample data of each training sample is a text line image of a preset height and a preset width, and the sample label of the training sample indicates the text line attribute at each corresponding position in that text line image;

a text line recognition model training module, configured to train the text line recognition model by taking the sample data of the training samples as the input of the text line recognition model, with the goal of minimizing the error between the output of the text line recognition model and the sample labels of the training samples, where the text line recognition model is built on a convolutional neural network.

Optionally, the sample collection module is further configured to:

Optionally, when obtaining the text line image to be recognized, the to-be-recognized text line image obtaining module is further configured to:

stretch or compress the text line image to be recognized along the width and/or height direction, so as to adjust it to a text line image to be recognized that has the preset height and the preset width.

Optionally, the text line recognition result includes classification results that identify, in sequence, the text line attribute at each corresponding position of the text line image to be recognized, and the image region determining module is further configured to:

In a fourth aspect, an embodiment of the present application further provides a text recognition device, including:

a to-be-recognized text line image region determining module, configured to determine, by the text positioning method described in the first aspect, the image regions in the text line image to be recognized that correspond to different text line attributes;

a sub-region recognition module, configured to recognize, with the text image recognition model that matches each text line attribute, the part of the text line image to be recognized that lies in the image region corresponding to that text line attribute, and to determine the recognition result of the text line image to be recognized in each image region;

a recognition result fusion module, configured to fuse, according to the positions of the image regions, the recognition results of the text line image to be recognized in the image regions, and to determine the text corresponding to the text line image to be recognized.

In a fifth aspect, an embodiment of the present application further provides an electronic device, including a memory, a processor, and a computer program that is stored on the memory and can run on the processor, where the processor, when executing the computer program, implements the text positioning method and/or the text recognition method described in the embodiments of the present application.

In a sixth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of the text positioning method and/or the text recognition method described in the embodiments of the present application.

In this way, the text positioning method disclosed in the embodiments of the present application obtains a text line image to be recognized; then inputs the text line image to be recognized into a pre-trained text line recognition model and determines the corresponding text line recognition result, which indicates the text line attribute at each corresponding position of the text line image to be recognized; and finally determines, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes. This helps to solve the low accuracy of text recognition in the prior art. The text positioning method disclosed in the embodiments of the present application uses the trained text line recognition model to determine where text with different text line attributes (such as single-row text or multi-row text) is distributed in a text line image with a complex layout, so that each text region can be recognized with the text image recognition engine that matches the text line attribute of that region, thereby improving the accuracy of text recognition.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of the text positioning method of Embodiment one of the present application;

Fig. 2 is a flowchart of the text positioning method of Embodiment two of the present application;

Fig. 3 is an original image from which training samples are collected in Embodiment two of the present application;

Fig. 4 is a schematic diagram of a text line image collected from the image in Fig. 3 in Embodiment two of the present application;

Fig. 5 is a schematic diagram of the sample data obtained by stretching the collected text line image;

Fig. 6 is a schematic diagram of sample annotation based on the sample data of Fig. 5;

Fig. 7 is a schematic diagram of sliding-window scanning of the sample data of Fig. 6 in Embodiment two of the present application;

Fig. 8 is a schematic diagram of the network structure of the text line recognition model used in Embodiment two of the present application;

Fig. 9 is a schematic diagram of the correspondence between the recognition result of the text image to be recognized and the input image in Embodiment two of the present application;

Fig. 10 is a schematic diagram of the final positioning result of the text image to be recognized in Embodiment two of the present application;

Fig. 11 is a flowchart of the text recognition method of Embodiment three of the present application;

Fig. 12 is a first schematic structural diagram of the text positioning device of Embodiment four of the present application;

Fig. 13 is a second schematic structural diagram of the text positioning device of Embodiment four of the present application;

Fig. 14 is a schematic structural diagram of the text recognition device of Embodiment five of the present application.

Specific embodiments

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

The different text line attributes in the embodiments of the present application may be single-row text versus double-row text, different fonts, different character types, and so on. For ease of understanding, the embodiments of the present application describe the text positioning method using single-row text and double-row text as the different text line attributes.

Embodiment one:

This embodiment provides a text positioning method. As shown in Fig. 1, the method includes step 10 to step 12.

Step 10: obtain a text line image to be recognized.

The text line image to be recognized in the embodiments of the present application is an image of a preset size. It may be a text line image that contains only single-row text, a text line image that contains only multi-row text, or a text line image with a mixed layout that contains both single-row text and multi-row text.

In a specific implementation, to reduce the amount of computation and improve the efficiency of text positioning, the obtained text line image to be recognized is preferably a grayscale image.

Step 11: input the text line image to be recognized into the pre-trained text line recognition model, and determine the text line recognition result corresponding to the text line image to be recognized.

The text line recognition result indicates the text line attribute at each corresponding position of the text line image to be recognized.

In a specific implementation of the present application, the text line recognition model must be trained before the text line image to be recognized is recognized. The text line recognition model is trained on the basis of a convolutional neural network. It applies multiple convolution operations to the input text line image, performs feature extraction and mapping, and finally outputs the text line attribute classification result of each image region obtained by dividing the text line image to be recognized in sequence along the image width direction (for example, from left to right or from right to left). The size of each image region is the same as the size of the sliding window that was used to set sample labels on the collected text line image samples when the text line recognition model was trained. Taking text line attributes that include single-row text and multi-row text as an example, the text line recognition result indicates whether each image region, obtained by dividing the text line image input to the text line recognition model in sequence along the image width direction by the sliding window size, is single-row text or multi-row text.

Step 12: determine, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes.

After the recognition result of the text line image to be recognized has been determined, the recognition result is aggregated. The text line image to be recognized may contain single-row text only or a mixed layout that also contains multi-row text, and the position and length of the single-row or multi-row text are not fixed. Therefore, adjacent single-row text regions and adjacent multi-row text regions in the text line recognition result obtained in the previous step need to be aggregated to determine the image regions of single-row text and of multi-row text in the text line image to be recognized.

In a specific implementation, the text line recognition result can be expressed as a character array, each element of which indicates the text line attribute of one of the image regions arranged in sequence in the text line image to be recognized. These image regions are obtained by dividing the text line image to be recognized in sequence along the image width direction into regions whose width equals the width of the sliding window mentioned in the previous step.
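The aggregation in step 12 amounts to merging runs of equal neighbouring elements in the result array and converting array indices to pixel ranges. The following Python sketch illustrates this under stated assumptions: the function name and the `(attribute, x_start, x_end)` tuple layout are hypothetical, and each array element is assumed to cover exactly one sliding-window width with no overlap.

```python
def aggregate_regions(labels, window_width):
    """Merge adjacent windows with the same text line attribute.

    labels: per-window attributes, ordered along the image width.
    window_width: width in pixels covered by each element (assumed
    non-overlapping, i.e. step size equal to the window width).
    Returns a list of (attribute, x_start, x_end) with x_end exclusive.
    """
    regions = []
    for index, attribute in enumerate(labels):
        x_start = index * window_width
        x_end = x_start + window_width
        if regions and regions[-1][0] == attribute:
            # Same attribute as the previous window: extend that region.
            regions[-1] = (attribute, regions[-1][1], x_end)
        else:
            # Attribute changed: start a new region.
            regions.append((attribute, x_start, x_end))
    return regions
```

For instance, with a window width of 32 pixels, `aggregate_regions([0, 0, 1, 1, 1, 0], 32)` yields three regions covering the pixel ranges 0 to 64, 64 to 160, and 160 to 192.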

The text positioning method disclosed in the embodiments of the present application obtains a text line image to be recognized; then inputs the text line image to be recognized into a pre-trained text line recognition model and determines the corresponding text line recognition result, which indicates the text line attribute at each corresponding position of the text line image to be recognized; and finally determines, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes. This helps to solve the low text recognition accuracy that arises in the prior art when multi-row text is recognized as single-row text. The text positioning method disclosed in the embodiments of the present application uses the trained text line recognition model to determine where text with different text line attributes (such as single-row text or multi-row text) is distributed in a text line image with a complex layout, so that each text region can be recognized with the text image recognition engine that matches the text line attribute of that region, thereby improving the accuracy of text recognition.

Embodiment two:

This embodiment provides a text positioning method. As shown in Fig. 2, the method includes step 20 to step 23.

Step 20: train the text line recognition model.

In some embodiments of the present application, before the step of inputting the text line image to be recognized into the pre-trained text line recognition model and determining the corresponding text line recognition result, the method further includes training the text line recognition model. In a specific implementation, training the text line recognition model further includes: obtaining training samples for the text line recognition model, where the sample data of each training sample is a text line image of a preset height and a preset width, and the sample label of the training sample indicates the text line attribute at each corresponding position in that text line image; and training the text line recognition model by taking the sample data of the training samples as the input of the model, with the goal of minimizing the error between the output of the model and the sample labels of the training samples, where the text line recognition model is built on a convolutional neural network.

In a specific implementation, training samples must be collected first.

In some embodiments of the present application, the step of obtaining training samples for the text line recognition model includes: obtaining several text line images of the preset height and the preset width as sample data, and constructing a sample data set; for the text line image corresponding to each sample data item in the sample data set, scanning the text line image along the image width direction by moving a specified sliding window with a preset step size, so as to label, according to the scanning result, the text line attribute of the text line image at each position the specified sliding window passes through in sequence; and determining the sample label of the corresponding sample data according to the labeled text line attributes at those positions. The height of the specified sliding window is a first preset ratio of the preset height of the text line image, and the width of the specified sliding window is a second preset ratio of the preset width of the text line image. Further, the first preset ratio and the second preset ratio are determined according to the length of the text line image to be recognized and the height of the text distribution; the preset step size is generally equal to the width of the sliding window. In this embodiment, for ease of description, the specific technical solution of the text positioning method is illustrated with the text line image scanned, or convolved, from left to right along the image width direction.

In some embodiments of the present application, images of ancient books and documents can be selected as original images. The original images are then converted to grayscale, and the image corresponding to each row or each column of content is segmented out as a text line image. When the local chronicle image shown in Fig. 3 is used as the original image to collect training samples, preprocessing the original image yields the image of each column of text, such as the image of the column text in rectangular region 310. The image of each column of text is then rotated by 90 degrees to obtain a text line image, as shown in Fig. 4.
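The grayscale conversion and the 90-degree rotation can be sketched in plain Python on images stored as nested lists. Two points are assumptions rather than statements from the text: the grayscale weights are the common ITU-R BT.601 luma coefficients, and the rotation is taken to be counter-clockwise so that the top of a column ends up at the left of the resulting row. The function names are illustrative only.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to rows of single gray values.

    Assumption: BT.601 luma weights; the text does not name a formula.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def rotate_90_ccw(image):
    """Rotate an image (list of pixel rows) by 90 degrees counter-clockwise.

    The top row of the input becomes the left column of the output, so a
    column of text read top to bottom becomes a row read left to right.
    """
    # zip(*image) yields the columns; reversing their order completes
    # the counter-clockwise rotation.
    return [list(column) for column in zip(*image)][::-1]
```

A column image of height H and width W becomes a row image of height W and width H after the rotation.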

In a specific implementation, the text images used to train the text line recognition model should have a uniform size. Therefore, the text line images collected from different original images are stretched or compressed to normalize them to a uniform size. For example, when a text line image is stretched, its height is adjusted to the preset height (such as 64) and its width to the preset width (such as 1280). The stretching yields a text line image as shown in Fig. 5.
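Size normalization can be illustrated with a small resampling routine. The text does not say which interpolation is used, so this sketch swaps in nearest-neighbour sampling as the simplest option; the function name is hypothetical and the defaults are the example values 64 and 1280 given above.

```python
def resize_nearest(image, target_height=64, target_width=1280):
    """Stretch or compress an image (list of pixel rows) to a fixed size.

    Assumption: nearest-neighbour sampling; the source only says the image
    is stretched or compressed to the preset height and width.
    """
    src_height, src_width = len(image), len(image[0])
    return [
        [
            # Map each target pixel back to its nearest source pixel.
            image[y * src_height // target_height][x * src_width // target_width]
            for x in range(target_width)
        ]
        for y in range(target_height)
    ]
```

Because width and height are scaled independently, the aspect ratio of the text is generally not preserved, which matches the stretching described for Fig. 5.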

Several text line images can be obtained by the above method. Each text line image serves as the sample data of one training sample, and together they form the sample data set. In a specific implementation of the present application, the preset height is determined from data such as the row height of the text images to be recognized and the column width and row height of the ancient-book documents; the preset width is determined from data such as the column length and row length of the ancient-book documents.

Further, a sample label is set for each sample data item in the sample data set.

First, for each sample data item (that is, each text line image) in the sample data set, the coordinates of the image regions with different text line attributes, and the corresponding text line attributes, are annotated manually. For example, the top-left and bottom-right coordinates of each double-row text region in each text line image are annotated, together with the number of double-row text regions. Taking the text line image shown in Fig. 5 as an example, the annotation result is shown schematically in Fig. 6, where bound[] gives the top-left and bottom-right coordinates of a double-row text region, and DNum gives the number of double-row text regions in the text line image.

In a specific implementation, for each sample data item (that is, each text line image) in the sample data set, the sample label of the sample data is determined by scanning, with a sliding window, the text pixel distribution at the corresponding positions of the text line image. For example, the specified sliding window is determined by taking the height of the text line images in the sample data set as the height of the window, or taking a first preset ratio (such as 4/5 to 6/5) of that height as the height of the window, and taking a second preset ratio (such as 1/40) of the width of the text line images in the sample data set as the width of the window.

Then, for each sample data item in the sample data set, that is, each text line image, the specified sliding window is moved with the preset step size starting from the leftmost position of the text line image. Each position of the specified sliding window corresponds to one position of the text line image, and each such position corresponds to one image region whose width equals the width of the specified sliding window. The image regions of the text line image determined by scanning with the specified sliding window are shown in Fig. 7. In this embodiment, the preset step size is one width of the specified sliding window; in a specific implementation, the step size can be chosen by weighing the accuracy of text line positioning against computational efficiency.
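The window positions described above can be enumerated with a few lines of Python. This is a sketch with hypothetical names; it assumes integer pixel coordinates and uses the example figures from the text (image width 1280, window width 1/40 of that, step equal to the window width).

```python
def sliding_window_regions(image_width, window_width, step=None):
    """List the (x_start, x_end) ranges visited by the sliding window.

    x_end is exclusive. When step is omitted it defaults to the window
    width, as in this embodiment, so the regions do not overlap.
    """
    step = window_width if step is None else step
    regions = []
    x = 0
    while x + window_width <= image_width:
        regions.append((x, x + window_width))
        x += step
    return regions


# Example values from the text: width 1280, second preset ratio 1/40.
regions = sliding_window_regions(1280, 1280 // 40)
```

With these values the window is 32 pixels wide and visits 40 positions, which is why the sample label described in this embodiment has 40 elements. A smaller step would produce overlapping regions and a longer label.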

Further, for determining each image-region, further according to the abscissa of the image-region and to this article The text filed upper left corner abscissa of the duplicate rows of current row image labeling and lower right corner abscissa determine whether the image-region is double Style of writing one's respective area.It, can will be in the corresponding sample label of the image-region if the image-region is that duplicate rows is text filed Data are to be set to indicate that the line of text attribute tags of duplicate rows text, such as 0, i.e. the scanning result of the image-region is 0；If should The non-duplicate rows of image-region is text filed, then can be to be set to indicate that list by the data in the corresponding sample label of the image-region The line of text attribute tags of style of writing originally, such as 1, the i.e. scanning result of the image-region are 1.

It, can be with when judging a certain image-region is text filed duplicate rows or single file text region in specific implementation process According to the text filed area accounting judgement of duplicate rows in the image-region.For example, it is text filed to work as duplicate rows in a certain image-region Area be greater than the image-region area 1/2 when, it is determined that the image-region be duplicate rows it is text filed, otherwise, it determines the figure As region is single file text region.

Afterwards, the 40-dimensional array formed by the text line attribute labels of the image regions of the text line image, determined in turn as the specified sliding window moves from left to right, is used as the sample label of the text line image, i.e. the sample label of the corresponding sample datum. The sample label of the text line image shown in Fig. 7 can be expressed as:

Label = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0].

In a specific implementation, the aforementioned second preset ratio is determined by weighing the accuracy of text line positioning against computational efficiency.
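The labeling procedure described above can be sketched in a few lines. The sketch rests on stated assumptions: the function and constant names are hypothetical, the annotated double-line region is given only by its left and right abscissas, the overlap test uses the 1/2 ratio along the width, and the 32-pixel window width is inferred from 40 regions over a 1280-pixel-wide image with a step of one window width. The label values 0 and 1 follow the example given for the sample labels.

```python
DOUBLE_LINE, SINGLE_LINE = 0, 1  # example label values used for sample labels


def build_sample_label(image_width, window_width, double_x1, double_x2, step=None):
    """Scan left to right and label each window position.

    double_x1 / double_x2 are the left and right abscissas of the annotated
    double-line region; step defaults to one window width.
    """
    step = window_width if step is None else step
    label = []
    left = 0
    while left + window_width <= image_width:
        right = left + window_width
        # Horizontal overlap between this window and the annotated region.
        overlap = max(0, min(right, double_x2) - max(left, double_x1))
        label.append(DOUBLE_LINE if overlap > window_width / 2 else SINGLE_LINE)
        left += step
    return label
```

With `build_sample_label(1280, 32, 0, 640)` the result has 40 entries, one per window position, and the windows covering the left half of the image carry the double-line label.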

In a specific implementation of the present application, a text line recognition model also needs to be constructed.

In the embodiments of the present application, the text line recognition model is constructed based on a convolutional neural network. In a specific implementation, a text line recognition model with the network structure shown in Fig. 8 can be constructed. From the input side to the output side, the text line recognition model consists of, in order: a 2-D convolutional layer composed of 16 convolution kernels of size 7 × 7; a max pooling layer with a kernel size of 2 × 2; a 2-D convolutional layer composed of 32 convolution kernels of size 5 × 5; a max pooling layer with a kernel size of 2 × 2; a 2-D convolutional layer composed of 64 convolution kernels of size 3 × 3; a max pooling layer with a kernel size of 2 × 2; a 2-D convolutional layer composed of 64 convolution kernels of size 3 × 3; a max pooling layer with a kernel size of 2 × 2; a 2-D convolutional layer composed of 128 convolution kernels of size 2 × 2; a max pooling layer with a kernel size of 2 × 2; a batch normalization layer; a 2-D convolutional layer composed of 1 convolution kernel of size 1 × 1; a vector transposition layer; and a vector flattening layer. The output of the last convolutional layer represents the classification result of each image region in the input text line image.

In a specific implementation, each sample datum (i.e. text line image) can be read with OpenCV (a cross-platform computer vision library distributed under the open-source BSD license) or another existing toolkit to obtain a 64 × 1280 × 1 grayscale image, which is then input to the aforementioned text line recognition model. After the first convolution, the grayscale image has a size of 64 × 1280 × 16; it then undergoes one max pooling operation, after which its size is 32 × 640 × 16. After the convolution and pooling of the above layers in turn, a vector of size 2 × M × 1 is obtained, for example [[0.2, 0.6, 0.3, …, 0.9], [0.8, 0.4, 0.7, …, 0.1]]. Next, the first and second dimensions of the 2 × M × 1 vector are transposed so that the vector corresponds to the input text line image along the width direction, giving a vector of size M × 2 × 1. Finally, TimeDistributed(Flatten) flattens the last two dimensions of the transposed vector to give the output of the model, for example [[0.2, 0.8], [0.6, 0.4], [0.3, 0.7], …, [0.9, 0.1]], where M equals 40, the number of image regions determined when each sample datum is scanned by the specified sliding window.
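The size arithmetic above can be checked with a short, framework-free sketch. It assumes that every convolution preserves height and width (stride 1 with "same" padding, which the 64 × 1280 × 16 size after the first convolution suggests) and that every pooling layer halves both; the names `LAYERS`, `trace_shapes` and `transpose_and_flatten` are hypothetical and nothing here is the trained model itself.

```python
# (layer kind, parameter): conv -> number of kernels, pool -> pooling factor
LAYERS = [
    ("conv", 16), ("pool", 2),
    ("conv", 32), ("pool", 2),
    ("conv", 64), ("pool", 2),
    ("conv", 64), ("pool", 2),
    ("conv", 128), ("pool", 2),
    ("batchnorm", None),
    ("conv", 1),
]


def trace_shapes(height, width, channels):
    """Return the (height, width, channels) size after the input and each layer."""
    shapes = [(height, width, channels)]
    for kind, param in LAYERS:
        if kind == "conv":
            channels = param  # assumed 'same' padding: height and width unchanged
        elif kind == "pool":
            height, width = height // param, width // param
        # batch normalization leaves the size unchanged
        shapes.append((height, width, channels))
    return shapes


def transpose_and_flatten(feature_map):
    """Turn a 2 x M x 1 nested list into an M x 2 list of score pairs."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [[feature_map[r][c][0] for r in range(rows)] for c in range(cols)]
```

Under these assumptions `trace_shapes(64, 1280, 1)` ends at `(2, 40, 1)`, which is where M = 40 comes from: five halvings of 1280 leave 40 columns, one per sliding window position. In a Keras-style framework the same stack would roughly correspond to `Conv2D`, `MaxPooling2D`, `BatchNormalization`, `Permute` and `TimeDistributed(Flatten())` layers, although the text does not name the framework beyond the TimeDistributed(Flatten) step.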

Afterwards, the text line recognition model is trained on the training samples in the above sample set. The trained text line recognition model can process a text line image of a given length and output the text line attribute feature corresponding to each image region of the input image.

Training the model is in essence a process of repeatedly solving for and optimizing the convolution parameters of each layer of the model. Using back-propagation, the optimal parameters are solved with the goal of minimizing the error between the output of the text line recognition model and the sample label of the corresponding input text line image, which completes the training of the text line recognition model. For the specific training procedure, refer to the prior art; it is not repeated in this embodiment.

In a specific implementation of the present application, the sample data in the sample data set can first be balanced to prevent the model training from becoming biased. In addition, the samples in the sample data set are randomly shuffled to obtain better generalization; 80% of the samples are used as the training set and the remainder as the test set, so as to verify the generalization ability of the trained text line recognition model.
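The shuffle-and-split step can be sketched as below. The function name, the fixed `seed` and the use of Python's standard `random` module are choices made for this illustration only; the 0.8 ratio is the one given in the text, and the balancing step is not shown.

```python
import random


def shuffle_and_split(samples, train_ratio=0.8, seed=0):
    """Randomly shuffle the samples, then split them into training and test sets."""
    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded only to make the sketch repeatable
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Calling `shuffle_and_split(range(100))` yields 80 training samples and 20 test samples, with every sample appearing in exactly one of the two sets.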

Step 21: obtain a text line image to be recognized.

The text line image to be recognized in the embodiments of the present application is an image of a preset size. It may be a text line image containing only single-line text, a text line image containing only multi-line text, or, as shown in Fig. 3, a mixed-layout text line image containing both single-line text and multi-line text.

In some embodiments of the present application, the step of obtaining a text line image to be recognized comprises: stretching or compressing the text line image to be recognized along the width and/or height direction, so as to adjust it to a text line image to be recognized of the preset height and the preset width. Because the training samples used during model training are text line images of the preset height and preset width, during recognition, if the width of the text line image to be recognized is not equal to the preset width, the image must be stretched or compressed along the width direction to adjust its width to the preset width. Likewise, if the height of the text line image to be recognized is not equal to the preset height, the image must also be stretched or compressed along the height direction to adjust its height to the preset height. For example, the size of the text line image to be recognized is adjusted to 64 × 1280.
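As an illustration of the stretching or compression step, the sketch below resizes a grayscale image stored as a nested list of pixel rows. Nearest-neighbour sampling is used purely for brevity; the text does not specify an interpolation method, and the function name is hypothetical. In practice an image library routine (for example OpenCV's `cv2.resize`) would normally be used instead.

```python
def resize_nearest(image, target_height=64, target_width=1280):
    """Stretch or compress a 2-D list of pixels to the target size.

    Each output pixel copies the source pixel at the proportionally
    scaled position (nearest-neighbour sampling, assumed for this sketch).
    """
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // target_height][x * src_w // target_width]
         for x in range(target_width)]
        for y in range(target_height)
    ]
```

The same call handles both directions: an input narrower than 1280 pixels is stretched along the width, and an input taller than 64 pixels is compressed along the height.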

Step 22: input the text line image to be recognized into a pre-trained text line recognition model, and determine the text line recognition result corresponding to the text line image to be recognized.

In a specific implementation of the present application, the text line recognition model must first be trained before the text line image to be recognized is recognized. The text line recognition model is trained based on a convolutional neural network. By applying multiple convolution operations to the input text line image for feature extraction and mapping, it finally outputs the text line attribute classification result of each image region into which the text line image to be recognized is divided in order from left to right. The size of each image region is the same as the size of the sliding window used to set the sample labels of the collected text line image samples when the text line recognition model was trained. Taking text line attributes comprising single-line text and multi-line text as an example, the text line recognition result indicates whether each image region, obtained by dividing the text line image input to the text line recognition model in order along the width direction (e.g. from left to right) according to the sliding window size, is single-line text or multi-line text.

Step 23: determine, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes.

After the recognition result of the text line image to be recognized has been determined, the recognition result is then aggregated. In some embodiments of the present application, the text line recognition result comprises classification results that identify, in order, the text line attributes at corresponding positions of the text line image to be recognized, and the step of determining, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes comprises: determining, according to the width of the specified sliding window, the text image positions distributed in order in the text line image to be recognized, the text image positions corresponding in order to the classification results of the text line attributes; and aggregating, according to the classification results of the text line attributes, adjacent text image positions with the same classification result, so as to determine the image regions in the text line image to be recognized that correspond to different text line attributes.

Take as an example the case where the sample label used when training the text line recognition model is an array of 40 elements, i.e. 40 image regions are determined when each sample datum is scanned by the specified sliding window. The text line recognition result of the text line recognition model is then a 40 × 2 two-dimensional array, denoted for example A[40,2], where the value of each element of A indicates the probability that the image region at the corresponding position in the text line image to be recognized is recognized as single-line text or as double-line text. For example, the array element A[0,0] can indicate the probability that the image region at the first specified sliding window position in the text line image to be recognized is recognized as single-line text, and the array element A[0,1] can indicate the probability that the same image region is recognized as double-line text. When the probability that the image region at a given position is recognized as single-line text is greater than a preset threshold, the text line attribute of that image region is determined to be single-line text. For example, when A[0,0] > A[0,1], the image region at the first specified sliding window position is recognized as single-line text, and its text line attribute is marked, for example, as 0; otherwise, the text line attribute of the image region is determined to be double-line text and is marked, for example, as 1. The width of each image region corresponds to the width of the aforementioned specified sliding window.
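The per-region decision can be sketched as a comparison of the two scores in each row of A. The function and constant names are hypothetical; the mark values 0 and 1 and the A[i,0] > A[i,1] comparison follow the example in the preceding paragraph.

```python
SINGLE_LINE, DOUBLE_LINE = 0, 1  # example mark values used at recognition time


def attributes_from_scores(scores):
    """Map each [p_single, p_double] pair to a text line attribute mark.

    A region is marked single-line only when its single-line score is
    strictly greater than its double-line score.
    """
    return [SINGLE_LINE if p_single > p_double else DOUBLE_LINE
            for p_single, p_double in scores]
```

Applied to a 40 × 2 result array this gives 40 marks, one per sliding window position, ordered from left to right.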

According to the foregoing method, the text line attribute mark of each image region distributed in order from left to right in the text line image to be recognized can be determined, for example:

0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.

Each text line attribute mark in the sequence corresponds to the classification result of one image region in the text line image to be recognized. The correspondence between the text line attribute marks and the image regions in the text line image to be recognized is shown in Fig. 9.

Further, adjacent image regions with the same text line attribute mark are aggregated, i.e. adjacent text image positions with the same classification result are aggregated, so as to determine the image regions in the text line image to be recognized that correspond to different text line attributes. For example, the 13 leftmost text line attribute marks in Fig. 9 are 0, so the image regions of the text line image to be recognized that correspond to these 13 marks are aggregated, and the text line attribute of the image region obtained after aggregation is determined to be the text line attribute corresponding to mark 0, i.e. single-line text. Likewise, the 14th to 18th text line attribute marks from the left in Fig. 9 are 1, so the image regions corresponding to these 5 marks are aggregated, and the text line attribute of the image region obtained after aggregation is determined to be the text line attribute corresponding to mark 1, i.e. multi-line text. In this way, the image regions of different text line attributes in the text line image to be recognized can be determined. As shown in Fig. 10, through the foregoing text line recognition and aggregation, 3 image regions 1010, 1011 and 1012 contained in the text line image to be recognized are finally determined, where image regions 1010 and 1012 correspond to single-line text and image region 1011 corresponds to multi-line text.
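The aggregation of adjacent, identically marked positions amounts to run-length grouping. The sketch below uses `itertools.groupby` for this; the function name, the `(mark, x_start, x_end)` tuple layout and the 32-pixel window width (1280 / 40) are assumptions made for illustration.

```python
from itertools import groupby


def merge_regions(marks, window_width):
    """Merge runs of equal marks into (mark, x_start, x_end) pixel spans."""
    regions = []
    start = 0  # index of the first window position in the current run
    for mark, run in groupby(marks):
        length = len(list(run))
        regions.append((mark, start * window_width, (start + length) * window_width))
        start += length
    return regions
```

For the 40-mark sequence above (13 zeros, 5 ones, 22 zeros) and a window width of 32, this yields three spans, matching the three image regions of the example: a single-line span from x = 0 to 416, a multi-line span from 416 to 576, and a single-line span from 576 to 1280.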

In the text positioning method disclosed in the embodiments of the present application, a text line recognition model is trained in advance based on a convolutional neural network. After a text line image to be recognized is obtained, it is input into the pre-trained text line recognition model to determine the corresponding text line recognition result, which indicates the text line attributes at corresponding positions of the text line image to be recognized. Finally, the image regions in the text line image to be recognized that correspond to the text line attributes are determined according to the text line recognition result, which helps to solve the problem of low text recognition accuracy in the prior art. By using the trained text line recognition model, the text positioning method determines the regions in which text of different text line attributes (such as single-line text or multi-line text) is distributed in a text line image with a complex layout. This makes it possible to recognize the image of each text region with the text image recognition engine that matches the text line attribute of that region, thereby improving the accuracy of text recognition.

Embodiment three:

Correspondingly, as shown in Fig. 11, an embodiment of the present application also discloses a text recognition method, comprising steps 1101 to 1103.

Step 1101: determine the image regions in the text line image to be recognized that correspond to different text line attributes.

In a specific implementation, the text positioning method described in embodiment one or embodiment two is applied to the text line image to be recognized to determine the image regions that correspond to different text line attributes, such as the image regions corresponding to single-line text and the image regions corresponding to multi-line text.

Step 1102: using the text image recognition model that matches each text line attribute, recognize the text line image to be recognized in the image region corresponding to that text line attribute, and determine the recognition result of the text line image to be recognized in each image region.

Next, each image region corresponding to single-line text is recognized by a single-line text image recognition model to obtain the corresponding single-line recognition result, and each image region corresponding to multi-line text is recognized by a multi-line text image recognition model to obtain the corresponding multi-line recognition result.

Step 1103: according to the positions of the above image regions, fuse the recognition results of the text line image to be recognized in the image regions, and determine the text corresponding to the text line image to be recognized.

Finally, the single-line recognition results and multi-line recognition results obtained are spliced according to the positions of the corresponding image regions in the text line image to be recognized, giving the recognition result of the text line image to be recognized.
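Steps 1101 to 1103 can be sketched as a dispatch-and-splice loop. Everything named here is hypothetical: `recognizers` maps each text line attribute mark to a recognition callable, `crop` returns the part of the image between two abscissas, and regions are `(mark, x_start, x_end)` tuples. The actual recognition models are outside the scope of the sketch.

```python
def recognize_line(regions, recognizers, crop):
    """Recognize each region with the model matching its attribute, then splice.

    regions:     iterable of (mark, x_start, x_end) tuples
    recognizers: dict mapping a mark to a callable returning recognized text
    crop:        callable (x_start, x_end) -> sub-image for that span
    """
    # Splice in left-to-right order regardless of the order regions arrive in.
    ordered = sorted(regions, key=lambda region: region[1])
    parts = []
    for mark, x_start, x_end in ordered:
        parts.append(recognizers[mark](crop(x_start, x_end)))
    return "".join(parts)
```

A toy usage, with a string standing in for the image and trivial stand-in recognizers, shows the ordering behaviour: with `image = "abcdefgh"`, regions `[(1, 3, 5), (0, 0, 3), (0, 5, 8)]`, `recognizers = {0: str.upper, 1: lambda s: "[" + s + "]"}` and `crop = lambda a, b: image[a:b]`, the spliced result is `"ABC[de]FGH"`.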

In a specific implementation, the different text line attributes may be single-line text and double-line text, or may be different fonts or different types of script.

In the text recognition method disclosed in the embodiments of the present application, the image regions in the text line image to be recognized that correspond to different text line attributes are determined; then, using the text image recognition model that matches each text line attribute, the text line image to be recognized in the image region corresponding to that text line attribute is recognized and its recognition result is determined; and, according to the positions of the image regions, the recognition results of the text line image to be recognized in the image regions are fused to determine the text corresponding to the text line image to be recognized. This helps to improve the recognition accuracy for text images with complex layouts.

Embodiment four:

Correspondingly, an embodiment of the present application also discloses a text positioning device. As shown in Fig. 12, the device comprises:

a to-be-recognized text line image obtaining module 121, configured to obtain a text line image to be recognized;

a text line recognition result determining module 122, configured to input the text line image to be recognized into a pre-trained text line recognition model and determine the text line recognition result corresponding to the text line image to be recognized, wherein the text line recognition result indicates the text line attributes at corresponding positions of the text line image to be recognized;

an image region determining module 123, configured to determine, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to each text line attribute.

Optionally, for use before the text line image to be recognized is input into the pre-trained text line recognition model and the corresponding text line recognition result is determined, the text positioning device further comprises, as shown in Fig. 13:

a sample collection module 124, configured to obtain training samples for the text line recognition model, wherein the sample data of the training samples are text line images of a preset height and a preset width, and the sample label of a training sample indicates the text line attributes at corresponding positions in the text line image;

a text line recognition model training module 125, configured to train the text line recognition model by taking the sample data of the training samples as the input of the text line recognition model, with the goal of minimizing the error between the output of the text line recognition model and the sample labels of the training samples, wherein the text line recognition model is constructed based on a convolutional neural network.

Optionally, the sample collection module 124 is further configured to:

for the text line image corresponding to each sample datum in the sample data set, scan the text line image along the image width direction by moving a specified sliding window according to a preset step, and mark, according to the scan result, the text line attribute of the text line image at each position passed in turn as the specified sliding window moves;

determine the sample label of the corresponding sample datum according to the marked text line attributes of the text line image at the positions passed in turn as the specified sliding window moves; wherein the height of the specified sliding window is a first preset ratio of the preset height of the text line image, and the width of the specified sliding window is a second preset ratio of the preset width of the text line image.

Optionally, when obtaining the text line image to be recognized, the to-be-recognized text line image obtaining module 121 is further configured to:

stretch or compress the text line image to be recognized along the width and/or height direction, so as to adjust it to a text line image to be recognized of the preset height and the preset width.

Optionally, the text line recognition result comprises classification results that identify, in order, the text line attributes at corresponding positions of the text line image to be recognized, and the image region determining module 123 is further configured to:

determine, according to the width of the specified sliding window, the text image positions distributed in order in the text line image to be recognized, the text image positions corresponding in order to the classification results of the text line attributes;

aggregate, according to the classification results of the text line attributes, adjacent text image positions with the same classification result, so as to determine the image regions in the text line image to be recognized that correspond to different text line attributes.

The text positioning device disclosed in the embodiments of the present application obtains a text line image to be recognized, then inputs it into a pre-trained text line recognition model to determine the corresponding text line recognition result, which indicates the text line attributes at corresponding positions of the text line image to be recognized. Finally, it determines, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes, which helps to solve the problem of low text recognition accuracy in the prior art. By using the trained text line recognition model, the text positioning device determines the regions in which text of different text line attributes (such as single-line text or multi-line text) is distributed in a text line image with a complex layout. This makes it possible to recognize the image of each text region with the text image recognition engine that matches the text line attribute of that region, thereby improving the accuracy of text recognition.

Embodiment five:

Correspondingly, an embodiment of the present application also discloses a text recognition device. As shown in Fig. 14, the device comprises:

a to-be-recognized text line image region determining module 141, configured to determine, by the text positioning method described in embodiment one or embodiment two, the image regions in the text line image to be recognized that correspond to different text line attributes;

a sub-region recognition module 142, configured to recognize, using the text image recognition model that matches each text line attribute, the text line image to be recognized in the image region corresponding to that text line attribute, and to determine the recognition result of the text line image to be recognized in each image region;

a recognition result fusion module 143, configured to fuse, according to the positions of the image regions, the recognition results of the text line image to be recognized in the image regions, and to determine the text corresponding to the text line image to be recognized.

The text recognition device disclosed in the embodiments of the present application determines the image regions in the text line image to be recognized that correspond to different text line attributes; then, using the text image recognition model that matches each text line attribute, recognizes the text line image to be recognized in the image region corresponding to that text line attribute and determines its recognition result; and, according to the positions of the image regions, fuses the recognition results of the text line image to be recognized in the image regions to determine the text corresponding to the text line image to be recognized. This helps to improve the recognition accuracy for text images with complex layouts.

Correspondingly, an embodiment of the present application also discloses an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the text positioning method described in embodiment one and embodiment two of the present application, and/or the text recognition method described in embodiment three of the present application. The electronic device may be a mobile phone, a PAD, a tablet computer, a face recognition machine, or the like.

Correspondingly, an embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the text positioning method described in embodiment one and embodiment two of the present application, and/or the steps of the text recognition method described in embodiment three of the present application.

The device embodiments of the present application correspond to the method embodiments. For the specific implementation of each module and unit in the device embodiments, refer to the method embodiments; details are not repeated here.

Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present application.

Those of ordinary skill in the art will understand that, in the embodiments provided herein, the units described as separate components may or may not be physically separate; that is, they may be located in one place or distributed over multiple network units. In addition, the functional units in the embodiments of the present application may be integrated in one processing unit, may each exist physically on their own, or two or more units may be integrated in one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk or an optical disc.

The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present application.

Claims

1. A text positioning method, characterized by comprising:

obtaining a text line image to be recognized;

inputting the text line image to be recognized into a pre-trained text line recognition model, and determining the text line recognition result corresponding to the text line image to be recognized, wherein the text line recognition result indicates the text line attributes at corresponding positions of the text line image to be recognized;

determining, according to the text line recognition result, the image regions in the text line image to be recognized that correspond to the text line attributes.

2. The method according to claim 1, characterized in that, before the step of inputting the text line image to be recognized into a pre-trained text line recognition model and determining the text line recognition result corresponding to the text line image to be recognized, the method comprises:

obtaining training samples for the text line recognition model, wherein the sample data of the training samples are text line images of a preset height and a preset width, and the sample label of a training sample indicates the text line attributes at corresponding positions in the text line image;

training the text line recognition model by taking the sample data of the training samples as the input of the text line recognition model, with the goal of minimizing the error between the output of the text line recognition model and the sample labels of the training samples, wherein the text line recognition model is constructed based on a convolutional neural network.

3. The method according to claim 2, characterized in that the step of obtaining training samples for the text line recognition model comprises:

for the text line image corresponding to each sample datum in the sample data set, scanning the text line image along the image width direction by moving a specified sliding window according to a preset step, and marking, according to the scan result, the text line attribute of the text line image at each position passed in turn as the specified sliding window moves;

determining the sample label of the corresponding sample datum according to the marked text line attributes of the text line image at the positions passed in turn as the specified sliding window moves; wherein the height of the specified sliding window is a first preset ratio of the preset height of the text line image, and the width of the specified sliding window is a second preset ratio of the preset width of the text line image.

4. The method according to claim 2, characterized in that the step of obtaining a text line image to be recognized comprises:

adjusting the text line image to be recognized to the preset height and the preset width by stretching or compressing it along the width and/or height direction.

5. The method according to claim 3, characterized in that the text line recognition result comprises classification results that identify, in sequence, the text line attributes at corresponding positions of the text line image to be recognized, and the step of determining, according to the text line recognition result, the image region in the text line image to be recognized that corresponds to the text line attribute comprises:

determining, according to the width of the specified sliding window, the text image positions distributed in sequence in the text line image to be recognized, wherein the sequentially distributed text image positions correspond in order to the classification results of the text line attributes;

aggregating, according to the classification results of the text line attributes, text image positions that are adjacent and have the same classification result, so as to determine the image regions in the text line image to be recognized that correspond to different text line attributes.

6. A text recognition method, characterized by comprising:

determining, by the text positioning method according to any one of claims 1 to 5, the image regions in a text line image to be recognized that correspond to different text line attributes;

recognizing, by a text image recognition model matched to each text line attribute, the text line image to be recognized within the image region corresponding to that text line attribute, and determining the recognition result of the text line image to be recognized in the respective image region;

fusing, according to the positions of the image regions, the recognition results of the text line image to be recognized in the image regions, and determining the text corresponding to the text line image to be recognized.

7. A text positioning device, characterized by comprising:

a text line recognition result determining module, configured to input the text line image to be recognized into a pre-trained text line recognition model and determine a text line recognition result corresponding to the text line image to be recognized, wherein the text line recognition result indicates the text line attribute at corresponding positions of the text line image to be recognized;

an image region determining module, configured to determine, according to the text line recognition result, the image region in the text line image to be recognized that corresponds to the text line attribute.

8. The device according to claim 7, characterized in that, for use before the step of inputting the text line image to be recognized into the pre-trained text line recognition model and determining the text line recognition result corresponding to the text line image to be recognized, the device further comprises:

a sample collection module, configured to obtain training samples for the text line recognition model, wherein the sample data of each training sample is a text line image of a preset height and a preset width, and the sample label of the training sample indicates the text line attribute at corresponding positions in the text line image;

a text line recognition model training module, configured to train the text line recognition model by using the sample data of the training samples as the input of the text line recognition model, with the objective of minimizing the error between the output of the text line recognition model and the sample labels of the training samples, wherein the text line recognition model is constructed based on a convolutional neural network.

9. The device according to claim 8, characterized in that the sample collection module is further configured to:

10. The device according to claim 8, characterized in that, when obtaining the text line image to be recognized, the module for obtaining the text line image to be recognized is further configured to:

11. The device according to claim 9, characterized in that the text line recognition result comprises classification results that identify, in sequence, the text line attributes at corresponding positions of the text line image to be recognized, and the image region determining module is further configured to:

12. A text recognition device, characterized by comprising:

a module for determining image regions of the text line image to be recognized, configured to determine, by the text positioning method according to any one of claims 1 to 5, the image regions in the text line image to be recognized that correspond to different text line attributes;

a region-wise recognition module, configured to recognize, by a text image recognition model matched to each text line attribute, the text line image to be recognized within the image region corresponding to that text line attribute, and to determine the recognition result of the text line image to be recognized in the respective image region;

a recognition result fusion module, configured to fuse, according to the positions of the image regions, the recognition results of the text line image to be recognized in the image regions, and to determine the text corresponding to the text line image to be recognized.

13. An electronic device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the text positioning method according to any one of claims 1 to 5 and/or the text recognition method according to claim 6.

14. A computer-readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the text positioning method according to any one of claims 1 to 5 and/or the steps of the text recognition method according to claim 6.
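
For illustration only, the aggregation step of claim 5 and the fusion step of claim 6 might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the application: it assumes the sliding-window step equals the window width, so that position i covers columns [i * window_width, (i + 1) * window_width); the attribute names "printed" and "handwritten", the function names, and the stand-in recognizers are all hypothetical.

```python
from itertools import groupby
from typing import Callable, Dict, List, Sequence, Tuple

# (x_start, x_end, attribute); x_end is exclusive.
Region = Tuple[int, int, str]


def aggregate_regions(labels: Sequence[str], window_width: int) -> List[Region]:
    """Merge adjacent window positions that share a classification result.

    Assumption of this sketch: windows do not overlap, so position i
    covers columns [i * window_width, (i + 1) * window_width).
    """
    regions: List[Region] = []
    index = 0
    for attribute, run in groupby(labels):
        run_length = len(list(run))
        x_start = index * window_width
        x_end = (index + run_length) * window_width
        regions.append((x_start, x_end, attribute))
        index += run_length
    return regions


def recognize_line(
    columns: Sequence[object],
    regions: Sequence[Region],
    recognizers: Dict[str, Callable[[Sequence[object]], str]],
) -> str:
    """Recognize each region with the engine matching its attribute,
    then fuse the partial results in left-to-right order."""
    parts: List[str] = []
    for x_start, x_end, attribute in sorted(regions):
        crop = columns[x_start:x_end]          # columns of this region
        parts.append(recognizers[attribute](crop))
    return "".join(parts)


# Hypothetical per-position classification results for a 48-column image.
labels = ["printed", "printed", "handwritten",
          "handwritten", "handwritten", "printed"]
regions = aggregate_regions(labels, window_width=8)

# Stand-in recognizers that only report the width of the crop they received.
recognizers = {
    "printed": lambda crop: f"P{len(crop)}",
    "handwritten": lambda crop: f"H{len(crop)}",
}
fused = recognize_line(list(range(48)), regions, recognizers)
```

In this sketch `regions` holds three merged spans, one per run of identical labels, and `fused` concatenates the outputs of the attribute-specific recognizers by horizontal position. With an overlapping step, the mapping from position index to column range would need to be adjusted accordingly.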