CN112132142A - Text region determination method, text region determination device, computer equipment and storage medium

Info

Publication number
CN112132142A
CN112132142A (application CN202011033867.9A)
Authority
CN
China
Prior art keywords
value
gaussian
feature vector
text region
deflection angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011033867.9A
Other languages
Chinese (zh)
Inventor
刘舒萍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ping An Medical Health Technology Service Co Ltd
Original Assignee
Ping An Medical and Healthcare Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Medical and Healthcare Management Co Ltd filed Critical Ping An Medical and Healthcare Management Co Ltd
Priority to CN202011033867.9A priority Critical patent/CN112132142A/en
Publication of CN112132142A publication Critical patent/CN112132142A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and discloses a text region determination method and device, a computer device and a storage medium. The method includes the following steps: acquiring a target image feature vector; inputting the target image feature vector into a text region prediction model for prediction, wherein the text region prediction model is a model obtained by training a U-shaped convolutional neural network; obtaining a Gaussian map predicted value and a minimum circumscribed rectangle deflection angle predicted value output by the text region prediction model; and performing edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain a target text region. The method therefore has good generalization capability and strong detection capability for curved text regions, and improves the accuracy of the target text region.

Description

Text region determination method, text region determination device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a text region determination method, apparatus, computer device, and storage medium.
Background
Text detection is a very important module in OCR (Optical Character Recognition); its main function is to locate text regions in images or videos. Finding text regions in images or videos is a very simple ability for humans, but it remains a difficult research problem for OCR recognition systems and artificial intelligence.
Conventional text detection usually extracts image features with hand-designed extractors. Such extractors are effective only for text detection in simple scenes and fail in even slightly complex ones. For example, when the length or the aspect ratio of the text varies greatly, a conventional object box cannot adequately describe the orientation information of the text region, which easily causes false alarms; false alarms also arise when the background texture of the image closely resembles the text region, or when only local text region information of the image is available; and the detection capability for curved text regions is insufficient, which again causes false alarms.
Disclosure of Invention
The present application mainly aims to provide a text region determination method, apparatus, computer device and storage medium, in order to solve the technical problem that prior-art text detection fails in slightly complex scenes.
In order to achieve the above object of the invention, the present application proposes a text region determining method, including:
acquiring a target image feature vector;
inputting the target image feature vector into a text region prediction model for prediction, wherein the text region prediction model is a model obtained based on U-shaped convolutional neural network training;
obtaining a Gaussian map predicted value and a minimum circumscribed rectangle deflection angle predicted value output by the text region prediction model;
and performing edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain a target text area.
Further, before the step of inputting the target image feature vector into a text region prediction model for prediction, where the text region prediction model is a model trained based on a U-shaped convolutional neural network, the method further includes:
obtaining a plurality of training samples, the training samples comprising: an image sample feature vector, a Gaussian map calibration value and a minimum circumscribed rectangle deflection angle calibration value;
inputting the image sample feature vector of the training sample into the U-shaped convolutional neural network for prediction to obtain a Gaussian map training value and a minimum circumscribed rectangle deflection angle training value corresponding to the training sample;
and training according to the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value, and obtaining the text region prediction model after training.
Further, the step of obtaining a plurality of training samples, where the training samples include an image sample feature vector, a Gaussian map calibration value and a minimum circumscribed rectangle deflection angle calibration value, includes:
acquiring an image sample;
determining the feature vector of the image sample according to the image sample;
obtaining a text region labeling result of the image sample, wherein the text region labeling result comprises: a first long-edge labeling result, a second long-edge labeling result and two short-edge labeling results, the first long-edge labeling result comprising at least one line segment, and the number of line segments of the first long-edge labeling result being the same as that of the second long-edge labeling result;
obtaining a minimum circumscribed rectangle labeling result according to the line segments of the first long edge labeling result and the second long edge labeling result;
obtaining a deflection angle calibration value of the minimum circumscribed rectangle according to the minimum circumscribed rectangle labeling result;
and determining the Gaussian map calibration value according to the line segments of the first long-edge labeling result and the second long-edge labeling result.
Further, the U-shaped convolutional neural network includes: a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a sixth convolutional layer, a first deconvolution layer, a second deconvolution layer, a third deconvolution layer and a fourth deconvolution layer; and,
the step of inputting the image sample feature vector of the training sample into the U-shaped convolutional neural network for prediction to obtain a Gaussian map training value and a minimum circumscribed rectangle deflection angle training value corresponding to the training sample comprises the following steps:
inputting the image sample feature vector into the first convolution layer for convolution to obtain a first feature vector;
inputting the first feature vector into the second convolution layer for convolution to obtain a second feature vector;
inputting the second feature vector into the third convolution layer for convolution to obtain a third feature vector;
inputting the third feature vector into the fourth convolution layer for convolution to obtain a fourth feature vector;
inputting the fourth feature vector into the fifth convolutional layer for convolution to obtain a fifth feature vector;
inputting the fifth feature vector into the sixth convolutional layer for convolution to obtain a sixth feature vector;
inputting the sixth feature vector and the fifth feature vector into the first deconvolution layer for deconvolution to obtain a first fused feature vector;
inputting the fourth feature vector and the first fusion feature vector into the second deconvolution layer for deconvolution to obtain a second fusion feature vector;
inputting the third feature vector and the second fusion feature vector into the third deconvolution layer for deconvolution to obtain a third fusion feature vector;
and inputting the second feature vector and the third fusion feature vector into the fourth deconvolution layer for deconvolution to obtain the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value.
Further, the step of training according to the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value, and obtaining the text region prediction model after training, includes:
inputting the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value into a loss function for calculation to obtain a loss value of the U-shaped convolutional neural network, updating the parameters of the U-shaped convolutional neural network according to the loss value, and using the updated U-shaped convolutional neural network for the next calculation of the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value;
and repeating the above steps until the loss value reaches a first convergence condition or the number of iterations reaches a second convergence condition, and determining the U-shaped convolutional neural network whose loss value reaches the first convergence condition or whose number of iterations reaches the second convergence condition as the text region prediction model.
Further, the step of performing edge extension and deflection angle correction on the gaussian map corresponding to the gaussian map predicted value according to the gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain a target text region includes:
performing edge expansion on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value to obtain text region edge data;
and correcting the deflection angle of the text area corresponding to the text area edge data according to the predicted value of the minimum circumscribed rectangle deflection angle to obtain the target text area.
Further, the step of performing edge extension on the gaussian map corresponding to the gaussian map prediction value according to the gaussian map prediction value to obtain text region edge data includes:
when $I(X_1,Y_1) \le I(X_2,Y_2)$ and $I(X_1,Y_1) > T$, taking the pixel corresponding to $I(X_1,Y_1)$ as an extensible pixel, and merging the extensible pixel into the Gaussian map predicted value;
repeating the above step until no extensible pixel can be found, and taking the position data of the edge pixels of the merged Gaussian map predicted value as the text region edge data;
wherein $I(X_2,Y_2)$ is the average pixel value of the Gaussian map predicted value, $I(X_1,Y_1)$ is the pixel value of any pixel adjacent to the Gaussian map predicted value, and $T$ is a constant.
The present application also provides a text region determination apparatus, the apparatus including:
the image characteristic vector acquisition module is used for acquiring a target image characteristic vector;
the prediction module is used for inputting the target image feature vector into a text region prediction model for prediction, wherein the text region prediction model is a model obtained based on U-shaped convolutional neural network training, and a Gaussian map prediction value and a minimum circumscribed rectangle deflection angle prediction value output by the text region prediction model are obtained;
and the correction module is used for performing edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain a target text area.
The present application further proposes a computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the steps of the method of any one of the above when executing the computer program.
The present application also proposes a computer-readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of any of the above.
According to the text region determination method and apparatus, computer device and storage medium above, the target image feature vector is input into the text region prediction model for prediction, and the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value output by the text region prediction model are obtained; a boundary region that is not strictly enclosed can be handled well based on the Gaussian map, so the method has good generalization capability. The text region prediction model is trained on the basis of the U-shaped convolutional neural network, which has the advantages of output-layer resolution consistency and a high-precision segmentation effect, so the model can accurately segment images from target image feature vectors of slightly complex scenes, improving the accuracy of the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value and, in turn, the accuracy of the target text region. Because edge expansion and deflection angle correction are performed on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain the target text region, the method has strong detection capability for curved text regions and improves the accuracy of the target text region.
Drawings
FIG. 1 is a flowchart illustrating a text region determining method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a text region labeling result of the text region determining method according to the present application;
fig. 3 is a schematic diagram of a Gaussian map calibration value of a text region in the text region determination method of the present application;
fig. 4 is a structural diagram of a U-shaped convolutional neural network of the text region determining method of the present application;
FIG. 5 is a block diagram illustrating a text region determining apparatus according to an embodiment of the present application;
fig. 6 is a block diagram illustrating a structure of a computer device according to an embodiment of the present application.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In order to solve the technical problem that prior-art text detection fails in slightly complex scenes, the present application provides a text region determination method. The method is applied in the technical field of artificial intelligence, for example to neural networks, deep learning and machine learning. The method predicts a Gaussian map predicted value and a minimum circumscribed rectangle deflection angle predicted value, and then obtains the target text region through expansion and correction according to these two predicted values, so it has strong detection capability for curved text regions and improves the accuracy of the target text region.
Referring to fig. 1, the text region determining method includes:
s1: acquiring a target image feature vector;
s2: inputting the target image feature vector into a text region prediction model for prediction, wherein the text region prediction model is a model obtained based on U-shaped convolutional neural network training;
s3: obtaining a Gaussian map predicted value and a minimum circumscribed rectangle deflection angle predicted value output by the text region prediction model;
s4: and performing edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain a target text area.
In the embodiment, the target image feature vector is input into the text region prediction model for prediction, a Gaussian map prediction value and a minimum circumscribed rectangle deflection angle prediction value output by the text region prediction model are obtained, and a boundary region which is not strictly surrounded can be better processed based on the Gaussian map, so that the method has good generalization capability; the text region prediction model is trained on the basis of the U-shaped convolutional neural network, and the U-shaped convolutional neural network has the advantages of resolution consistency of an output layer and a high-precision segmentation effect, so that the text region prediction model can accurately segment images of target image characteristic vectors of a slightly complex scene, the accuracy of a Gaussian map prediction value and a minimum circumscribed rectangle deflection angle prediction value is improved, and the accuracy of a target text region is improved; because the edge expansion and the deflection angle correction are carried out on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value, the target text area is obtained, and therefore the method has strong detection capability on the curved text area and improves the accuracy of the target text area.
For S1, acquiring a target image; and determining the characteristic vector of the target image according to the target image.
The target image refers to a digital image of a target text region needing to be determined.
The target image feature vector is a feature vector of three channels, wherein the three channels are R (red), G (green) and B (blue) colors respectively, namely, each color corresponds to the feature vector of one channel.
Extracting feature vectors for the target image according to colors to obtain feature vectors corresponding to the target image, wherein each feature vector corresponding to the target image corresponds to one color, that is, the number of the feature vectors corresponding to the target image is three. And then splicing the three characteristic vectors corresponding to the target image in a channel dimension to obtain the characteristic vector of the target image.
Each vector element of the feature vector corresponding to the target image represents a pixel in the target image. For example, the vector elements of the second row and the third column of the feature vector corresponding to the target image of R (red) represent the R (red) color values of the pixels of the second row and the third column in the target image.
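As a concrete illustration of the channel-wise extraction described above, the following is a minimal Python sketch. It assumes the target image can be loaded as an H x W RGB array; the file name and function name are illustrative and not part of the patent.

```python
import numpy as np
from PIL import Image

def extract_target_image_feature_vector(path):
    """Illustrative sketch: one single-channel feature map per R/G/B color,
    concatenated along the channel dimension as described above."""
    image = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    # Element (i, j) of the R map is the R color value of the pixel
    # in row i, column j of the target image (likewise for G and B).
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    # Splice the three per-color feature vectors in the channel dimension,
    # giving the 3-channel target image feature vector of shape (3, H, W).
    return np.stack([r, g, b], axis=0)

features = extract_target_image_feature_vector("target.png")  # hypothetical file
print(features.shape)  # (3, H, W)
```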
And S2, inputting the target image feature vector into a text region prediction model for prediction, and outputting a Gaussian map prediction value and a minimum circumscribed rectangle deflection angle prediction value by the text region prediction model. That is, the text region prediction model predicts the gaussian map and the minimum bounding rectangle deflection angle.
Wherein a plurality of training samples are obtained, the training samples comprising: an image sample feature vector, a Gaussian map calibration value and a minimum circumscribed rectangle deflection angle calibration value; the U-shaped convolutional neural network is trained according to the plurality of training samples, and the trained U-shaped convolutional neural network is taken as the text region prediction model.
In S3, the gaussian map prediction value refers to a prediction value of a gaussian map of an image corresponding to the target image feature vector.
A Gaussian map refers to the result of applying the Gauss mapping to all points on a surface. The image of the mapping is a set of points on the unit sphere (also known as the Gauss sphere); for example, the Gauss map of a plane is a single point on the sphere.
The minimum circumscribed rectangle deflection angle prediction value refers to a prediction value of the deflection angle of the minimum circumscribed rectangle of the image corresponding to the target image feature vector.
The Minimum Bounding Rectangle (MBR) is also translated as a minimum bounding rectangle, a minimum containing rectangle, or a minimum enclosing rectangle. The minimum bounding rectangle refers to the maximum extent of a number of two-dimensional shapes (e.g., points, lines, polygons) expressed in two-dimensional coordinates, i.e., a rectangle whose boundary is defined by the maximum abscissa, the minimum abscissa, the maximum ordinate and the minimum ordinate of the vertices of a given two-dimensional shape. Such a rectangle contains the given two-dimensional shape and has sides parallel to the coordinate axes. The minimum bounding rectangle is the two-dimensional form of a minimum bounding box (MBB).
The deflection angle is the included angle between the minimum circumscribed rectangle and the positive X axis (the X axis being the transverse direction of the image corresponding to the target image feature vector). That is, the deflection angle is an angle value from -90° to 90° (inclusive). It is understood that the deflection angle may be defined with respect to other references, such as the negative X axis, the Y axis (the longitudinal direction of the image corresponding to the target image feature vector), the positive Y axis, or the negative Y axis, which is not specifically limited here.
The included angle between the minimum circumscribed rectangle and the positive X axis may be the included angle between a side of the minimum circumscribed rectangle and the positive X axis, or the included angle between a diagonal of the minimum circumscribed rectangle and the positive X axis, which is not specifically limited in this example.
For S4, obtaining text region edge data according to the Gaussian map predicted value; and correcting the deflection angle of the text area corresponding to the edge data of the text area according to the predicted value of the minimum circumscribed rectangle deflection angle, and taking the text area which is corrected by the deflection angle and corresponds to the edge data of the text area as a target text area.
In one embodiment, before the step of inputting the target image feature vector into a text region prediction model for prediction, where the text region prediction model is a model trained based on a U-shaped convolutional neural network, the method further includes:
s01: obtaining a plurality of training samples, the training samples comprising: an image sample feature vector, a Gaussian map calibration value and a minimum circumscribed rectangle deflection angle calibration value;
s02: inputting the image sample feature vector of the training sample into the U-shaped convolutional neural network for prediction to obtain a Gaussian map training value and a minimum circumscribed rectangle deflection angle training value corresponding to the training sample;
s03: and training according to the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value, and obtaining the text region prediction model after training.
This embodiment obtains the text region prediction model by training the U-shaped convolutional neural network. The U-shaped convolutional neural network has the advantages of output-layer resolution consistency and a high-precision segmentation effect, so the text region prediction model can accurately segment images from target image feature vectors of slightly complex scenes, which improves the accuracy of the Gaussian map prediction value and the minimum circumscribed rectangle deflection angle prediction value and, in turn, the accuracy of the target text region. Moreover, the U-shaped convolutional neural network has few parameters, which improves the operation efficiency of determining the target text region.
For S01, training samples may be obtained from a database. In each training sample, each image sample feature vector corresponds to one Gaussian map calibration value and to at least one minimum circumscribed rectangle deflection angle calibration value.
The Gaussian map calibration value refers to the calibration value of the Gaussian map of the image corresponding to the image sample feature vector of the training sample.
The minimum circumscribed rectangle deflection angle calibration value refers to the calibration value of the deflection angle of the minimum circumscribed rectangle of the image corresponding to the image sample feature vector of the training sample.
For S02, the image sample feature vectors of all the training samples are input into the U-shaped convolutional neural network in turn for prediction, obtaining a Gaussian map training value and a minimum circumscribed rectangle deflection angle training value corresponding to each training sample; that is, each training sample corresponds to one Gaussian map training value and to at least one minimum circumscribed rectangle deflection angle training value.
For S03, the loss value of the U-shaped convolutional neural network is calculated and its parameters are updated according to the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value, thereby training the U-shaped convolutional neural network; the trained U-shaped convolutional neural network is taken as the text region prediction model.
Referring to fig. 2 and 3, in one embodiment, the step of obtaining a plurality of training samples, where the training samples include an image sample feature vector, a Gaussian map calibration value and a minimum circumscribed rectangle deflection angle calibration value, includes:
s011: acquiring an image sample;
s012: determining the feature vector of the image sample according to the image sample;
s013: obtaining a text region labeling result of the image sample, wherein the text region labeling result comprises: a first long-edge labeling result, a second long-edge labeling result and two short-edge labeling results, the first long-edge labeling result comprising at least one line segment, and the number of line segments of the first long-edge labeling result being the same as that of the second long-edge labeling result;
s014: obtaining a minimum circumscribed rectangle labeling result according to the line segments of the first long edge labeling result and the second long edge labeling result;
s015: obtaining a deflection angle calibration value of the minimum circumscribed rectangle according to the minimum circumscribed rectangle labeling result;
s016: and determining the Gaussian map calibration value according to the line segments of the first long-edge labeling result and the second long-edge labeling result.
This embodiment determines the image sample feature vector, the Gaussian map calibration value and the minimum circumscribed rectangle deflection angle calibration value from the image sample.
For S011, the image sample refers to a digital image with text.
For S012, the image sample feature vector is a feature vector of three channels, which are R, G, B three colors respectively, that is, each color corresponds to a feature vector of one channel.
Extracting feature vectors for the image samples according to colors to obtain feature vectors corresponding to the image samples, wherein each feature vector corresponding to the image sample corresponds to one color, that is, the number of the feature vectors corresponding to the image samples is three. And then splicing the three characteristic vectors corresponding to the image sample in a channel dimension to obtain the characteristic vector of the image sample.
Each vector element of the feature vector corresponding to the image sample represents a pixel in the image sample. For example, the vector elements of the third row and the fifth column of the feature vector of R corresponding to the image sample represent the R color values of the pixels of the third row and the fifth column in the image sample.
For S013, referring to fig. 2: for example, the points of the first long-side labeling result are 1, 2, 3, 4, 5, 6, 7, 8 and 9 in order, and the points of the second long-side labeling result are a, b, c, d, e, f, g, h and i in order; the two short-side labeling results are 1a and 9i. The first long-side labeling result then includes 8 line segments (segments 12, 23, 34, 45, 56, 67, 78 and 89), and the second long-side labeling result includes 8 line segments (segments ab, bc, cd, de, ef, fg, gh and hi). The text region labeling result is the image region enclosed by the first long-side labeling result, the second long-side labeling result and the two short-side labeling results. This example is not a specific limitation.
For S014, a quadrilateral labeling result is obtained according to the line segment corresponding to the first long-side labeling result and the second long-side labeling result, and fig. 3a illustrates an image area corresponding to the quadrilateral labeling result; and performing minimum external rectangle calculation on the quadrilateral labeling result to obtain a minimum external rectangle labeling result.
For example, the first line segment 12 of the first long-side labeling result is opposite to the first line segment ab of the second long-side labeling result, the image area surrounded by 12ba is used as the quadrangle labeling result, the minimum circumscribed rectangle of the image area surrounded by 12ba is obtained, and the minimum circumscribed rectangle is used as the minimum circumscribed rectangle labeling result of the image area surrounded by 12 ba.
Wherein the quadrilateral labeling result is rotated through a range of 90 degrees in increments of about 3 degrees. At each rotation step, the maximum and minimum x and y values of the bounding points of the circumscribed rectangle in the current coordinate system are recorded. At some angle, the area of the circumscribed rectangle reaches its minimum. The parameters of the circumscribed rectangle with minimum area are taken as the length and width in the principal-axis sense, and the coordinate data corresponding to this rectangle are taken as the minimum circumscribed rectangle labeling result.
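The rotation search just described can be sketched as below. The step of about 3 degrees follows the text; the input point set and the return format are assumptions (a library routine such as OpenCV's cv2.minAreaRect computes an equivalent rectangle directly).

```python
import numpy as np

def min_circumscribed_rectangle(points, step_deg=3.0):
    """Rotate the quadrilateral through 90 degrees in ~3 degree steps,
    record the axis-aligned bounding box at each step, and keep the
    rectangle with the smallest area (a sketch of the search above)."""
    best = None
    for deg in np.arange(0.0, 90.0, step_deg):
        t = np.deg2rad(deg)
        rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        p = points @ rot.T  # rotate the points into the trial frame
        (xmin, ymin), (xmax, ymax) = p.min(axis=0), p.max(axis=0)
        area = (xmax - xmin) * (ymax - ymin)
        if best is None or area < best[0]:
            # length and width in the principal-axis sense, plus the angle
            best = (area, xmax - xmin, ymax - ymin, deg)
    return best  # (area, width, height, rotation_angle_deg)

quad = np.array([[0, 0], [10, 2], [11, 6], [1, 4]], dtype=float)  # e.g. 12ba
print(min_circumscribed_rectangle(quad))
```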
For S015, a first center point and a second center point are obtained from two opposite sides of the image region corresponding to the minimum circumscribed rectangle labeling result; the first center point and the second center point are connected to obtain a center-line labeling result; and the included angle between the center-line labeling result and the positive X axis (taking the transverse direction of the image corresponding to the image sample as the X axis) is taken as the minimum circumscribed rectangle deflection angle calibration value. That is, the minimum circumscribed rectangle deflection angle calibration value is an angle value from -90° to 90° (inclusive). For example, if the four vertices of the image region corresponding to the minimum circumscribed rectangle labeling result are 12ba in order, the center point of side 1a is taken as the first center point and the center point of side 2b as the second center point; connecting them gives the center-line labeling result corresponding to 12ba, and the included angle between this center line and the positive X axis is taken as the minimum circumscribed rectangle deflection angle calibration value of 12ba.
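A minimal sketch of the center-line angle computation, assuming the two center points are given as (x, y) coordinates and that the undirected center line is folded into the [-90°, 90°] range stated above:

```python
import math

def deflection_angle_calibration(first_center, second_center):
    """Angle between the center line (first center point to second center
    point) and the positive X axis, wrapped into [-90, 90] degrees."""
    dx = second_center[0] - first_center[0]
    dy = second_center[1] - first_center[1]
    angle = math.degrees(math.atan2(dy, dx))
    # The center line is undirected, so fold the angle into [-90, 90].
    if angle > 90:
        angle -= 180
    elif angle < -90:
        angle += 180
    return angle

print(deflection_angle_calibration((0.0, 0.0), (4.0, 3.0)))  # ~36.87
```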
It is understood that there are other ways to determine the minimum circumscribed rectangle deflection angle calibration value, and this is not a specific limitation by way of example.
For S016, the Gaussian map calibration value is determined through the following steps:
S0161: dividing each line segment of the first long-edge labeling result into n equal parts to obtain a plurality of first division points;
For example, segment 12 of the first long-edge labeling result is divided into n equal parts to obtain several first division points, and the remaining segments of the first long-edge labeling result are divided into n equal parts in turn until all of its segments have been divided.
S0162: dividing each line segment of the second long-edge labeling result into n equal parts to obtain a plurality of second division points, the number of first division points being the same as the number of second division points;
For example, segment ab of the second long-edge labeling result is divided into n equal parts to obtain several second division points, and the remaining segments of the second long-edge labeling result are divided into n equal parts in turn until all of its segments have been divided. Since n is the same in step S0161 and step S0162, the numbers of first and second division points are the same.
S0163: connecting each of the plurality of first division points with the corresponding one of the plurality of second division points to obtain a plurality of division-point connecting lines, where the dotted lines in fig. 3b show the connecting lines;
For example, the first of the first division points is connected with the first of the second division points, and the resulting line is taken as a division-point connecting line.
S0164: performing center point calculation on the plurality of division-point connecting lines to obtain the center point of each connecting line, where fig. 3b shows the center points of the connecting lines;
The center point of each of the division-point connecting lines is calculated, giving a plurality of center points corresponding to the connecting lines.
S0165: and obtaining the Gaussian map calibration value according to the coordinates of the center points of the plurality of division-point connecting lines, where the Gaussian map corresponding to the calibration value is shown in fig. 3d.
A two-dimensional Gaussian kernel function is applied to the coordinates of the center points of the division-point connecting lines to obtain the Gaussian map calibration value.
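Steps S0161 to S0165 can be sketched together as follows, assuming the two long edges are given as ordered point lists (as in fig. 2). The number of divisions n, the kernel width sigma, and the use of a per-pixel maximum when rendering the two-dimensional Gaussian kernels are illustrative assumptions.

```python
import numpy as np

def divide(p, q, n):
    """n-equal division of segment pq, returning n + 1 division points."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return [p + (q - p) * k / n for k in range(n + 1)]

def gaussian_calibration_map(first_edge, second_edge, shape, n=4, sigma=4.0):
    """Pair division points on opposite long edges, take the center point
    of each connecting line, and render a 2D Gaussian kernel around it."""
    h, w = shape
    heatmap = np.zeros((h, w), dtype=np.float32)
    ys, xs = np.mgrid[0:h, 0:w]
    segments = zip(zip(first_edge[:-1], first_edge[1:]),
                   zip(second_edge[:-1], second_edge[1:]))
    for (p1, p2), (q1, q2) in segments:
        for a, b in zip(divide(p1, p2, n), divide(q1, q2, n)):
            cx, cy = (a[0] + b[0]) / 2, (a[1] + b[1]) / 2  # center point
            g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
            heatmap = np.maximum(heatmap, g)  # keep the peak response
    return heatmap
```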
Referring to fig. 4, in one embodiment, the U-shaped convolutional neural network includes: a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a sixth convolutional layer, a first deconvolution layer, a second deconvolution layer, a third deconvolution layer and a fourth deconvolution layer; and,
the step of inputting the image sample feature vector of the training sample into the U-shaped convolutional neural network for prediction to obtain a Gaussian map training value and a minimum circumscribed rectangle deflection angle training value corresponding to the training sample comprises the following steps:
s021: inputting the image sample feature vector into the first convolution layer for convolution to obtain a first feature vector;
s022: inputting the first feature vector into the second convolution layer for convolution to obtain a second feature vector;
s023: inputting the second feature vector into the third convolution layer for convolution to obtain a third feature vector;
s024: inputting the third feature vector into the fourth convolution layer for convolution to obtain a fourth feature vector;
s025: inputting the fourth feature vector into the fifth convolutional layer for convolution to obtain a fifth feature vector;
s026: inputting the fifth feature vector into the sixth convolutional layer for convolution to obtain a sixth feature vector;
s027: inputting the sixth feature vector and the fifth feature vector into the first deconvolution layer for deconvolution to obtain a first fused feature vector;
s028: inputting the fourth feature vector and the first fusion feature vector into the second deconvolution layer for deconvolution to obtain a second fusion feature vector;
s029: inputting the third feature vector and the second fusion feature vector into the third deconvolution layer for deconvolution to obtain a third fusion feature vector;
s0210: and inputting the second feature vector and the third fusion feature vector into the fourth deconvolution layer for deconvolution to obtain the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value.
The embodiment provides a detailed structure of the U-shaped convolutional neural network, and the U-shaped convolutional neural network has the advantages of resolution consistency of an output layer and a segmentation effect with higher precision, and can better fuse high-level semantic features and bottom-level texture features of an image, so that a text region prediction model can accurately segment an image of a target image feature vector of a slightly complex scene, the accuracy of a Gaussian map prediction value and a minimum circumscribed rectangle deflection angle prediction value is improved, and the accuracy of a target text region is improved; and the parameters of the U-shaped convolution neural network are few, so that the operation efficiency of determining the target text area is improved.
The first convolutional layer adopts a kernel configuration of 2 x 64 (64 is the number of convolution channels). The second convolutional layer adopts 4 x 128 (128 channels). The third convolutional layer adopts 8 x 256 (256 channels). The fourth convolutional layer adopts 16 x 512 (512 channels). The fifth convolutional layer adopts 32 x 512 (512 channels). The sixth convolutional layer adopts 32 x 512 (512 channels). The first deconvolution layer adopts 16 x 256 (256 channels). The second deconvolution layer adopts 8 x 128 (128 channels). The third deconvolution layer adopts 4 x 64 (64 channels). The fourth deconvolution layer adopts, in order, one 1 x 1 convolution kernel, one batchnorm layer, one 3 x 3 convolution kernel, one batchnorm layer, one 2 x 2 upsampling, three 3 x 3 convolution kernels and one 1 x 1 convolution kernel.
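A minimal PyTorch sketch of this U-shaped network follows. The channel counts (64 to 512 on the encoder, 256 to 64 on the decoder) and the layer sequence of the fourth deconvolution layer follow the paragraph above; the stride-2 downsampling, the 3 x 3 encoder kernels, the skip concatenation, and a two-channel head (one channel for the Gaussian map, one for the deflection angle) are assumptions.

```python
import torch
import torch.nn as nn

def down(cin, cout, stride=2):
    # Encoder block; stride-2 downsampling is an assumption.
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

def up(cin, cout):
    # Decoder block: fuse the concatenated skip features, then upsample x2.
    return nn.Sequential(nn.Conv2d(cin, cout, 1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
                         nn.ConvTranspose2d(cout, cout, 2, stride=2))

class UTextNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1 = down(3, 64)        # /2,  64 channels
        self.c2 = down(64, 128)      # /4,  128
        self.c3 = down(128, 256)     # /8,  256
        self.c4 = down(256, 512)     # /16, 512
        self.c5 = down(512, 512)     # /32, 512
        self.c6 = down(512, 512, 1)  # /32, 512 (no further downsampling)
        self.d1 = up(512 + 512, 256)  # fuses f6 with f5 -> /16, 256
        self.d2 = up(256 + 512, 128)  # fuses f4         -> /8,  128
        self.d3 = up(128 + 256, 64)   # fuses f3         -> /4,  64
        # Fourth deconvolution layer per the text: 1x1 conv, batchnorm,
        # 3x3 conv, batchnorm, 2x2 upsampling, three 3x3 convs, 1x1 conv.
        self.d4 = nn.Sequential(
            nn.Conv2d(64 + 128, 64, 1), nn.BatchNorm2d(64),
            nn.Conv2d(64, 64, 3, padding=1), nn.BatchNorm2d(64),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 2, 1))  # ch 0: Gaussian map, ch 1: angle (assumed)

    def forward(self, x):
        f1 = self.c1(x); f2 = self.c2(f1); f3 = self.c3(f2)
        f4 = self.c4(f3); f5 = self.c5(f4); f6 = self.c6(f5)
        g1 = self.d1(torch.cat([f6, f5], dim=1))
        g2 = self.d2(torch.cat([f4, g1], dim=1))
        g3 = self.d3(torch.cat([f3, g2], dim=1))
        out = self.d4(torch.cat([f2, g3], dim=1))
        return torch.sigmoid(out[:, :1]), out[:, 1:]  # Gaussian map, angle

net = UTextNet()
gauss, angle = net(torch.randn(1, 3, 256, 256))
print(gauss.shape, angle.shape)  # both (1, 1, 128, 128)
```

Under these assumptions a 256 x 256 input yields 128 x 128 output maps, i.e. the decoder restores the half resolution of the first convolutional layer.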
In an embodiment, the step of training according to the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value, and obtaining the text region prediction model after training, includes:
s031: inputting the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value into a loss function for calculation to obtain a loss value of the U-shaped convolutional neural network, updating the parameters of the U-shaped convolutional neural network according to the loss value, and using the updated U-shaped convolutional neural network for the next calculation of the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value;
s032: and repeating the above steps until the loss value reaches a first convergence condition or the number of iterations reaches a second convergence condition, and determining the U-shaped convolutional neural network whose loss value reaches the first convergence condition or whose number of iterations reaches the second convergence condition as the text region prediction model.
Wherein the loss function is:
$L = L_{reg} + \lambda L_{reg}(\alpha)$

where $\lambda$ is a constant strictly between 0 and 1;
$L_{reg}$ globally constrains the Gaussian map and computes the error between the Gaussian map training value and the Gaussian map calibration value using a root-mean-square error function:

$$L_{reg} = \sqrt{\frac{\sum_{x \in text}\left(Y - f_\theta(X_{text})\right)^2 + \sum_{x \in BG}\left(Y - f_\theta(X_{BG})\right)^2}{text + BG}}$$

where text is the number of positive-sample pixels in the Gaussian map calibration value, BG is the number of negative-sample pixels in the Gaussian map calibration value, $Y$ is a pixel of the Gaussian map calibration value, $f_\theta(X_{text})$ is a positive-sample pixel of the Gaussian map training value, $f_\theta(X_{BG})$ is a negative-sample pixel of the Gaussian map training value, $\theta$ is the parameter of the U-shaped convolutional neural network to be learned, and $X$ is the image sample feature vector;
$L_{reg}(\alpha)$ is used to predict the minimum circumscribed rectangle deflection angle training value and is calculated as:

$L_{reg}(\alpha) = \mathrm{SmoothL1}(\alpha^{*} - \alpha)$

$$\mathrm{SmoothL1}(x) = \begin{cases} 0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & |x| \ge 1 \end{cases}$$

where $\alpha$ is the minimum circumscribed rectangle deflection angle training value and $\alpha^{*}$ is the minimum circumscribed rectangle deflection angle calibration value; that is, $\mathrm{SmoothL1}(x)$ equals $0.5x^{2}$ when $|x| < 1$ and equals $|x| - 0.5$ when $|x| \ge 1$.
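Under the reconstruction above, the full loss can be sketched in PyTorch as follows. The positive-pixel criterion (calibration value greater than zero) and the value lam = 0.5 for the constant in (0, 1) are assumptions.

```python
import torch
import torch.nn.functional as F

def text_region_loss(gauss_pred, gauss_gt, angle_pred, angle_gt, lam=0.5):
    """L = L_reg + lambda * L_reg(alpha), per the formulas above."""
    pos = gauss_gt > 0  # text (positive-sample) pixels; assumed criterion
    neg = ~pos          # BG (negative-sample) pixels
    sq_err = (gauss_gt - gauss_pred) ** 2
    # Root-mean-square error over the positive and negative pixels.
    l_gauss = torch.sqrt((sq_err[pos].sum() + sq_err[neg].sum())
                         / (pos.sum() + neg.sum()).clamp(min=1))
    # Smooth L1 on the deflection angle: 0.5 x^2 if |x| < 1, else |x| - 0.5.
    l_angle = F.smooth_l1_loss(angle_pred, angle_gt)
    return l_gauss + lam * l_angle
```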
This embodiment performs training according to the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value.
For S032, the first convergence condition means that the magnitudes of the loss values of two adjacent calculations satisfy the Lipschitz condition (Lipschitz continuity condition).
The number of iterations reaching the second convergence condition refers to the number of times the U-shaped convolutional neural network has been used to calculate the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value; that is, the iteration count increases by 1 after each calculation.
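A sketch of the resulting training loop, assuming an Adam optimizer and illustrative values for the two convergence conditions (neither the tolerance nor the iteration limit is specified by the text):

```python
import itertools
import torch

def train(net, loader, loss_fn, max_iters=10000, tol=1e-4, lr=1e-3):
    """Compute the loss, update the parameters, and repeat until the loss
    change is small (first condition) or the iteration count is reached
    (second condition); tol, lr and max_iters are assumed values."""
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    prev = None
    for it, (x, gauss_gt, angle_gt) in enumerate(itertools.cycle(loader)):
        gauss, angle = net(x)
        loss = loss_fn(gauss, gauss_gt, angle, angle_gt)
        opt.zero_grad(); loss.backward(); opt.step()
        if prev is not None and abs(prev - loss.item()) < tol:
            break  # first convergence condition: loss change is small
        if it + 1 >= max_iters:
            break  # second convergence condition: iteration count reached
        prev = loss.item()
    return net  # the trained network is the text region prediction model
```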
In an embodiment, the step of performing edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain the target text region includes:
s41: performing edge expansion on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value to obtain text region edge data;
s42: and correcting the deflection angle of the text area corresponding to the text area edge data according to the predicted value of the minimum circumscribed rectangle deflection angle to obtain the target text area.
In the embodiment, the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value are used for expansion correction, so that the method has strong detection capability on the curved text region, and the accuracy of the target text region is improved.
For S41, the text region edge data refers to position data of edge pixels of the text region.
For S42, determining a minimum bounding rectangle expansion result according to the text region edge data; correcting the deflection angle of each minimum circumscribed rectangle expansion result according to the deflection angle predicted value of the minimum circumscribed rectangle corresponding to the minimum circumscribed rectangle expansion result to obtain a minimum circumscribed rectangle correction result; and combining all the minimum circumscribed rectangle correction results to obtain the target text area. That is to say, the number of the minimum circumscribed rectangle expansion results is at least one, each minimum circumscribed rectangle expansion result corresponds to one predicted value of the minimum circumscribed rectangle deflection angle, and each minimum circumscribed rectangle expansion result is corrected by the deflection angle once.
And correcting the deflection angle, namely correcting to ensure that the deflection angle of the minimum circumscribed rectangle correction result is the same as the predicted value of the deflection angle of the minimum circumscribed rectangle corresponding to the minimum circumscribed rectangle expansion result.
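The per-rectangle correction can be sketched as a rotation about the rectangle center. Representing each minimum circumscribed rectangle expansion result as a 4 x 2 array of corner points, and rotating by the difference between the current and predicted deflection angles, are assumptions.

```python
import numpy as np

def correct_deflection_angle(box_points, current_deg, predicted_deg):
    """Rotate a minimum circumscribed rectangle expansion result about its
    center so its deflection angle equals the predicted value (a sketch)."""
    t = np.deg2rad(predicted_deg - current_deg)
    rot = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    center = box_points.mean(axis=0)
    return (box_points - center) @ rot.T + center

box = np.array([[0, 0], [8, 0], [8, 3], [0, 3]], dtype=float)
print(correct_deflection_angle(box, 0.0, 15.0))
```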
In an embodiment, the step of performing edge expansion on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value to obtain the text region edge data includes:
s411: when $I(X_1,Y_1) \le I(X_2,Y_2)$ and $I(X_1,Y_1) > T$, taking the pixel corresponding to $I(X_1,Y_1)$ as an extensible pixel, and merging the extensible pixel into the Gaussian map predicted value;
s412: repeating the above step until no extensible pixel can be found, and taking the position data of the edge pixels of the merged Gaussian map predicted value as the text region edge data;
wherein $I(X_2,Y_2)$ is the average pixel value of the Gaussian map predicted value, $I(X_1,Y_1)$ is the pixel value of any pixel adjacent to the Gaussian map predicted value, and $T$ is a constant.
This embodiment realizes expansion according to the Gaussian map predicted value, thereby improving the accuracy of the target text region.
Where T is the threshold value of the pixel value.
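Putting S411 and S412 together, the edge expansion is a region-growing procedure. The sketch below assumes a boolean seed mask for the predicted Gaussian region, 4-connected adjacency, and T = 0.3 as the constant threshold; the region average is recomputed as the region grows.

```python
import numpy as np
from collections import deque

def expand_edges(gauss_map, seed_mask, T=0.3):
    """A neighbor pixel I(X1, Y1) is extensible when it does not exceed
    the region average I(X2, Y2) and is still above the threshold T."""
    h, w = gauss_map.shape
    region = seed_mask.copy()
    frontier = deque(zip(*np.nonzero(region)))
    while frontier:
        y, x = frontier.popleft()
        mean = gauss_map[region].mean()  # I(X2, Y2): current region average
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                v = gauss_map[ny, nx]    # I(X1, Y1): adjacent pixel value
                if v <= mean and v > T:  # extensible pixel
                    region[ny, nx] = True
                    frontier.append((ny, nx))
    # Region pixels with at least one non-region 4-neighbor are the edge
    # pixels; their positions form the text region edge data.
    edges = [(y, x) for y, x in zip(*np.nonzero(region))
             for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
             if not (0 <= y + dy < h and 0 <= x + dx < w)
             or not region[y + dy, x + dx]]
    return region, sorted(set(edges))
```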
Referring to fig. 5, the present application further proposes a text region determination apparatus, the apparatus including:
an image feature vector obtaining module 100, configured to obtain a target image feature vector;
the prediction module 200 is configured to input the target image feature vector into a text region prediction model for prediction, where the text region prediction model is a model obtained by training a U-shaped convolutional neural network, and to obtain a Gaussian map predicted value and a minimum circumscribed rectangle deflection angle predicted value output by the text region prediction model;
and the correction module 300 is configured to perform edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain the target text region.
In the embodiment, the target image feature vector is input into the text region prediction model for prediction, a Gaussian map prediction value and a minimum circumscribed rectangle deflection angle prediction value output by the text region prediction model are obtained, and a boundary region which is not strictly surrounded can be better processed based on the Gaussian map, so that the method has good generalization capability; the text region prediction model is trained on the basis of the U-shaped convolutional neural network, and the U-shaped convolutional neural network has the advantages of resolution consistency of an output layer and a high-precision segmentation effect, so that the text region prediction model can accurately segment images of target image characteristic vectors of a slightly complex scene, the accuracy of a Gaussian map prediction value and a minimum circumscribed rectangle deflection angle prediction value is improved, and the accuracy of a target text region is improved; because the edge expansion and the deflection angle correction are carried out on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value, the target text area is obtained, and therefore the method has strong detection capability on the curved text area and improves the accuracy of the target text area.
Further, the text region determining apparatus of the present embodiment may implement the text region determining method shown in fig. 1, and for specific operation content, please refer to the description of the embodiment in fig. 1, which is not described herein again.
Referring to fig. 6, a computer device is also provided in an embodiment of the present application; the computer device may be a server, and its internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computation and control capabilities. The memory of the computer device includes a storage medium and an internal memory. The storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the storage medium. The database of the computer device is used for storing the data involved in the text region determination method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a text region determination method, which includes: acquiring a target image feature vector; inputting the target image feature vector into a text region prediction model for prediction, wherein the text region prediction model is a model obtained by training a U-shaped convolutional neural network; obtaining a Gaussian map predicted value and a minimum circumscribed rectangle deflection angle predicted value output by the text region prediction model; and performing edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain a target text region.
In the embodiment, the target image feature vector is input into the text region prediction model for prediction, a Gaussian map prediction value and a minimum circumscribed rectangle deflection angle prediction value output by the text region prediction model are obtained, and a boundary region which is not strictly surrounded can be better processed based on the Gaussian map, so that the method has good generalization capability; the text region prediction model is trained on the basis of the U-shaped convolutional neural network, and the U-shaped convolutional neural network has the advantages of resolution consistency of an output layer and a high-precision segmentation effect, so that the text region prediction model can accurately segment images of target image characteristic vectors of a slightly complex scene, the accuracy of a Gaussian map prediction value and a minimum circumscribed rectangle deflection angle prediction value is improved, and the accuracy of a target text region is improved; because the edge expansion and the deflection angle correction are carried out on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value, the target text area is obtained, and therefore the method has strong detection capability on the curved text area and improves the accuracy of the target text area.
An embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, the computer program, when executed by a processor, implementing a text region determination method comprising the following steps: acquiring a target image feature vector; inputting the target image feature vector into a text region prediction model for prediction, wherein the text region prediction model is a model obtained based on U-shaped convolutional neural network training; obtaining a Gaussian map predicted value and a minimum circumscribed rectangle deflection angle predicted value output by the text region prediction model; and performing edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain a target text region.
The beneficial effects of the text region determination method executed from this storage medium are likewise the same as those described above and are not repeated here.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile or volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium provided herein and used in the examples may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous-link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a(n) ..." does not exclude the presence of other identical elements in the process, apparatus, article, or method that includes the element.
The above description is only a preferred embodiment of the present application and is not intended to limit its scope; all equivalent structural or process transformations made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the scope of patent protection of the present application.

Claims (10)

1. A text region determination method, the method comprising:
acquiring a target image feature vector;
inputting the target image feature vector into a text region prediction model for prediction, wherein the text region prediction model is a model obtained based on U-shaped convolutional neural network training;
obtaining a Gaussian map predicted value and a minimum circumscribed rectangle deflection angle predicted value output by the text region prediction model;
and performing edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain a target text region.
2. The text region determination method according to claim 1, wherein, before the step of inputting the target image feature vector into a text region prediction model for prediction, the text region prediction model being a model obtained based on U-shaped convolutional neural network training, the method further comprises:
obtaining a plurality of training samples, each training sample comprising: an image sample feature vector, a Gaussian map calibration value, and a minimum circumscribed rectangle deflection angle calibration value;
inputting the image sample feature vector of the training sample into the U-shaped convolutional neural network for prediction to obtain a Gaussian map training value and a minimum circumscribed rectangle deflection angle training value corresponding to the training sample;
and training according to the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value, and the minimum circumscribed rectangle deflection angle training value, and obtaining the text region prediction model after training.
3. The text region determination method according to claim 2, wherein the step of obtaining a plurality of training samples, each training sample comprising an image sample feature vector, a Gaussian map calibration value, and a minimum circumscribed rectangle deflection angle calibration value, comprises:
acquiring an image sample;
determining the image sample feature vector according to the image sample;
obtaining a text region labeling result of the image sample, wherein the text region labeling result comprises: a first long edge labeling result, a second long edge labeling result, and two short edge labeling results, the first long edge labeling result comprising at least one line segment, and the number of line segments of the first long edge labeling result being the same as that of the second long edge labeling result;
obtaining a minimum circumscribed rectangle labeling result according to the line segments of the first long edge labeling result and the second long edge labeling result;
obtaining the minimum circumscribed rectangle deflection angle calibration value according to the minimum circumscribed rectangle labeling result;
and determining the Gaussian map calibration value according to the line segments of the first long edge labeling result and the second long edge labeling result.
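A minimal sketch, assuming OpenCV and NumPy, of how the last three steps of claim 3 could be realised when the two long edge labeling results arrive as point sequences of equal length; the Gaussian kernel placement and the sigma choice are assumptions, not the application's prescribed recipe:

```python
import cv2
import numpy as np

def calibration_values(top_pts, bottom_pts, img_h, img_w):
    # top_pts / bottom_pts: matching vertex lists along the first and
    # second long edge labeling results (equal segment counts).
    boundary = np.array(list(top_pts) + list(bottom_pts)[::-1], np.float32)
    # Minimum circumscribed rectangle labeling result and its deflection
    # angle calibration value (OpenCV reports the angle in degrees).
    _center, _size, angle = cv2.minAreaRect(boundary)
    # Gaussian map calibration value: one kernel midway between each pair
    # of opposing long-edge vertices, scaled by the local text height.
    gmap = np.zeros((img_h, img_w), np.float32)
    ys, xs = np.mgrid[0:img_h, 0:img_w]
    for (tx, ty), (bx, by) in zip(top_pts, bottom_pts):
        mx, my = (tx + bx) / 2.0, (ty + by) / 2.0
        sigma = max(np.hypot(tx - bx, ty - by) / 4.0, 1.0)
        g = np.exp(-((xs - mx) ** 2 + (ys - my) ** 2) / (2 * sigma ** 2))
        gmap = np.maximum(gmap, g)
    return gmap, angle
```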
4. The text region determination method according to claim 2, wherein the U-shaped convolutional neural network comprises: a first convolutional layer, a second convolutional layer, a third convolutional layer, a fourth convolutional layer, a fifth convolutional layer, a sixth convolutional layer, a first deconvolution layer, a second deconvolution layer, a third deconvolution layer, and a fourth deconvolution layer; and,
the step of inputting the image sample feature vector of the training sample into the U-shaped convolutional neural network for prediction to obtain the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value corresponding to the training sample comprises:
inputting the image sample feature vector into the first convolution layer for convolution to obtain a first feature vector;
inputting the first feature vector into the second convolution layer for convolution to obtain a second feature vector;
inputting the second feature vector into the third convolution layer for convolution to obtain a third feature vector;
inputting the third feature vector into the fourth convolution layer for convolution to obtain a fourth feature vector;
inputting the fourth feature vector into the fifth convolutional layer for convolution to obtain a fifth feature vector;
inputting the fifth feature vector into the sixth convolutional layer for convolution to obtain a sixth feature vector;
inputting the sixth feature vector and the fifth feature vector into the first deconvolution layer for deconvolution to obtain a first fused feature vector;
inputting the fourth feature vector and the first fusion feature vector into the second deconvolution layer for deconvolution to obtain a second fusion feature vector;
inputting the third feature vector and the second fusion feature vector into the third deconvolution layer for deconvolution to obtain a third fusion feature vector;
and inputting the second feature vector and the third fusion feature vector into the fourth deconvolution layer for deconvolution to obtain the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value.
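A PyTorch sketch of this ten-layer U-shaped topology is given below. The channel widths, strides, activation functions, and the two-channel head (one channel for the Gaussian map, one for the deflection angle) are assumptions, since claim 4 fixes only which feature vectors feed which layers:

```python
import torch
import torch.nn as nn

class TextRegionUNet(nn.Module):
    def __init__(self):
        super().__init__()
        def down(ci, co, stride=2):  # stride-2 conv halves the resolution
            return nn.Sequential(nn.Conv2d(ci, co, 3, stride, 1),
                                 nn.ReLU(inplace=True))
        def up(ci, co):              # stride-2 deconv doubles it back
            return nn.Sequential(nn.ConvTranspose2d(ci, co, 4, 2, 1),
                                 nn.ReLU(inplace=True))
        self.c1 = down(3, 32);    self.c2 = down(32, 64)
        self.c3 = down(64, 128);  self.c4 = down(128, 256)
        self.c5 = down(256, 512)
        self.c6 = down(512, 512, stride=1)   # bottleneck keeps resolution
        self.d1 = up(512 + 512, 256)         # fuses c6 output with c5 output
        self.d2 = up(256 + 256, 128)         # fuses d1 output with c4 output
        self.d3 = up(128 + 128, 64)          # fuses d2 output with c3 output
        self.d4 = nn.ConvTranspose2d(64 + 64, 2, 4, 2, 1)  # 2-channel head

    def forward(self, x):
        x1 = self.c1(x); x2 = self.c2(x1); x3 = self.c3(x2)
        x4 = self.c4(x3); x5 = self.c5(x4); x6 = self.c6(x5)
        f1 = self.d1(torch.cat([x6, x5], dim=1))  # first fused feature vector
        f2 = self.d2(torch.cat([x4, f1], dim=1))  # second fused feature vector
        f3 = self.d3(torch.cat([x3, f2], dim=1))  # third fused feature vector
        out = self.d4(torch.cat([x2, f3], dim=1))
        gaussian_map = torch.sigmoid(out[:, :1])  # Gaussian map training value
        deflection_angle = out[:, 1:]             # deflection angle training value
        return gaussian_map, deflection_angle
```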
5. The text region determination method according to claim 2, wherein the step of training according to the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value, and the minimum circumscribed rectangle deflection angle training value, and obtaining the text region prediction model after training, comprises:
inputting the Gaussian map calibration value, the minimum circumscribed rectangle deflection angle calibration value, the Gaussian map training value, and the minimum circumscribed rectangle deflection angle training value into a loss function for calculation to obtain a loss value of the U-shaped convolutional neural network, updating the parameters of the U-shaped convolutional neural network according to the loss value, and using the updated U-shaped convolutional neural network for the next calculation of the Gaussian map training value and the minimum circumscribed rectangle deflection angle training value;
and repeatedly executing the above steps until the loss value meets a first convergence condition or the number of iterations meets a second convergence condition, and determining the U-shaped convolutional neural network whose loss value meets the first convergence condition or whose number of iterations meets the second convergence condition as the text region prediction model.
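A hedged sketch of this training loop follows. The BCE-plus-L1 loss split and the Adam optimizer are assumptions, as the claim names only "a loss function" and the two convergence conditions:

```python
import torch

def train(model, loader, max_iters=100000, loss_eps=1e-4, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    bce, l1 = torch.nn.BCELoss(), torch.nn.L1Loss()
    for step, (feats, gmap_gt, angle_gt) in enumerate(loader):
        gmap_pred, angle_pred = model(feats)
        # Compare the training values against the calibration values.
        loss = bce(gmap_pred, gmap_gt) + l1(angle_pred, angle_gt)
        opt.zero_grad()
        loss.backward()
        opt.step()  # updated network is used for the next calculation
        # First convergence condition: loss value small enough;
        # second convergence condition: iteration cap reached.
        if loss.item() < loss_eps or step + 1 >= max_iters:
            break
    return model
```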
6. The text region determination method according to claim 1, wherein the step of performing edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain the target text region comprises:
performing edge expansion on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value to obtain text region edge data;
and correcting the deflection angle of the text region corresponding to the text region edge data according to the minimum circumscribed rectangle deflection angle predicted value to obtain the target text region.
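One plausible reading of the correction step, sketched with OpenCV: rotate the image about the centre of the expanded region by the predicted deflection angle, then crop the now-upright region. The crop policy is an assumption:

```python
import cv2
import numpy as np

def correct_deflection(image, region_mask, angle_deg):
    # Centre of the expanded text region.
    ys, xs = np.nonzero(region_mask)
    cx, cy = float(xs.mean()), float(ys.mean())
    # Rotate about the region centre by the predicted deflection angle.
    M = cv2.getRotationMatrix2D((cx, cy), angle_deg, 1.0)
    h, w = image.shape[:2]
    rotated = cv2.warpAffine(image, M, (w, h))
    rotated_mask = cv2.warpAffine(region_mask.astype(np.uint8), M, (w, h))
    # Crop the axis-aligned bounding box of the corrected region.
    ys, xs = np.nonzero(rotated_mask)
    return rotated[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
```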
7. The text region determination method according to claim 6, wherein the step of performing edge expansion on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value to obtain text region edge data comprises:
when I(X1, Y1) ≤ I(X2, Y2) and I(X1, Y1) > T, taking the pixel corresponding to I(X1, Y1) as an extensible pixel, and merging the extensible pixel into the region corresponding to the Gaussian map predicted value;
repeating the above step until no extensible pixel can be found, and taking the position data of the edge pixels of the merged region corresponding to the Gaussian map predicted value as the text region edge data;
wherein I(X2, Y2) is the average pixel value of the region corresponding to the Gaussian map predicted value, I(X1, Y1) is the pixel value of any pixel adjacent to that region, and T is a constant.
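The expansion rule of claim 7 amounts to region growing: a neighbouring pixel joins the region while its value does not exceed the region's mean I(X2, Y2) but still exceeds the threshold T. A minimal NumPy sketch, assuming 4-connectivity (the claim does not fix the neighbourhood):

```python
import numpy as np

def expand_edges(gaussian_map, seed_mask, T=0.2):
    region = seed_mask.copy().astype(bool)
    h, w = gaussian_map.shape
    changed = True
    while changed:                                # repeat until no extensible pixel
        changed = False
        mean_val = gaussian_map[region].mean()    # I(X2, Y2) of the current region
        ys, xs = np.nonzero(region)
        for y, x in zip(ys, xs):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not region[ny, nx]:
                    v = gaussian_map[ny, nx]      # I(X1, Y1) of a neighbour
                    if v <= mean_val and v > T:   # extensible pixel rule
                        region[ny, nx] = True
                        changed = True
    return region  # its boundary pixels give the text region edge data
```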
8. A text region determination apparatus, characterized in that the apparatus comprises:
the image characteristic vector acquisition module is used for acquiring a target image characteristic vector;
the prediction module is used for inputting the target image feature vector into a text region prediction model for prediction, wherein the text region prediction model is a model obtained based on U-shaped convolutional neural network training, and a Gaussian map prediction value and a minimum circumscribed rectangle deflection angle prediction value output by the text region prediction model are obtained;
and the correction module is used for performing edge expansion and deflection angle correction on the Gaussian map corresponding to the Gaussian map predicted value according to the Gaussian map predicted value and the minimum circumscribed rectangle deflection angle predicted value to obtain a target text region.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202011033867.9A 2020-09-27 2020-09-27 Text region determination method, text region determination device, computer equipment and storage medium Pending CN112132142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011033867.9A CN112132142A (en) 2020-09-27 2020-09-27 Text region determination method, text region determination device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011033867.9A CN112132142A (en) 2020-09-27 2020-09-27 Text region determination method, text region determination device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112132142A true CN112132142A (en) 2020-12-25

Family

ID=73839665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011033867.9A Pending CN112132142A (en) 2020-09-27 2020-09-27 Text region determination method, text region determination device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112132142A (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719144A (en) * 2009-11-04 2010-06-02 中国科学院声学研究所 Method for segmenting and indexing scenes by combining captions and video image information
CN106462725A (en) * 2014-01-07 2017-02-22 Arb实验室公司 Systems and methods of monitoring activities at a gaming venue
CN104751187A (en) * 2015-04-14 2015-07-01 山西科达自控股份有限公司 Automatic meter-reading image recognition method
CN105022989A (en) * 2015-06-29 2015-11-04 中国人民解放军国防科学技术大学 Robust extended local binary pattern texture feature extraction method
CN107436906A (en) * 2016-05-27 2017-12-05 高德信息技术有限公司 A kind of information detecting method and device
CN109325401A (en) * 2018-08-03 2019-02-12 成都准星云学科技有限公司 The method and system for being labeled, identifying to title field are positioned based on edge
CN110766008A (en) * 2019-10-29 2020-02-07 北京华宇信息技术有限公司 Text detection method facing any direction and shape
CN111079632A (en) * 2019-12-12 2020-04-28 上海眼控科技股份有限公司 Training method and device of text detection model, computer equipment and storage medium
CN111104944A (en) * 2019-12-26 2020-05-05 中国计量大学 License plate character detection and segmentation method based on R-FCN
CN111259846A (en) * 2020-01-21 2020-06-09 第四范式(北京)技术有限公司 Text positioning method and system and text positioning model training method and system
CN111476226A (en) * 2020-02-29 2020-07-31 新华三大数据技术有限公司 Text positioning method and device and model training method
CN111488873A (en) * 2020-04-03 2020-08-04 中国科学院深圳先进技术研究院 Character-level scene character detection method and device based on weak supervised learning
CN111507333A (en) * 2020-04-21 2020-08-07 腾讯科技(深圳)有限公司 Image correction method and device, electronic equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QITONG WANG ET AL.: "A method for detecting text for arbitrary shapes in natural scenes that improves text spotting", arXiv:1911.07046v3 [cs.CV], 28 May 2020 *
我不是那样的烟火 (Zhihu user): "SA-Text: text detection starting from a single-channel heatmap", Zhihu *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112766073A (en) * 2020-12-31 2021-05-07 贝壳技术有限公司 Table extraction method and device, electronic equipment and readable storage medium
CN115862022A (en) * 2023-01-31 2023-03-28 深圳前海环融联易信息科技服务有限公司 Image correction method and device, equipment, storage medium and product thereof
CN115862022B (en) * 2023-01-31 2023-07-14 深圳前海环融联易信息科技服务有限公司 Image correction method and device, equipment, storage medium and product thereof

Similar Documents

Publication Publication Date Title
CN110298298B (en) Target detection and target detection network training method, device and equipment
JP7033373B2 (en) Target detection method and device, smart operation method, device and storage medium
CN112070111B (en) Multi-target detection method and system adapting to multi-band image
CN111860695A (en) Data fusion and target detection method, device and equipment
CN109712071B (en) Unmanned aerial vehicle image splicing and positioning method based on track constraint
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN108305260B (en) Method, device and equipment for detecting angular points in image
CN111207762B (en) Map generation method and device, computer equipment and storage medium
CN112132142A (en) Text region determination method, text region determination device, computer equipment and storage medium
CN111246098B (en) Robot photographing method and device, computer equipment and storage medium
CN109815831B (en) Vehicle orientation obtaining method and related device
CN112348836A (en) Method and device for automatically extracting building outline
CN113255452A (en) Extraction method and extraction system of target water body
CN115082450A (en) Pavement crack detection method and system based on deep learning network
CN116188931A (en) Processing method and device for detecting point cloud target based on fusion characteristics
CN114519681A (en) Automatic calibration method and device, computer readable storage medium and terminal
CN114092326A (en) Web map tile updating processing method and device
CN117291790A (en) SAR image registration method, SAR image registration device, SAR image registration equipment and SAR image registration medium
CN115731527A (en) Road boundary detection method, apparatus, computer device and storage medium
CN115376018A (en) Building height and floor area calculation method, device, equipment and storage medium
CN114998630A (en) Ground-to-air image registration method from coarse to fine
CN113486728A (en) Method and device for detecting surface three-dimensional change based on feature fusion
CN114004839A (en) Image segmentation method and device of panoramic image, computer equipment and storage medium
CN112967360A (en) Synthetic aperture radar image Voronoi polygon mosaic method considering multiple dimensions
CN114092850A (en) Re-recognition method and device, computer equipment and storage medium

Legal Events

Date Code Title Description

PB01 Publication

SE01 Entry into force of request for substantive examination

TA01 Transfer of patent application right
Effective date of registration: 20220525
Address after: 518000 China Aviation Center 2901, No. 1018, Huafu Road, Huahang community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province
Applicant after: Shenzhen Ping An medical and Health Technology Service Co.,Ltd.
Address before: Room 12G, Block H, 666 Beijing East Road, Huangpu District, Shanghai 200000
Applicant before: PING AN MEDICAL AND HEALTHCARE MANAGEMENT Co.,Ltd.

RJ01 Rejection of invention patent application after publication
Application publication date: 20201225