CN112926565A - Picture text recognition method, system, device and storage medium - Google Patents


Info

Publication number: CN112926565A
Application number: CN202110213721.0A
Authority: CN (China)
Prior art keywords: text, width, picture, long, recognition
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN112926565B
Inventor: 何小臻
Original and current assignee: Ping An Life Insurance Company of China Ltd
Application filed by Ping An Life Insurance Company of China Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a picture text recognition method in which a picture to be recognized is acquired and preprocessed; the picture is detected with a preset text detection model to obtain the coordinates of each text line in the picture; a width value corresponding to each text line is obtained from its coordinates; the text lines are sorted by width value, and the text line with the greatest width is spliced with the text line with the smallest width to form a long text, the operation repeating and the splicing stopping just before the width threshold would be exceeded; each long text is detected, and if its width value does not reach the width threshold, its width is repaired up to the threshold; the operations are repeated until the remaining text lines form long texts reaching the width threshold, yielding a long text set; the long text set is input into a preset text recognition model for recognition, and the returned result is disassembled to obtain the recognition result for the picture. Through this underlying logic of secondary batch processing, the invention effectively reduces the processing time of the background model.

Description

Picture text recognition method, system, device and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, a system, a device, and a storage medium for recognizing a picture text.
Background
In the current field of artificial intelligence, for algorithm deployment and engineering landing, several deployment frameworks are in common use, such as TensorFlow Serving for TensorFlow, Baidu's PaddlePaddle platform, and NVIDIA's TensorRT framework. TensorRT is a high-performance deep learning inference optimizer that provides low-latency, high-throughput deployment inference for deep learning applications. TensorRT can be used to accelerate inference in very large scale data centers, on embedded platforms, or on autonomous driving platforms. It supports nearly all deep learning frameworks, such as TensorFlow, Caffe, MXNet and PyTorch, and combining TensorRT with NVIDIA GPUs enables fast and efficient deployment inference from nearly any framework. Taking NVIDIA's TensorRT framework as an example, the general approach to a text recognition scenario is: detect all text lines in the input picture, then send the text lines to a recognition model one by one for text recognition.
Since the number of text lines in an input picture is not constant, and is large for pictures with dense text in particular, the overall processing is time-consuming even though a framework such as NVIDIA's TensorRT supports batch processing. To reduce the processing time, a new picture text recognition method is therefore proposed.
Disclosure of Invention
Based on the above, the present invention provides a picture text recognition method, system, device and storage medium, so as to speed up the recognition of picture texts.
In order to achieve the above object, the present invention provides a method for identifying a picture text, wherein the method comprises:
acquiring a picture to be identified, and preprocessing the picture;
detecting the preprocessed picture by using a text detection model trained in advance to obtain the coordinates of each text line in the picture;
calculating to obtain a width value corresponding to each text line according to the coordinates of each text line;
sorting the text lines according to the width values, traversing all the width values, splicing the text line with the greatest width and the text line with the smallest width to form a long text, repeating the operation, and stopping splicing when the width value of the long text being spliced would exceed a width threshold;
detecting the long text, and if the width value of the long text does not reach a width threshold value, repairing the width of the long text according to the width threshold value;
splicing and repairing the remaining text lines until all the text lines form long texts reaching a width threshold value so as to form a long text set;
and inputting the long text set into a preset text recognition model for recognition, and disassembling a returned result to obtain a recognition result of the picture to be recognized.
Preferably, the step of preprocessing the picture includes:
scaling the picture, wherein the scaled picture's maximum width is not more than 1600 pixels, its maximum height is not more than 2400 pixels, its minimum width is not less than 600 pixels, and its minimum height is not less than 800 pixels;
and converting the scaled picture into a gray scale image.
Preferably, the step of detecting the preprocessed picture by using the pre-trained text detection model to obtain the coordinates of each text line in the picture includes:
calling a trained text detection model, wherein the text detection model adopts the DBNet algorithm; the preprocessed picture is input into the text detection model, and the text detection model outputs, for each pixel point of the picture, a probability value that the pixel point belongs to text;
carrying out thresholding selection on pixel points in the picture, and dividing the pixel points in the picture into text pixel points and non-text pixel points according to the probability value to obtain a binary image;
calculating a connected domain set of the binary image by using a first image processing algorithm;
and inputting the connected domain set into a second image processing algorithm, and calculating a minimum circumscribed rectangle of each connected domain, wherein four vertexes of the minimum circumscribed rectangle are coordinates of text lines.
Preferably, the step of calculating the width value of each corresponding text line according to the coordinates of each text line includes:
calculating to obtain the original width and the original height of each text line according to the coordinates of the four vertexes of each text line;
scaling each text line to the same preset height, and calculating the scaling ratio corresponding to each text line according to the original height and the preset height of each text line;
the current width value of each text line is obtained through the original width and the scaling ratio of each text line.
Preferably, the step of detecting the long text, and if the width value of the long text does not reach the width threshold, repairing the width of the long text according to the width threshold, includes:
detecting the long text, and judging whether the width value of the long text reaches a width threshold value, wherein the width threshold value is 1600 pixels;
and if not, taking black pixels spanning the three color channels of the picture to be recognized and appending them to the width of the long text according to the width threshold, so that the width of the long text reaches 1600 pixels.
Preferably, the step of inputting the long text set into a preset text recognition model for recognition, and disassembling the returned result to obtain the recognition result of the picture to be recognized includes:
carrying out recognition calculation on a long text set according to the batch processing operation of TensorRT, wherein the text recognition model adopts a CRNN algorithm;
splitting the text strings identified by the long text and corresponding to the corresponding text lines;
and obtaining the recognition result of the picture to be recognized.
Preferably, after the recognition result of the picture to be recognized is obtained, the recognition result is uploaded to a blockchain, so that the blockchain encrypts and stores the recognition result.
In order to achieve the above object, the present invention further provides a system for recognizing a picture text, wherein the system comprises:
the preprocessing module is used for acquiring a picture to be identified and preprocessing the picture;
the detection module is used for detecting the preprocessed pictures by utilizing the pre-trained text detection model to obtain the coordinates of each text line in the pictures;
the width value module is used for calculating the width value corresponding to each text line according to the coordinates of each text line;
the splicing module is used for sequencing the text lines according to the width values, traversing all the width values, splicing the text line with the longest width and the text line with the shortest width to form a long text, repeating the operation, and stopping splicing when the width value of the long text being spliced exceeds a width threshold value;
the repairing module is used for detecting the long text, and repairing the width of the long text according to the width threshold if the width value of the long text does not reach the width threshold;
the long text set module is used for controlling the splicing module and the repairing module to splice and repair the remaining text lines until all the text lines form long texts reaching the width threshold value so as to form a long text set;
and the recognition module is used for inputting the long text set into a preset text recognition model for recognition, and disassembling a returned result to obtain a recognition result of the picture to be recognized.
To achieve the above object, the present invention further provides a computer device, which includes a memory and a processor, wherein the memory stores readable instructions which, when executed by the processor, cause the processor to execute the steps of the picture text recognition method.
In order to achieve the above object, the present invention further provides a computer-readable storage medium storing a program file capable of implementing the method for recognizing a picture text as described above.
The invention provides a method, a system, a device and a storage medium for recognizing picture text, wherein the recognition method acquires a picture to be recognized and preprocesses it; detects the preprocessed picture with a pre-trained text detection model to obtain the coordinates of each text line in the picture; calculates a width value corresponding to each text line from its coordinates; sorts the text lines by width value and, traversing all the width values, splices the text line with the greatest width and the text line with the smallest width into a long text, repeating the operation and stopping splicing when the width value of the long text being spliced would exceed a width threshold; detects the long text and, if its width value does not reach the width threshold, repairs its width according to the width threshold; splices and repairs the remaining text lines until all text lines form long texts reaching the width threshold, so as to form a long text set; and inputs the long text set into a preset text recognition model for recognition, disassembling the returned result to obtain the recognition result of the picture to be recognized. Through this text-combination-based acceleration of batch text recognition, the underlying logic of secondary batch processing effectively reduces the computation and processing time of the background model under the highly concurrent requests of real deployment scenarios.
Drawings
FIG. 1 is a diagram of an environment in which an identification method provided in one embodiment may be implemented;
FIG. 2 is a block diagram showing an internal configuration of a computer device according to an embodiment;
FIG. 3 is a flow diagram of an identification method in one embodiment;
FIG. 4 is a schematic diagram of an identification system in one embodiment;
FIG. 5 is a schematic diagram of a computer apparatus in one embodiment;
FIG. 6 is a block diagram of a computer-readable storage medium in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another.
Fig. 1 is a diagram of an implementation environment of a method for recognizing picture texts according to an embodiment, as shown in fig. 1, in the implementation environment, including a computer device 110 and a display device 120.
The computer device 110 may be, for example, a computer used by a user, with a picture text recognition system installed on it. The user can run the picture text recognition method on the computer device 110 and display the result through the display device 120.
It should be noted that the combination of the computer device 110 and the display device 120 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like.
FIG. 2 is a diagram showing an internal configuration of a computer device according to an embodiment. As shown in fig. 2, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer readable instructions; the database can store control information sequences, and the computer readable instructions, when executed by the processor, can cause the processor to implement a picture text recognition method. The processor of the computer device provides computation and control capability and supports the operation of the whole computer device. The memory of the computer device may store computer readable instructions which, when executed by the processor, cause the processor to perform the picture text recognition method. The network interface of the computer device is used for connecting and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in fig. 2 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or arrange components differently.
As shown in fig. 3, in an embodiment, a method for recognizing a picture text is provided, where the method may be applied to the computer device 110 and the display device 120, and specifically includes the following steps:
and step 31, acquiring a picture to be identified, and preprocessing the picture.
Specifically, the specific step of preprocessing the picture includes:
s311, zooming the picture, wherein the zoomed maximum width of the picture is not more than 1600 pixels, the zoomed maximum height of the picture is not more than 2400 pixels, the zoomed minimum width of the picture is not less than 600 pixels, and the zoomed minimum height of the picture is not less than 800 pixels;
specifically, a color picture is obtained, the picture is zoomed at first, and a prior value is obtained according to the resolution condition of the real picture in the zooming process, namely the maximum width of the picture does not exceed 1600 pixels, and the maximum height of the picture does not exceed 2400 pixels; the minimum width is not less than 600 pixels, the minimum height is not less than 800 pixels, and the scaling of the picture is not fixed but basically fixed in the intervals. More specifically, the picture scaling is performed according to the real resolution of the picture to be recognized, if the picture is too small, the picture is enlarged a little, otherwise, the text detection effect is poor, if the picture is too large, the picture is reduced a little, otherwise, the text detection time is too long. For all the pictures to be identified, the width and the height of the pictures are limited within the interval through scaling, namely, the pictures only need to be within the interval, and further, the picture scaling is carried out according to an interpolation algorithm.
S312, converting the scaled picture into a gray scale image.
Specifically, the picture is converted into a gray scale image after scaling. In typical real-world use, the picture uploaded by a user is generally a color picture, but the subsequent algorithms require gray scale input, so the input picture is converted into a gray scale image.
Further, converting the color picture into a gray scale image means converting the picture's 3 channels (RGB) into 1 channel. There are generally three ways to implement this:
(1) The average method, the simplest one, averages the values of the 3 RGB channels at the same pixel position: I(x, y) = 1/3 × I_R(x, y) + 1/3 × I_G(x, y) + 1/3 × I_B(x, y).
(2) The maximum-minimum averaging method is to average the maximum and minimum brightness values of RGB at the same pixel position.
(3) The weighted average method: I(x, y) = 0.3 × I_R(x, y) + 0.59 × I_G(x, y) + 0.11 × I_B(x, y); this is the most popular approach. The weighting factors 0.3, 0.59 and 0.11 are parameters tuned to the human brightness perception system and are widely used standardized parameters.
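The three conversions can be sketched in a few lines; the function name and the `method` parameter are illustrative, and the weighted branch uses the 0.3/0.59/0.11 factors quoted above.

```python
import numpy as np

def to_gray(rgb: np.ndarray, method: str = "weighted") -> np.ndarray:
    """Collapse an H x W x 3 RGB picture into a single-channel gray scale image."""
    rgb = rgb.astype(np.float64)  # avoid uint8 overflow when summing channels
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    if method == "average":   # (1) plain average of the three channels
        gray = (r + g + b) / 3.0
    elif method == "minmax":  # (2) mean of the brightest and darkest channel values
        gray = (np.maximum(np.maximum(r, g), b) + np.minimum(np.minimum(r, g), b)) / 2.0
    else:                     # (3) perception-weighted average, the most popular
        gray = 0.3 * r + 0.59 * g + 0.11 * b
    return gray.astype(np.uint8)
```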
Step 32: detecting the preprocessed picture with the pre-trained text detection model to obtain the coordinates of each text line in the picture.
The text detection model can detect coordinates of each text line in the picture, wherein the coordinates refer to coordinates of four vertexes of a minimum circumscribed rectangle of the text line, an origin of the coordinates is a vertex at the upper left corner of the picture, coordinate axes take the width of the picture as horizontal coordinates, and the height of the picture as vertical coordinates.
Specifically, the step of detecting the preprocessed picture by using the pre-trained text detection model to obtain the coordinates of each text line in the picture includes:
s321, calling a trained text detection model, wherein the text detection model adopts a dbnet algorithm, a preprocessed picture is input into the text detection model, and the text detection model outputs a probability value of a corresponding text in a pixel point of the picture;
specifically, the text detection model may be constructed by using a text detection algorithm based on pixel segmentation, and an attention mechanism may be added to the text detection model. The text detection algorithm based on segmentation may be any one of the algorithms of SENEt, DBNet, PixelLink, etc. In this embodiment, a trained text detection model needs to be called, where the text detection model adopts a dbnet algorithm (Differentiable Binarization Network), and the dbnet algorithm is a text segmentation model designed based on a picture segmentation method. According to the dbnet algorithm, data marked by a user can be used as a training set, and a desired text detection model is obtained through training. Specifically, the result output by the text detection model is the probability (0-1) of each pixel point in the picture, for example, a picture with 100 pixels x100 pixels, and after calculation by dbnet, the probability values corresponding to the 10000 pixel points and belonging to the text are output, that is, how many of the 10000 pixel points correspond to the text in the picture, and how many correspond to the hollow part or the non-text part in the picture.
S322, carrying out thresholding selection on pixel points in the picture, and dividing the pixel points in the picture into text pixel points and non-text pixel points according to the probability value to obtain a binary image;
specifically, a user can set a threshold value, and the 10000 pixel points are divided into text pixel points and non-text pixel points through the probability value, wherein the non-text pixel points can be pixel points belonging to a blank part in a picture, so that a binary image can be formed.
S323, calculating a connected domain set of the binary image by using a first image processing algorithm;
specifically, the first image processing algorithm may be the computer vision processing kit opencv widely used in the industry.
S324, inputting the connected domain set into a second image processing algorithm, and calculating the minimum circumscribed rectangle of each connected domain, wherein the four vertexes of the minimum circumscribed rectangle are the coordinates of the text line.
Specifically, the second image processing algorithm calls the findContours function of OpenCV.
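Steps S321 to S324 can be illustrated with a small pure-NumPy stand-in. In the patent the connected domains and minimum circumscribed rectangles come from OpenCV; here, to keep the sketch self-contained, 4-connected regions are labelled by hand and axis-aligned boxes are returned, which coincide with the minimum circumscribed rectangle only for unrotated text. All names are illustrative.

```python
import numpy as np

def binarize(prob_map: np.ndarray, thresh: float = 0.3) -> np.ndarray:
    """Threshold the per-pixel text probabilities into text (1) / non-text (0)."""
    return (prob_map > thresh).astype(np.uint8)

def bounding_boxes(binary: np.ndarray) -> list:
    """Label 4-connected text regions and return one axis-aligned box
    (x_min, y_min, x_max, y_max) per region; a stand-in for OpenCV's
    connected-domain and bounding-rectangle routines."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] and not seen[sy, sx]:
                stack, ys, xs = [(sy, sx)], [], []
                seen[sy, sx] = True
                while stack:  # flood fill one connected domain
                    y, x = stack.pop()
                    ys.append(y)
                    xs.append(x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            stack.append((ny, nx))
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```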
It should be noted that the text lines output by the text detection model include at least one line of text, and a text line may be understood as an outline image of a piece of text or of a line/column of text. In a possible embodiment, a text line may also be a sub-image, segmented from the target picture, that contains text.
In some embodiments, a text line is a bounding box in units of a particular content, where the particular content may be a word, a line of words, or a single character, among others. In some embodiments, the text detection model may generate different text lines based on the type of text in the picture to be recognized. For example, when the picture contains English text, the text detection model may frame the English text of the picture line by line in units of words, generating a plurality of text lines. For another example, when the picture to be recognized contains Chinese characters, the text detection model may frame the Chinese text of the picture in units of lines to generate a plurality of text lines; it can be understood that the text in a text line determined in this embodiment is a line of Chinese text. For another example, when the picture to be recognized contains Chinese, the text detection model may frame the Chinese text of the picture in units of single characters to generate a plurality of text lines; it can be understood that the text in a text line determined in this embodiment is one character.
As will be readily understood by those skilled in the art, most of the characters in the pictures are laid out regularly, and the characters are generally arranged in a straight line, such as a horizontal direction, a vertical direction, and an oblique direction. The shape of the text lines obtained in this step is therefore generally quadrangular. Specifically, according to the text typesetting of the picture to be detected, corresponding data can be selected as a training set to perform model training, and if the text of the picture to be detected is the horizontal typesetting, the data of the text detection model training set are all pictures of the horizontal text.
Specifically, after the target picture is input into the text detection model, the text line information output by the model is the information of one or more text lines in the target picture. All text line information output by the text detection model may be characterized as D = {p1, p2, p3, ..., pN}, where i = 1, 2, 3, ..., N and N is the number of text lines detected in the target picture. When each text line's information includes the four vertex coordinates of the text line, they may be characterized as pi = {(x1, y1), (x2, y2), (x3, y3), (x4, y4)}, where (x1, y1) is the text line's upper-left corner coordinate point, (x2, y2) its upper-right corner, (x3, y3) its lower-right corner, and (x4, y4) its lower-left corner.
In the process of training the text detection model, the number of training sample pictures used may be 500, 800, 1000, and so on, and specifically may be determined by a developer according to actual conditions.
Step 33: calculating the width value corresponding to each text line according to the coordinates of each text line.
Specifically, the step of calculating the width value of each corresponding text line according to the coordinates of each text line includes:
s331, calculating to obtain the original width and the original height of each text line according to the coordinates of the four vertexes of each text line;
specifically, since the text line detected by the text detection model is a rectangular text line, there is coordinate information of four vertices of the rectangular text line, and the width and height of the text line can be calculated from the values of the vertices, such as: width ═ (x _ max-x _ min); height ═ (y _ max-y _ min)).
S332, scaling each text line to the same preset height, and calculating the scaling ratio corresponding to each text line according to the original height and the preset height of each text line;
specifically, because the function of TensorRT batch processing requires the resolution of each batch of pictures, namely the width and the height are consistent; the resolution of the text lines detected by the text detection model is various, and then all the detected text lines need to be scaled; therefore, it is necessary to calculate the scaling ratio, which is the scaled height/original height, by knowing the original height and the scaled height of the text line. In one embodiment, the height of the scaling may be uniformly set to 32 pixels.
S333, obtaining the current width value of each text line according to the original width and the scaling ratio of each text line.
Specifically, scaled width = original width × scaling ratio, from which the scaled width value is calculated.
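Steps S331 to S333 reduce to a little arithmetic per text line. A minimal sketch, assuming axis-aligned vertex coordinates and the 32-pixel batch height mentioned above (the function name is illustrative):

```python
def line_width_after_scaling(vertices, target_height: int = 32) -> int:
    """From a text line's four vertex coordinates, compute the width it will
    have once scaled to the shared batch height."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    width = max(xs) - min(xs)           # original width  (x_max - x_min)
    height = max(ys) - min(ys)          # original height (y_max - y_min)
    ratio = target_height / height      # scaling ratio = scaled height / original height
    return int(round(width * ratio))    # scaled width = original width x scaling ratio
```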
Step 34: sorting the text lines according to the width values, traversing all the width values, splicing the text line with the greatest width and the text line with the smallest width to form a long text, repeating the operation, and stopping splicing when the width value of the long text being spliced would exceed the width threshold.
Specifically, the text lines are sorted by width value, from smallest to largest or largest to smallest, to form a text line set. The sorted set is then combined from both ends, i.e., traversed simultaneously from the longest text and the shortest text, and the text lines are spliced together end to end. For example, with the width threshold set to 1600 pixels, two text lines a and b are taken in turn from the text line set and spliced into a new text line c, i.e., a long text c; the new text line c is then spliced with further text lines from the set to obtain a new long text d, and the operation repeats until adding the next text line would exceed the width threshold, at which point one long text is complete. The preset width threshold is a prior value set according to experimental conditions, with 1600 pixels preferred; the prior value is fixed, independent of other factors, and is an experimental value.
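The two-end combination described above can be read as a greedy two-pointer pass over the width-sorted lines; the sketch below returns, for each long text, the member line indices and the accumulated width. It is one interpretation of the traversal, not the patent's exact procedure, and the names are illustrative.

```python
def splice_long_texts(widths, threshold: int = 1600):
    """Greedily pack text-line widths into long texts: start each long text
    from the widest remaining line, then keep appending the narrowest
    remaining lines until the next one would push past the threshold."""
    order = sorted(range(len(widths)), key=lambda i: widths[i])  # ascending width
    lo, hi = 0, len(order) - 1
    groups = []
    while lo <= hi:
        members, total = [order[hi]], widths[order[hi]]  # widest remaining line
        hi -= 1
        while lo <= hi and total + widths[order[lo]] <= threshold:
            members.append(order[lo])                    # narrowest lines that still fit
            total += widths[order[lo]]
            lo += 1
        groups.append((members, total))
    return groups
```

For widths [1500, 900, 100, 80, 600] and the 1600-pixel threshold, this yields two long texts: one combining lines 0 and 3 (1580 pixels, later padded by 20), and one combining lines 1, 2 and 4 (exactly 1600 pixels).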
Step 35: detecting the long text, and if the width value of the long text does not reach the width threshold, repairing the width of the long text according to the width threshold.
Specifically, the step of detecting the long text, and if the width value of the long text does not reach the width threshold, repairing the width of the long text according to the width threshold includes:
s351, detecting the long text, and judging whether the width value of the long text reaches a width threshold value, wherein the width threshold value is 1600 pixels;
In general, few spliced long texts are exactly equal to the width threshold, so the long texts need to be padded up to the set long text width. For example, if the width of a spliced long text is 1580 pixels while the preset long text width is 1600 pixels, and no short text line of 20 pixels remains in the text line set to splice with the 1580-pixel text into a 1600-pixel long text, then 20 pixels are appended at the tail of the spliced long text.
And S352, if not, selecting black edges from the three color channels of the picture to be recognized and adding them to the width of the long text according to the width threshold, so that the width of the long text reaches 1600 pixels.
In particular, the padding is drawn from the black border of the three color channels (RGB) of the picture to be recognized, so as to fill out the 1600-pixel width. More specifically, the width of each long text is limited to a fixed size such as 1600 pixels because the TensorRT framework requires every picture in a batch to have the same resolution when the framework is invoked for batch processing.
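A minimal sketch of the padding in step S352, assuming the long text strip is represented as a list of pixel rows and the padding is simply black (zero-valued) RGB pixels appended on the right; the function name and data representation are illustrative, not the patent's implementation:

```python
def pad_to_width(strip, target=1600, black=(0, 0, 0)):
    """Pad each row of an RGB strip (a list of rows of (r, g, b) pixels)
    with black pixels on the right so its width reaches the 1600-px
    threshold (step S352); every strip in a TensorRT batch must then
    share the same resolution. Strips already at or beyond the target
    width are returned unchanged.
    """
    width = len(strip[0])
    if width >= target:
        return strip
    padding = [black] * (target - width)
    return [row + padding for row in strip]
```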
And step 36, performing splicing and repairing operations on the remaining text lines until all the text lines form long texts reaching the width threshold value so as to form a long text set.
Specifically, for the long text set, after the padding of one long text is completed, the next iteration is performed until all long texts have been traversed. In one embodiment, the serial number of each text line in every long text combination and its corresponding position in the original image are recorded at the same time. Serial numbers such as 0, 1, 2, … are assigned as soon as the text detection model detects the stack of text lines; the position is the position coordinate in the original image, i.e., the coordinates of the four vertices of each text line. The purpose of this step is to perform a batching operation before the TensorRT batch processing itself, so that what is fed to TensorRT has in effect been batched twice, thereby achieving the speed-up effect.
And step 37, inputting the long text set into a preset text recognition model for recognition, and disassembling a returned result to obtain a recognition result of the picture to be recognized.
Specifically, the step of inputting the long text set into a preset text recognition model for recognition, and disassembling the returned result to obtain the recognition result of the picture to be recognized includes:
s371, carrying out recognition calculation of a text recognition model on the long text set according to batch processing operation of TensorRT, wherein the text recognition model adopts a CRNN algorithm;
Specifically, a text recognition model needs to be trained in advance. The text recognition model adopts the CRNN algorithm; the model is then deployed in the TensorRT framework, and text recognition is performed using TensorRT's batch inference capability. The recognition result is the text content of each long text.
S372, splitting the text strings identified by the long text and corresponding to the corresponding text lines;
Specifically, the returned result needs to be disassembled: each character string recognized from a long text with a width of 1600 pixels is split so that it corresponds to the proper text lines, where the corresponding text lines are the initial text lines that were spliced into the long text. For example, suppose a 1600-pixel long text is composed of four short text lines a, b, c, and d in sequence. Since serial numbers 0, 1, 2, … were already assigned when the text detection model detected the stack of text lines, and the position coordinates of the text lines were recorded at the same time, the numbers and position coordinates of the four short text lines a, b, c, and d are known. In the text recognition model there is one character candidate for every four pixels, so a 1600-pixel-wide long text yields a candidate set of 400 character candidates; if a is 400 pixels wide, it corresponds to 100 character candidates, and so on. Because each initial text line has a serial number and position coordinates, the character string recognized from the spliced long text also carries them, and the string recognized by the text recognition model can therefore be split back into each corresponding short text line.
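The proportional split described above might be sketched as follows, assuming the recognizer's per-position outputs are available as one sequence of 400 candidates for a 1600-pixel strip (names hypothetical; in practice CTC decoding complicates the mapping from candidate positions to final characters):

```python
def split_candidates(candidates, line_widths, px_per_candidate=4):
    """Split the per-position outputs of a spliced long text back into the
    short lines it came from: with one character candidate every four
    pixels, a line of width w owns w // 4 consecutive candidate positions
    (step S372).
    """
    pieces, start = [], 0
    for w in line_widths:
        n = w // px_per_candidate
        pieces.append(candidates[start:start + n])
        start += n
    return pieces
```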
And S373, obtaining an identification result of the picture to be identified.
Specifically, when composing a long text, the serial number and coordinate information of each short text line composing it have been recorded. After the long text recognition results are disassembled, it is known what the character string of each short text line is and where that string lies on the original image. Accordingly, it can be determined which short text lines each long text is composed of, so the recognition result is split into the recognition results of the corresponding short text lines, the coordinate values of each short text line are mapped one by one, and finally the OCR recognition result of the picture, including the coordinate position and text content of each text line, is returned.
According to the above steps, and after repeated experiments, the text combination method of the invention achieves a speed-up of about 30% under the TensorRT framework.
OCR (Optical Character Recognition) refers to inputting a manuscript image into a computer through a scanner or other input tool; the computer then preprocesses the image, and finally recognizes each character in the preprocessed image and converts it into the corresponding Chinese character code.
In an alternative embodiment, the result of the picture text recognition method may also be uploaded to a blockchain.
Specifically, the corresponding digest information is obtained from the result of the picture text recognition method; in particular, the digest is obtained by hashing the result, for example with the SHA-256 algorithm. Uploading the digest information to the blockchain ensures its security and its fairness and transparency for the user. The user can download the digest information from the blockchain to verify whether the result of the picture text recognition method has been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated by cryptographic methods, each containing the information of a batch of network transactions, used to verify the validity (anti-counterfeiting) of the information and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
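A minimal sketch of the digest computation using Python's standard hashlib; the function name is illustrative, as the patent only specifies hashing the result, e.g., with SHA-256:

```python
import hashlib

def digest_of_result(recognition_result: str) -> str:
    """Hash the OCR recognition result with SHA-256 to produce the digest
    uploaded to the blockchain; anyone holding the result can recompute
    the digest and compare it to detect tampering."""
    return hashlib.sha256(recognition_result.encode("utf-8")).hexdigest()
```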
The invention provides a picture text recognition method, system, device, and storage medium. The recognition method acquires a picture to be recognized and preprocesses it; detects the preprocessed picture with a pre-trained text detection model to obtain the coordinates of each text line in the picture; calculates the width value of each text line from its coordinates; sorts the text lines by width value, traverses all width values, splices the text line with the longest width and the text line with the shortest width into a long text, and repeats the operation, stopping the splicing when the width value of the long text being spliced would exceed a width threshold; detects the long text and, if its width value does not reach the width threshold, pads its width up to the threshold; splices and pads the remaining text lines until all text lines form long texts reaching the width threshold, thereby forming a long text set; and inputs the long text set into a preset text recognition model for recognition, disassembling the returned result to obtain the recognition result of the picture to be recognized. Through this text-combination-based batch text recognition acceleration method, with its underlying logic of two-level batching, the recognition method can effectively reduce the computation and processing time of the background model under the highly concurrent requests of real deployment scenarios after the project goes live.
As shown in fig. 4, the present invention further provides a system for recognizing a picture text, which can be integrated in the computer device 110, and specifically can include a preprocessing module 20, a detection module 30, a width value module 40, a splicing module 50, a patching module 60, a long text aggregation module 70, and a recognition module 80.
The preprocessing module 20 is configured to acquire a picture to be identified and preprocess the picture; the detection module 30 is configured to detect the preprocessed picture by using a pre-trained text detection model, so as to obtain coordinates of each text line in the picture; the width value module 40 is configured to calculate a width value corresponding to each text line according to the coordinates of each text line; the splicing module 50 is configured to sort the text lines according to the width values, traverse all the width values, splice the text line with the longest width and the text line with the shortest width to form a long text, repeat the operation, and stop splicing when the width value of the long text being spliced is about to exceed the width threshold; the patching module 60 is configured to detect the long text, and patch the width of the long text according to the width threshold if the width value of the long text does not reach the width threshold; the long text set module 70 is configured to control the splicing module 50 and the patching module 60 to splice and patch the remaining text lines until all the text lines form a long text reaching the width threshold, so as to form a long text set; the recognition module 80 is configured to input the long text set into a preset text recognition model for recognition, and disassemble a returned result to obtain a recognition result of the picture to be recognized.
In one embodiment, the preprocessing step of the preprocessing module 20 includes:
zooming the picture, such that the zoomed picture has a maximum width of no more than 1600 pixels, a maximum height of no more than 2400 pixels, a minimum width of no less than 600 pixels, and a minimum height of no less than 800 pixels;
and converting the zoomed picture into a gray scale image.
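One plausible reading of the scaling bounds can be sketched as a uniform scale factor clamped so the result stays within the stated limits; the exact rule is not spelled out in the patent, and conflicting bounds for extreme aspect ratios are ignored here (the function name is hypothetical):

```python
def clamp_scale(width, height,
                min_w=600, max_w=1600, min_h=800, max_h=2400):
    """Pick a uniform scale factor so the resized picture stays within
    the preprocessing bounds: width in [600, 1600] px and height in
    [800, 2400] px. Returns the resized (width, height)."""
    scale = 1.0
    scale = min(scale, max_w / width, max_h / height)   # shrink if too big
    scale = max(scale, min_w / width, min_h / height)   # grow if too small
    return round(width * scale), round(height * scale)
```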
In one embodiment, the processing steps of the detection module 30 include:
calling a trained text detection model, wherein the text detection model adopts the DBNet algorithm; the preprocessed picture is input into the text detection model, and the text detection model outputs, for each pixel point of the picture, a probability value that the pixel belongs to text;
carrying out thresholding selection on pixel points in the picture, and dividing the pixel points in the picture into text pixel points and non-text pixel points according to the probability value to obtain a binary image;
calculating a connected domain set of the binary image by using a first image processing algorithm;
and inputting the connected domain set into a second image processing algorithm, and calculating a minimum circumscribed rectangle of each connected domain, wherein four vertexes of the minimum circumscribed rectangle are coordinates of text lines.
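A simplified, dependency-free sketch of the detection post-processing above: threshold the probability map into a binary image, find connected components, and return each component's bounding box as four vertex coordinates. This is a stand-in for the two image-processing algorithms; production code would likely use OpenCV (e.g., connectedComponents and minAreaRect, which yields a rotated minimum bounding rectangle rather than the axis-aligned box used here):

```python
from collections import deque

def text_line_boxes(prob_map, thresh=0.5):
    """Binarize a per-pixel text probability map, BFS over 4-connected
    components of text pixels, and return each component's bounding box
    as four (x, y) vertices in clockwise order."""
    h, w = len(prob_map), len(prob_map[0])
    binary = [[p >= thresh for p in row] for row in prob_map]
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                # BFS over one connected domain of text pixels
                q = deque([(y, x)])
                seen[y][x] = True
                y0 = y1 = y
                x0 = x1 = x
                while q:
                    cy, cx = q.popleft()
                    y0, y1 = min(y0, cy), max(y1, cy)
                    x0, x1 = min(x0, cx), max(x1, cx)
                    for ny, nx in ((cy-1, cx), (cy+1, cx),
                                   (cy, cx-1), (cy, cx+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append([(x0, y0), (x1, y0), (x1, y1), (x0, y1)])
    return boxes
```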
In one embodiment, the processing steps of the width value module 40 include:
calculating to obtain the original width and the original height of each text line according to the coordinates of the four vertexes of each text line;
zooming each text line to the same preset height, and calculating the zooming ratio corresponding to each text line according to the original height and the preset height of each text line;
the current width value of each text line is obtained through the original width and the scaling ratio of each text line.
In one embodiment, the processing steps of the patching module 60 include:
detecting the long text, and judging whether the width value of the long text reaches a width threshold value, wherein the width threshold value is 1600 pixels;
and if not, selecting black edges from the three color channels of the picture to be recognized and adding them to the width of the long text according to the width threshold, so that the width of the long text reaches 1600 pixels.
In one embodiment, the processing steps of the identification module 80 include:
carrying out recognition calculation on a long text set according to the batch processing operation of TensorRT, wherein the text recognition model adopts a CRNN algorithm;
splitting the text strings identified by the long text and corresponding to the corresponding text lines;
and obtaining the recognition result of the picture to be recognized.
In one embodiment, the recognition system further includes a blockchain module (not shown) configured to upload the recognition result of the picture to be recognized to a blockchain after the recognition result is obtained, so that the blockchain encrypts and stores the recognition result.
The processing steps of the above modules are described in detail in embodiments of the method and will not be described again.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an apparatus according to an embodiment of the present invention. As shown in fig. 5, the apparatus 200 includes a processor 201 and a storage 202 coupled to the processor 201.
The storage 202 stores program instructions for implementing the method for identifying picture texts according to any of the above embodiments.
The processor 201 is used to execute the program instructions stored by the storage 202.
The processor 201 may also be referred to as a Central Processing Unit (CPU). The processor 201 may be an integrated circuit chip having signal processing capabilities. The processor 201 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The storage medium of the embodiment of the present invention stores a program file 301 capable of implementing the above method for recognizing picture text. The program file 301 may be stored in the storage medium in the form of a software product and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as computers, servers, mobile phones, and tablets.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other like elements in the process, apparatus, article, or method that includes the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Claims (10)

1. A picture text recognition method is characterized by comprising the following steps:
acquiring a picture to be identified, and preprocessing the picture;
detecting the preprocessed picture by using a text detection model trained in advance to obtain the coordinates of each text line in the picture;
calculating to obtain a width value corresponding to each text line according to the coordinates of each text line;
sorting the text lines according to the width values, traversing all the width values, splicing the text line with the longest width and the text line with the shortest width to form a long text, repeating the operation, and stopping splicing when the width value of the long text being spliced exceeds a width threshold value;
detecting the long text, and if the width value of the long text does not reach a width threshold value, repairing the width of the long text according to the width threshold value;
splicing and repairing the remaining text lines until all the text lines form long texts reaching a width threshold value so as to form a long text set;
and inputting the long text set into a preset text recognition model for recognition, and disassembling a returned result to obtain a recognition result of the picture to be recognized.
2. The identification method of claim 1, wherein the step of preprocessing the picture comprises:
zooming the picture, such that the zoomed picture has a maximum width of no more than 1600 pixels, a maximum height of no more than 2400 pixels, a minimum width of no less than 600 pixels, and a minimum height of no less than 800 pixels;
and converting the zoomed picture into a gray scale image.
3. The recognition method of claim 1, wherein the step of detecting the preprocessed image by using the pre-trained text detection model to obtain the coordinates of each text line in the image comprises:
calling a trained text detection model, wherein the text detection model adopts the DBNet algorithm; the preprocessed picture is input into the text detection model, and the text detection model outputs, for each pixel point of the picture, a probability value that the pixel belongs to text;
carrying out thresholding selection on pixel points in the picture, and dividing the pixel points in the picture into text pixel points and non-text pixel points according to the probability value to obtain a binary image;
calculating a connected domain set of the binary image by using a first image processing algorithm;
and inputting the connected domain set into a second image processing algorithm, and calculating a minimum circumscribed rectangle of each connected domain, wherein four vertexes of the minimum circumscribed rectangle are coordinates of text lines.
4. The recognition method of claim 1, wherein the step of calculating the width value corresponding to each text line according to the coordinates of each text line comprises:
calculating to obtain the original width and the original height of each text line according to the coordinates of the four vertexes of each text line;
zooming each text line to the same preset height, and calculating the zooming ratio corresponding to each text line according to the original height and the preset height of each text line;
the current width value of each text line is obtained through the original width and the scaling ratio of each text line.
5. The method of claim 1, wherein the step of detecting the long text and repairing the width of the long text according to the width threshold if the width of the long text does not reach the width threshold comprises:
detecting the long text, and judging whether the width value of the long text reaches a width threshold value, wherein the width threshold value is 1600 pixels;
and if not, selecting black edges from the three color channels of the picture to be recognized and adding them to the width of the long text according to the width threshold, so that the width of the long text reaches 1600 pixels.
6. The recognition method of claim 1, wherein the step of inputting the long text set into a preset text recognition model for recognition and disassembling the returned result to obtain the recognition result of the picture to be recognized comprises:
carrying out recognition calculation on a long text set according to the batch processing operation of TensorRT, wherein the text recognition model adopts a CRNN algorithm;
splitting the text strings identified by the long text and corresponding to the corresponding text lines;
and obtaining the recognition result of the picture to be recognized.
7. The identification method according to claim 1, wherein after the recognition result of the picture to be recognized is obtained, the recognition result is uploaded to a blockchain, so that the blockchain encrypts and stores the recognition result.
8. A recognition system for picture text, the recognition system comprising:
the preprocessing module is used for acquiring a picture to be identified and preprocessing the picture;
the detection module is used for detecting the preprocessed pictures by utilizing the pre-trained text detection model to obtain the coordinates of each text line in the pictures;
the width value module is used for calculating the width value corresponding to each text line according to the coordinates of each text line;
the splicing module is used for sequencing the text lines according to the width values, traversing all the width values, splicing the text line with the longest width and the text line with the shortest width to form a long text, repeating the operation, and stopping splicing when the width value of the long text being spliced exceeds a width threshold value;
the repairing module is used for detecting the long text, and repairing the width of the long text according to the width threshold if the width value of the long text does not reach the width threshold;
the long text set module is used for controlling the splicing module and the repairing module to splice and repair the remaining text lines until all the text lines form long texts reaching the width threshold value so as to form a long text set;
and the recognition module is used for inputting the long text set into a preset text recognition model for recognition, and disassembling a returned result to obtain a recognition result of the picture to be recognized.
9. A computer device comprising a storage and a processor, the storage having stored therein readable instructions which, when executed by the processor, cause the processor to carry out the steps of the method of recognition of a picture text as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a program file capable of implementing the recognition method of the picture text according to any one of claims 1 to 7 is stored.
CN202110213721.0A 2021-02-25 2021-02-25 Picture text recognition method, system, equipment and storage medium Active CN112926565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110213721.0A CN112926565B (en) 2021-02-25 2021-02-25 Picture text recognition method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110213721.0A CN112926565B (en) 2021-02-25 2021-02-25 Picture text recognition method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112926565A true CN112926565A (en) 2021-06-08
CN112926565B CN112926565B (en) 2024-02-06

Family

ID=76172020

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110213721.0A Active CN112926565B (en) 2021-02-25 2021-02-25 Picture text recognition method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112926565B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435437A (en) * 2021-06-24 2021-09-24 随锐科技集团股份有限公司 Method and device for identifying state of switch on/off indicator and storage medium
CN113920527A (en) * 2021-10-13 2022-01-11 中国平安人寿保险股份有限公司 Text recognition method and device, computer equipment and storage medium
CN114399710A (en) * 2022-01-06 2022-04-26 昇辉控股有限公司 Identification detection method and system based on image segmentation and readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764226A (en) * 2018-04-13 2018-11-06 顺丰科技有限公司 Image text recognition methods, device, equipment and its storage medium
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium
CN110598566A (en) * 2019-08-16 2019-12-20 深圳中兴网信科技有限公司 Image processing method, device, terminal and computer readable storage medium
CN110969154A (en) * 2019-11-29 2020-04-07 上海眼控科技股份有限公司 Text recognition method and device, computer equipment and storage medium
CN111738233A (en) * 2020-08-07 2020-10-02 北京易真学思教育科技有限公司 Text detection method, electronic device and computer readable medium
CN112041851A (en) * 2018-12-29 2020-12-04 华为技术有限公司 Text recognition method and terminal equipment
CN112183372A (en) * 2020-09-29 2021-01-05 深圳数联天下智能科技有限公司 Text recognition method, device and equipment and readable storage medium
CN112329777A (en) * 2021-01-06 2021-02-05 平安科技(深圳)有限公司 Character recognition method, device, equipment and medium based on direction detection

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium
CN108764226A (en) * 2018-04-13 2018-11-06 顺丰科技有限公司 Image text recognition methods, device, equipment and its storage medium
CN112041851A (en) * 2018-12-29 2020-12-04 华为技术有限公司 Text recognition method and terminal equipment
US20210326655A1 (en) * 2018-12-29 2021-10-21 Huawei Technologies Co., Ltd. Text Recognition Method and Terminal Device
CN110598566A (en) * 2019-08-16 2019-12-20 深圳中兴网信科技有限公司 Image processing method, device, terminal and computer readable storage medium
CN110969154A (en) * 2019-11-29 2020-04-07 上海眼控科技股份有限公司 Text recognition method and device, computer equipment and storage medium
CN111738233A (en) * 2020-08-07 2020-10-02 北京易真学思教育科技有限公司 Text detection method, electronic device and computer readable medium
CN112183372A (en) * 2020-09-29 2021-01-05 深圳数联天下智能科技有限公司 Text recognition method, device and equipment and readable storage medium
CN112329777A (en) * 2021-01-06 2021-02-05 平安科技(深圳)有限公司 Character recognition method, device, equipment and medium based on direction detection


Also Published As

Publication number Publication date
CN112926565B (en) 2024-02-06

Similar Documents

Publication Publication Date Title
CN112926565B (en) Picture text recognition method, system, equipment and storage medium
CN110378297B (en) Remote sensing image target detection method and device based on deep learning and storage medium
CN109343920B (en) Image processing method and device, equipment and storage medium thereof
CN111640130A (en) Table reduction method and device
CN112926564A (en) Picture analysis method, system, computer device and computer-readable storage medium
CN110866529A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN112990205B (en) Method and device for generating handwritten character sample, electronic equipment and storage medium
CN110942004A (en) Handwriting recognition method and device based on neural network model and electronic equipment
CN111368638A (en) Spreadsheet creation method and device, computer equipment and storage medium
CN112580643A (en) License plate recognition method and device based on deep learning and storage medium
CN111325104A (en) Text recognition method, device and storage medium
CN111461070B (en) Text recognition method, device, electronic equipment and storage medium
CN113111880A (en) Certificate image correction method and device, electronic equipment and storage medium
CN112232336A (en) Certificate identification method, device, equipment and storage medium
CN110598703A (en) OCR (optical character recognition) method and device based on deep neural network
CN115345895B (en) Image segmentation method and device for visual detection, computer equipment and medium
JP2018137667A (en) Camera calibration method, program and device
CN111027545A (en) Card picture mark detection method and device, computer equipment and storage medium
CN113077469B (en) Sketch image semantic segmentation method and device, terminal device and storage medium
CN111783780B (en) Image processing method, device and computer readable storage medium
CN112749702B (en) Image recognition method, device, terminal and storage medium
CN114187445A (en) Method and device for recognizing text in image, electronic equipment and storage medium
CN106934814B (en) Background information identification method and device based on image
CN111382741A (en) Method, system and equipment for detecting text in natural scene picture
CN111508045A (en) Picture synthesis method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant