CN112926564A - Picture analysis method, system, computer device and computer-readable storage medium


Info

Publication number
CN112926564A
CN112926564A (application number CN202110213670.1A; granted as CN112926564B)
Authority
CN
China
Prior art keywords
text, picture, result, analysis method, small
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110213670.1A
Other languages
Chinese (zh)
Other versions
CN112926564B (en)
Inventor
何小臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202110213670.1A
Priority claimed from CN202110213670.1A
Publication of CN112926564A
Application granted
Publication of CN112926564B
Active legal status
Anticipated expiration legal-status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06V 10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G06V 10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a picture analysis method, system, computer device and computer-readable storage medium. The picture analysis method preprocesses an acquired picture; detects the preprocessed picture with a text detection model to obtain the coordinates of each text line in the picture; randomly selects a plurality of text lines and, from the center point of each selected line, crops outwards a small rectangular picture block of a preset size, forming a set of small picture blocks that all contain text information; inputs the small picture block set into a text quality model and/or a text direction model to output a text quality judgment and/or a text direction classification for each block; and constructs a voting mechanism, taking the majority text quality judgment and/or text direction classification as the result for the whole picture. The picture analysis method thereby effectively improves the accuracy and robustness of text quality judgment and text direction classification.

Description

Picture analysis method, system, computer device and computer-readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and a system for picture analysis, a computer device, and a computer-readable storage medium.
Background
Current OCR technology can be roughly divided into two categories according to the input source (picture): character recognition in natural scenes, and recognition of photographed or scanned images of paper documents. Text recognition is usually preceded by preprocessing operations, such as judging the quality of the current picture (e.g., its sharpness, completeness, and tilt angle). For photographed or scanned paper documents, most existing solutions for quality judgment and character direction detection/correction process the whole picture without attending to the characters it contains, and therefore suffer from poor accuracy or poor algorithm robustness.
Disclosure of Invention
Based on the above, the invention provides a picture analysis method, a picture analysis system, a computer device and a computer readable storage medium, so as to improve the accuracy and robustness of text quality judgment and text direction classification.
In order to achieve the above object, the present invention provides a picture analysis method, including:
acquiring a picture to be identified, and preprocessing the picture;
detecting the preprocessed picture by using a text detection model trained in advance to obtain the coordinates of each text line in the picture;
randomly selecting a plurality of text lines, determining the center point of each selected text line from its coordinates, and cropping small rectangular picture blocks of a preset size outwards from each center point to form a plurality of small picture block sets with text information;
inputting the small picture block set into a pre-trained text quality model and/or a text direction model, and outputting a text quality judgment result and/or a text direction classification result;
and constructing a voting mechanism, and taking the majority result of the text quality judgment and/or the text direction classification as the text quality judgment result and/or the text direction classification result of the current picture.
Preferably, the step of preprocessing the picture includes:
scaling the picture, wherein the scaled picture has a maximum width of no more than 1600 pixels, a maximum height of no more than 2400 pixels, a minimum width of no less than 600 pixels, and a minimum height of no less than 800 pixels;
and converting the scaled picture into a grayscale image.
Preferably, the step of detecting the preprocessed picture by using the pre-trained text detection model to obtain the coordinates of each text line in the picture includes:
calling a trained text detection model, wherein the text detection model adopts the DBNet algorithm; the preprocessed picture is input into the text detection model, which outputs, for each pixel of the picture, the probability that the pixel belongs to text;
thresholding the pixels of the picture, dividing them into text pixels and non-text pixels according to the probability values to obtain a binary image;
calculating the set of connected domains of the binary image by using a first image processing algorithm;
and inputting the connected domain set into a second image processing algorithm to calculate the minimum circumscribed rectangle of each connected domain, the four vertices of the minimum circumscribed rectangle being the coordinates of the text line.
Preferably, the step of randomly selecting a plurality of text lines, determining a center point of each text line according to coordinates of the selected text line, and intercepting a preset size of a small rectangular picture block from the center point to the outside to form a plurality of small picture block sets with text information includes:
randomly selecting a plurality of text lines;
calculating to obtain the center point and the height of the corresponding text line according to the coordinates of each vertex of the selected text line;
and according to the center point of each text line, cropping outwards small rectangular picture blocks extending no more than one half of the text line height, to form a plurality of small picture block sets with text information.
Preferably, the small rectangular picture blocks are square.
Preferably, the text quality model and the text direction model both use a resnet18 residual network; the text quality judgment result includes clear and unclear, and the text direction classification result includes 0 degrees, 90 degrees, 180 degrees and 270 degrees.
Preferably, after the text quality judgment result and/or the text direction classification result is obtained, it is uploaded to a blockchain, so that the blockchain stores the text quality judgment result and/or the text direction classification result in encrypted form.
In order to achieve the above object, the present invention further provides a picture analysis system, including:
the preprocessing module is used for acquiring a picture to be identified and preprocessing the picture;
the detection module is used for detecting the preprocessed pictures by utilizing the pre-trained text detection model to obtain the coordinates of each text line in the pictures;
the picture block module is used for randomly selecting a plurality of text lines, determining the central point of each text line according to the coordinates of the selected text line, and intercepting a small rectangular picture block with a preset size from the central point outwards to form a plurality of small picture block sets with text information;
the result module is used for inputting the small picture block set into a pre-trained text quality model and/or a text direction model and outputting a text quality judgment result and/or a text direction classification result;
and the voting module is used for constructing a voting mechanism and taking the majority result of the text quality judgment and/or the text direction classification as the text quality judgment result and/or the text direction classification result of the current picture.
To achieve the above object, the present invention further provides a computer device, which includes a memory and a processor, wherein the memory stores readable instructions which, when executed by the processor, cause the processor to execute the steps of the picture analysis method described above.
To achieve the above object, the present invention also provides a computer-readable storage medium storing a program file capable of implementing the picture analysis method as described above.
The invention provides a picture analysis method, system, computer device and computer-readable storage medium. The picture analysis method acquires a picture to be recognized and preprocesses it; detects the preprocessed picture with a pre-trained text detection model to obtain the coordinates of each text line in the picture; randomly selects a plurality of text lines and, from the center point of each selected line, crops outwards a small rectangular picture block of a preset size, forming a set of small picture blocks with text information; inputs the small picture block set into a pre-trained text quality model and/or text direction model to output a text quality judgment and/or text direction classification; and constructs a voting mechanism, taking the majority result as the text quality judgment and/or text direction classification of the current picture. In this way, the picture analysis method uses text line information to assist text quality judgment and text direction classification, and after multiple rounds of algorithm self-testing and comparison experiments, its accuracy and robustness are significantly improved.
Drawings
FIG. 1 is a diagram of an implementation environment of a picture analysis method provided in one embodiment;
FIG. 2 is a block diagram that illustrates the internal architecture of the computing device, in one embodiment;
FIG. 3 is a flow diagram of a method for picture analysis in one embodiment;
FIG. 4 is a schematic diagram of a picture analysis system in one embodiment;
FIG. 5 is a schematic diagram of a computer apparatus in one embodiment;
FIG. 6 is a block diagram of a computer-readable storage medium in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another.
Fig. 1 is a diagram of an implementation environment of a picture analysis method provided in an embodiment, as shown in fig. 1, in the implementation environment, including a computing device 110 and a display device 120.
The computing device 110 may be a computer used by a user, with a picture analysis system installed on it. The user may run the picture analysis method on the computing device 110 and display the result through the display device 120.
It should be noted that the computing device 110 and the display device 120 may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like.
FIG. 2 is a diagram showing an internal configuration of a computer device according to an embodiment. As shown in fig. 2, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium stores an operating system, a database and computer-readable instructions; the database may store control information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement a picture analysis method. The processor provides the computation and control capability supporting the operation of the whole device. The memory may also store computer-readable instructions that, when executed by the processor, cause the processor to perform the picture analysis method. The network interface is used for connecting and communicating with terminals. Those skilled in the art will appreciate that the architecture shown in fig. 2 is merely a block diagram of structures relevant to the disclosed aspects and does not limit the computing devices to which they apply; a particular computing device may include more or fewer components than shown, combine certain components, or arrange components differently.
As shown in fig. 3, in an embodiment, a picture analysis method is provided, which may be applied to the computing device 110 and the display device 120, and specifically includes the following steps:
Step 31: acquiring a picture to be identified, and preprocessing the picture.
Specifically, the specific step of preprocessing the picture includes:
S311, scaling the picture, wherein the scaled picture has a maximum width of no more than 1600 pixels, a maximum height of no more than 2400 pixels, a minimum width of no less than 600 pixels, and a minimum height of no less than 800 pixels;
specifically, a color picture is obtained, the picture is zoomed at first, and a prior value is obtained according to the resolution condition of the real picture in the zooming process, namely the maximum width of the picture does not exceed 1600 pixels, and the maximum height of the picture does not exceed 2400 pixels; the minimum width is not less than 600 pixels, the minimum height is not less than 800 pixels, and the scaling of the picture is not fixed but basically fixed in the intervals. More specifically, the picture scaling is performed according to the real resolution of the picture to be recognized, if the picture is too small, the picture is enlarged a little, otherwise, the text detection effect is poor, if the picture is too large, the picture is reduced a little, otherwise, the text detection time is too long. For all the pictures to be identified, the width and height of the picture are limited within the interval by scaling, that is, the picture is within the interval, further, the picture scaling is performed according to an interpolation algorithm, and more specifically, the picture scaling is implemented by using an opencv tool, which is a general algorithm used in image processing and computer vision, and the implementation includes: reading pictures, and converting the pictures into arrays; converting array data; displaying the array data in a window; image saving; intercepting an image; data slices for BGR; calculating array pixel values with the same size; fusing pictures; scaling of pictures, etc.
And S312, converting the zoomed picture into a gray scale image.
Specifically, the scaled picture is converted into a grayscale image. In a typical real-world scenario, the picture uploaded by a user is a color picture, but the subsequent algorithms require grayscale input, so the input picture is converted into a grayscale image.
Further, converting the color picture into a grayscale image means converting the 3 channels (RGB) of the picture into 1 channel. There are generally three ways to do this:
(1) The average method, the simplest, averages the values of the 3 RGB channels at each pixel position: I(x, y) = 1/3 · I_R(x, y) + 1/3 · I_G(x, y) + 1/3 · I_B(x, y).
(2) The max-min average method averages the maximum and minimum of the RGB brightness values at each pixel position.
(3) The weighted average method: I(x, y) = 0.3 · I_R(x, y) + 0.59 · I_G(x, y) + 0.11 · I_B(x, y). This is the most popular approach; the weights 0.3, 0.59 and 0.11 are standardized parameters tuned to the human brightness perception system.
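The weighted average method above can be written in a few lines of NumPy. The function name and the assumption that channels arrive in R, G, B order are illustrative; OpenCV loads images as BGR, so in practice one would reorder the channels or call `cv2.cvtColor`.

```python
import numpy as np

# Luma weights from the description, tuned to human brightness perception.
LUMA = np.array([0.3, 0.59, 0.11])

def to_gray(rgb):
    """Weighted-average grayscale: I(x,y) = 0.3*R + 0.59*G + 0.11*B.
    `rgb` is an HxWx3 float array in R, G, B channel order; the result
    is an HxW array."""
    return rgb @ LUMA  # dot product over the channel axis
```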
Step 32: detecting the preprocessed picture by using the pre-trained text detection model to obtain the coordinates of each text line in the picture.
The text detection model can detect the coordinates of each text line in the picture, where the coordinates are those of the four vertices of the text line's minimum circumscribed rectangle; the coordinate origin is the top-left vertex of the picture, with the picture width as the horizontal axis and the picture height as the vertical axis.
Specifically, the step of detecting the preprocessed picture by using the pre-trained text detection model to obtain the coordinates of each text line in the picture includes:
s321, calling a trained text detection model, wherein the text detection model adopts a dbnet algorithm, a preprocessed picture is input into the text detection model, and the text detection model outputs a probability value of a corresponding text in a pixel point of the picture;
specifically, the text detection model may be constructed by using a text detection algorithm based on pixel segmentation, and an attention mechanism may be added to the text detection model. The text detection algorithm based on segmentation may be any one of the algorithms of SENEt, DBNet, PixelLink, etc. In this embodiment, a trained text detection model needs to be called, where the text detection model adopts a dbnet algorithm (Differentiable Binarization Network), and the dbnet algorithm is a text segmentation model designed based on a picture segmentation method. According to the dbnet algorithm, data marked by a user can be used as a training set, and a desired text detection model is obtained through training. Specifically, the result output by the text detection model is the probability (0-1) of each pixel point in the picture, for example, a picture with 100 pixels x100 pixels, and after calculation by dbnet, the probability values corresponding to the 10000 pixel points and belonging to the text are output, that is, how many of the 10000 pixel points correspond to the text in the picture, and how many correspond to the hollow part or the non-text part in the picture.
S322, carrying out thresholding selection on pixel points in the picture, and dividing the pixel points in the picture into text pixel points and non-text pixel points according to the probability value to obtain a binary image;
specifically, a user can set a threshold value, and the 10000 pixel points are divided into text pixel points and non-text pixel points through the probability value, wherein the non-text pixel points can be pixel points belonging to a blank part in a picture, so that a binary image can be formed.
S323, calculating a connected domain set of the binary image by using a first image processing algorithm;
specifically, the first image processing algorithm may be the computer vision processing kit opencv widely used in the industry.
And S324, inputting the connected domain set into a second image processing algorithm, and calculating a minimum circumscribed rectangle of each connected domain, wherein four vertexes of the minimum circumscribed rectangle are coordinates of text lines.
Specifically, the second image processing algorithm calls the findContours function of OpenCV.
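For illustration, here is a dependency-free stand-in for the two OpenCV calls: a 4-connected component search over the binary image that reduces each connected domain to an axis-aligned bounding box. This simplifies `cv2.minAreaRect`, which returns a possibly rotated rectangle; for horizontally typeset text (as in this embodiment) the two coincide.

```python
from collections import deque

def connected_boxes(binary):
    """4-connected components of a binary grid (list of lists of 0/1),
    each reduced to its bounding box (x_min, y_min, x_max, y_max)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                # Breadth-first search over one connected domain.
                q = deque([(x, y)])
                seen[y][x] = True
                x0 = x1 = x
                y0 = y1 = y
                while q:
                    cx, cy = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h \
                                and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes
```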
It should be noted that the output of the text detection model comprises at least one text line, which may be understood as an outline image of a character or of a line/column of text. In a possible embodiment, the text line may also be a sub-image segmented from the target picture that contains text.
In some embodiments, a text line is a bounding box in units of particular content, where the particular content may be a word, a line of characters, or a single character. In some embodiments, the text detection model may generate different text lines depending on the type of text in the picture to be recognized. For example, when the picture contains English text, the model may frame the English text line by line in units of words, generating a plurality of text lines. For another example, when the picture to be recognized contains Chinese characters, the model may frame the Chinese text in units of lines, in which case the text in each detected text line is a line of Chinese text; or it may frame the Chinese text in units of single characters, in which case the text in each detected text line is one character.
As will be readily understood by those skilled in the art, most of the characters in the pictures are laid out regularly, and the characters are generally arranged in a straight line, such as a horizontal direction, a vertical direction, and an oblique direction. The shape of the text lines obtained in this step is therefore generally quadrangular. Specifically, according to the text typesetting of the picture to be detected, corresponding data can be selected as a training set for model training, and if the text of the picture to be detected is transversely typeset, the data of the text detection model training set are all pictures with the text being transversely typeset. In this embodiment, the characters of the picture to be detected are all horizontally typeset.
Specifically, after the target picture is input into the text detection model, the text line information it outputs describes one or more text lines in the target picture. All text line information may be characterized as D = {p1, p2, p3, ..., pN}, where N is the number of text lines detected in the target picture. When each text line's information consists of its four vertex coordinates, the i-th text line may be characterized as pi = {(x1, y1), (x2, y2), (x3, y3), (x4, y4)}, where (x1, y1) is the top-left corner of the text line, (x2, y2) the top-right corner, (x3, y3) the bottom-right corner, and (x4, y4) the bottom-left corner.
In the process of training the text detection model, the number of training sample pictures used may be 500, 800, 1000, and so on, and specifically may be determined by a developer according to actual conditions.
Step 33: randomly selecting a plurality of text lines, determining the center point of each selected text line from its coordinates, and cropping small rectangular picture blocks of a preset size outwards from each center point to form a plurality of small picture block sets with text information.
Specifically, the small picture blocks may be rectangular or square, but it is necessary to ensure that the sizes of the small picture blocks are consistent.
In a preferred embodiment, the step of randomly selecting a plurality of text lines, determining a center point of each text line according to coordinates of the selected text line, and capturing a preset size of rectangular small picture block outwards to form a plurality of small picture block sets with text information includes:
s331, randomly selecting a plurality of text lines;
s332, calculating to obtain the center point and the height of the corresponding text line according to the coordinates of each vertex of the selected text line;
Once the coordinates of each selected text line are known, the position of its center point can be derived from them: the center point is the center of the rectangle framing the text line, i.e., the intersection of its two diagonals, or equivalently the midpoint in both width and height. A character is most likely to be present at the center point, which makes the subsequent detection results more accurate.
Because each text line detected by the text detection model is rectangular, the coordinates of its four vertices are available, and the width and height of the text line can be computed from them: width = x_max - x_min; height = y_max - y_min.
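The center-point, width, and height computations can be sketched as follows, assuming the vertex order given above (top-left, top-right, bottom-right, bottom-left); the function name is illustrative.

```python
def line_geometry(p):
    """Center point, width, and height of a text line, given its four
    vertex coordinates p = [(x1,y1), (x2,y2), (x3,y3), (x4,y4)]."""
    xs = [x for x, _ in p]
    ys = [y for _, y in p]
    center = (sum(xs) / 4, sum(ys) / 4)  # intersection of the diagonals
    width = max(xs) - min(xs)            # width  = x_max - x_min
    height = max(ys) - min(ys)           # height = y_max - y_min
    return center, width, height
```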
S333, according to the center point of each text line, cut outwards small square picture blocks extending no more than one half of the text line height, to form a plurality of small picture block sets with text information.
This embodiment preferably uses square picture blocks, since squares are convenient to scale later and speed up the whole pipeline. If the randomly selected text lines differ greatly in height, three schemes can be used to cut the square picture blocks. Scheme 1: determine the height of every text line to obtain the minimum text line height h; then cut, from the center point of each line outwards along the width direction, a block of height h (extending no more than h/2 from the center) to form a square picture block. Scheme 2: cut from each text line, outwards along the width direction, a block whose height does not exceed half the text line height, then scale all blocks to the same size afterwards. Scheme 3: scale every text line to the same preset height first, then cut the square picture blocks. If the selected text lines are consistent in height, scheme 2 is used without the subsequent scaling; for example, a region extending 112 pixels from the center point can be cut, forming a 224x224-pixel square, where the 112-pixel length is an experimental value that generally suffices. In this way, text localization yields a set of small picture blocks with text information, one small block per randomly selected text line; for example, if the text detection model detects 100 text lines and 15 are randomly selected, 15 square picture blocks are cut.
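A sketch of the cutting step for the consistent-height case, using the 112-pixel experimental value from above. The border-clamping policy (shifting the window inward so the patch keeps its full size near the picture edge) is an assumption; the description does not say what happens when a center point lies close to a border.

```python
import numpy as np

def crop_square(gray, cx, cy, half=112):
    """Cut a (2*half)x(2*half) square patch centered on (cx, cy) from a
    grayscale image `gray` (HxW array). half=112 yields the 224x224
    patches mentioned in the description."""
    h, w = gray.shape
    # Clamp the top-left corner so the full patch fits inside the image.
    x0 = min(max(cx - half, 0), w - 2 * half)
    y0 = min(max(cy - half, 0), h - 2 * half)
    return gray[y0:y0 + 2 * half, x0:x0 + 2 * half]
```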
Further, the step of cropping outward comprises: using an OpenCV tool, determining the width and height of the small picture block to be cropped; with the coordinates of its four vertices known, the pixels enclosed by those four vertices in the corresponding text line are extracted to form the small picture block.
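The outward cropping can be sketched in a few lines of Python with NumPy slicing (the function name and the clipping at image borders are illustrative assumptions, not specified by the embodiment):

```python
import numpy as np

def crop_square_patch(gray, cx, cy, half_side):
    """Crop a square patch of side 2 * half_side centered on (cx, cy),
    clipped to the image borders."""
    h, w = gray.shape[:2]
    x0, y0 = max(0, int(cx - half_side)), max(0, int(cy - half_side))
    x1, y1 = min(w, int(cx + half_side)), min(h, int(cy + half_side))
    return gray[y0:y1, x0:x1]

# A text line centered at (300, 120) with height 40 gives half_side = 20,
# i.e. the patch side equals the text line height.
gray = np.zeros((800, 600), dtype=np.uint8)
patch = crop_square_patch(gray, 300, 120, 20)
```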
More specifically, steps S32 and S33 avoid the problem of judging the picture from one region (for example, the upper half a user happens to select) while another region, such as the lower half, has a greater influence on the picture quality. The user first detects all text lines of the picture with the text detection model, and then small square picture blocks are randomly selected and cropped from those text lines, which ensures that every selected block contains characters.
Step S34, inputting the small picture block set into a pre-trained text quality model and/or text direction model, and outputting a text quality judgment result and/or a text direction classification result.
Specifically, the text quality model and the text direction model both adopt a resnet18 residual network. The data used by the text quality model is labeled by users and serves for both training and prediction: unclear and clear pictures are selected as training data and annotated with different scores according to their definition. Similarly, the training data for the text direction model is annotated with the direction of the characters in each text line, for example 0 degrees, 90 degrees, 180 degrees, or 270 degrees. After both models converge, the small picture block set is used as the shared input of the two models, yielding a text quality score and/or a text direction classification for each small picture block. The text quality judgment has two possible results, clear and unclear: a score threshold is set, for example 60 points, and a picture the model scores above 60 is judged clear, otherwise unclear. The text direction classification has four possible results: 0 degrees, 90 degrees, 180 degrees, and 270 degrees. 0 degrees means the character direction is correct, with characters upright along the text line height; 90 degrees means the direction is wrong, with characters pointing right along the text line width; 180 degrees means the direction is wrong, with characters pointing downward along the text line height; and 270 degrees means the direction is wrong, with characters pointing left along the text line width.
In this embodiment, the residual network is easy to optimize and can gain accuracy from considerable depth. The residual blocks inside it use skip connections, which alleviate the vanishing-gradient problem caused by increasing depth in deep neural networks. The resnet18 network adds a shortcut between every two convolutional layers; the "18" designates 18 weighted layers, counting convolutional and fully connected layers but excluding pooling and BN layers.
Step S35, constructing a voting mechanism, and taking the majority result of the text quality judgment and/or the text direction classification as the text quality judgment result and/or the text direction classification result of the current picture.
Specifically, under the voting mechanism, the majority result is taken as the final quality judgment and text direction classification of the current picture. Taking the quality judgment as an example: if 15 small square picture blocks are selected and, after model processing, 2 are judged unclear and 13 clear, the picture is judged clear on a majority-rule basis. Taking the text direction classification as an example: whichever angle occurs most often is taken directly as the angle of the characters in the picture's text lines. For instance, if 6 of 10 small picture blocks are classified as 0 degrees and the other 4 as other angles, the text direction of the picture is correct and its characters need no correction.
Further, the voting mechanism is a combination strategy for classification problems in ensemble learning; its basic idea is to select the class output most often across the learners. Classification algorithms come in two kinds: those that directly output class labels and those that output class probabilities. For the text quality judgment, the labels are clear and unclear; for the text direction classification, the labels are 0 degrees, 90 degrees, 180 degrees, and 270 degrees. After the models output their results, each result is labeled and the label with the highest count is taken as the final result.
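The voting step reduces to taking the most common label, for example with Python's collections.Counter (the numbers reuse the examples above):

```python
from collections import Counter

def majority_vote(labels):
    """Return the label that occurs most often among the per-block results."""
    return Counter(labels).most_common(1)[0][0]

# 13 of 15 blocks judged clear -> the picture is judged clear.
quality = majority_vote(["clear"] * 13 + ["unclear"] * 2)

# 6 of 10 blocks at 0 degrees -> the text direction is taken as 0 degrees.
angle = majority_vote([0] * 6 + [90] * 2 + [180] * 2)
```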
After many experimental iterations and tests, a user can verify whether the logic of the present technical scheme is feasible, for example whether the voting mechanism and the trained models are robust and stable. The accuracy of the picture's text quality judgment result is stable at about 97%, and the accuracy of the picture's text direction classification result is stable at about 99%. The technical scheme of the invention therefore effectively improves the robustness of the algorithm.
This scheme serves as preparation for image recognition: it determines whether the picture quality meets the requirements, so that only qualifying pictures proceed to character recognition, and whether the text direction meets the requirements. With this preparation done, character recognition can then be completed accurately and quickly.
In an alternative embodiment, the text quality judgment result and/or the text direction classification result of the current picture may also be uploaded to a blockchain.
Specifically, the corresponding digest information is obtained from the result of the picture analysis method, for example by hashing the result with the SHA-256 algorithm. Uploading the digest information to the blockchain ensures its security, fairness, and transparency for the user, who can download the digest from the blockchain to verify whether the result of the picture analysis method has been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing the information of a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
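A minimal sketch of computing the digest, assuming ordinary SHA-256 over a canonical JSON serialization of the result (the serialization is our own choice; the embodiment only names the hash algorithm):

```python
import hashlib
import json

def digest_of_result(result):
    """Hash the picture analysis result into its digest (summary) information."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

digest = digest_of_result({"quality": "clear", "direction": 0})
```

Because the keys are sorted before hashing, the digest is independent of the order in which the result fields are assembled.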
The invention provides a picture analysis method, system, computer device, and computer-readable storage medium. The picture analysis method acquires a picture to be recognized and preprocesses it; detects the preprocessed picture with a pre-trained text detection model to obtain the coordinates of each text line in the picture; randomly selects a plurality of text lines, determines the center point of each selected line from its coordinates, and crops rectangular small picture blocks of a preset size outward from the center point to form a plurality of small picture blocks containing text information; inputs the small picture block set into a pre-trained text quality model and/or text direction model and outputs a text quality judgment result and/or a text direction classification result; and constructs a voting mechanism that takes the majority result as the text quality judgment result and/or text direction classification result of the current picture. Using text line information to assist the text quality judgment and text direction classification in this way yields a marked improvement in accuracy and robustness, as confirmed by repeated algorithm self-tests and comparison experiments.
As shown in fig. 4, the present invention further provides a picture analysis system, which can be integrated in the computing device 110, and specifically can include a preprocessing module 20, a detection module 30, a picture block module 40, a result module 50, and a voting module 60.
The preprocessing module 20 is used for acquiring the picture to be recognized and preprocessing it; the detection module 30 is configured to detect the preprocessed picture with a pre-trained text detection model to obtain the coordinates of each text line in the picture; the picture block module 40 is used for randomly selecting a plurality of text lines, determining the center point of each selected text line from its coordinates, and cropping a small rectangular picture block of a preset size outward from the center point to form a plurality of small picture blocks containing text information; the result module 50 is configured to input the small picture block set into a pre-trained text quality model and/or text direction model and output a text quality judgment result and/or a text direction classification result; and the voting module 60 is configured to construct a voting mechanism and take the majority result of the text quality judgment and/or text direction classification as the text quality judgment result and/or text direction classification result of the current picture.
In one embodiment, the preprocessing step of the preprocessing module 20 includes:
zooming the picture, such that the zoomed width is not more than 1600 pixels and not less than 600 pixels, and the zoomed height is not more than 2400 pixels and not less than 800 pixels;
and converting the zoomed picture into a gray scale image.
In one embodiment, the processing steps of the detection module 30 include:
calling the trained text detection model, which adopts the dbnet algorithm; the preprocessed picture is input into the text detection model, which outputs, for each pixel point of the picture, the probability that the pixel corresponds to text;
thresholding the pixel points of the picture, dividing them into text pixel points and non-text pixel points according to the probability values, to obtain a binary image;
calculating a connected domain set of the binary image by using a first image processing algorithm;
and inputting the connected domain set into a second image processing algorithm, and calculating a minimum circumscribed rectangle of each connected domain, wherein four vertexes of the minimum circumscribed rectangle are coordinates of text lines.
In one embodiment, the processing steps of the picture block module 40 include:
randomly selecting a plurality of text lines;
calculating to obtain the center point and the height of the corresponding text line according to the coordinates of each vertex of the selected text line;
and according to the center point of each text line, cropping outward small rectangular picture blocks that extend no more than one half of the text line height, to form a plurality of small picture blocks containing text information.
Further, the small rectangular picture blocks are square.
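Computing the center point and height from the four vertices can be sketched as follows (assuming text lines are wider than tall, so the height is the shorter side of the rectangle):

```python
import numpy as np

def center_and_height(vertices):
    """Return the center point and height of a text line given the four
    vertices of its minimum circumscribed rectangle."""
    pts = np.asarray(vertices, dtype=np.float32)
    center = pts.mean(axis=0)
    sides = [np.linalg.norm(pts[i] - pts[(i + 1) % 4]) for i in range(4)]
    return center, min(sides)  # the shorter side is the text line height

center, height = center_and_height([(10, 40), (90, 40), (90, 60), (10, 60)])
```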
In one embodiment, the text quality model and the text direction model both use a resnet18 residual network; the text quality judgment result is either clear or unclear, and the text direction classification result is one of 0 degrees, 90 degrees, 180 degrees, and 270 degrees.
In an embodiment, the picture analysis system further includes a blockchain module (not shown) configured to upload the text quality judgment result and/or the text direction classification result, once obtained, to a blockchain, so that the blockchain stores them in encrypted form.
The processing steps of the above modules are described in detail in embodiments of the method and will not be described again.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 5, the apparatus 200 includes a processor 201 and a storage 202 coupled to the processor 201.
The storage 202 stores program instructions for implementing the picture analysis method according to any of the above embodiments.
The processor 201 is used to execute the program instructions stored in the storage 202.
The processor 201 may also be referred to as a Central Processing Unit (CPU). The processor 201 may be an integrated circuit chip having signal processing capabilities. The processor 201 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The computer-readable storage medium of this embodiment stores a program file 301 capable of implementing the picture analysis method. The program file 301 may be stored in the storage medium in the form of a software product and includes several instructions that enable a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as computers, servers, mobile phones, and tablets.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Claims (10)

1. A picture analysis method, characterized in that the analysis method comprises:
acquiring a picture to be identified, and preprocessing the picture;
detecting the preprocessed picture by using a text detection model trained in advance to obtain the coordinates of each text line in the picture;
randomly selecting a plurality of text lines, determining the center point of each text line according to the coordinates of the selected text line, and intercepting rectangular small picture blocks with preset sizes from the center point outwards to form a plurality of small picture block sets with text information;
inputting the small picture block set into a pre-trained text quality model and/or a text direction model, and outputting a text quality judgment result and/or a text direction classification result;
and constructing a voting mechanism, and taking the majority result of the text quality judgment and/or the text direction classification as the text quality judgment result and/or the text direction classification result of the current picture.
2. The analysis method of claim 1, wherein the step of pre-processing the picture comprises:
zooming the picture, such that the zoomed width is not more than 1600 pixels and not less than 600 pixels, and the zoomed height is not more than 2400 pixels and not less than 800 pixels;
and converting the zoomed picture into a gray scale image.
3. The analysis method as claimed in claim 1, wherein the step of detecting the preprocessed image by using the pre-trained text detection model to obtain the coordinates of each text line in the image comprises:
calling a trained text detection model, wherein the text detection model adopts a dbnet algorithm, a preprocessed picture is input into the text detection model, and the text detection model outputs a probability value of a corresponding text in a pixel point of the picture;
carrying out thresholding selection on pixel points in the picture, and dividing the pixel points in the picture into text pixel points and non-text pixel points according to the probability value to obtain a binary image;
calculating a connected domain set of the binary image by using a first image processing algorithm;
and inputting the connected domain set into a second image processing algorithm, and calculating a minimum circumscribed rectangle of each connected domain, wherein four vertexes of the minimum circumscribed rectangle are coordinates of text lines.
4. The analysis method as claimed in claim 1, wherein the step of randomly selecting a plurality of text lines, determining a center point of each text line according to the coordinates of the selected text line, and intercepting a small rectangular picture block with a preset size from the center point to form a plurality of small picture block sets with text information comprises:
randomly selecting a plurality of text lines;
calculating to obtain the center point and the height of the corresponding text line according to the coordinates of each vertex of the selected text line;
and according to the central point of each text line, outwards intercepting small rectangular picture blocks with the length not exceeding one half of the height of the text line to form a plurality of small picture block sets with text information.
5. The analysis method of claim 4, wherein the rectangular small picture blocks are square.
6. The analysis method as claimed in claim 1, wherein the text quality model and the text direction model both use a resnet18 residual network, the text quality determination result includes clear and unclear, and the text direction classification result includes 0 degrees, 90 degrees, 180 degrees, and 270 degrees.
7. The analysis method according to claim 1, wherein after the text quality determination result and/or the text direction classification result are obtained, the text quality determination result and/or the text direction classification result are uploaded to a blockchain, so that the blockchain encrypts and stores the text quality determination result and/or the text direction classification result.
8. A picture analysis system, the analysis system comprising:
the preprocessing module is used for acquiring a picture to be identified and preprocessing the picture;
the detection module is used for detecting the preprocessed pictures by utilizing the pre-trained text detection model to obtain the coordinates of each text line in the pictures;
the picture block module is used for randomly selecting a plurality of text lines, determining the central point of each text line according to the coordinates of the selected text line, and intercepting a small rectangular picture block with a preset size from the central point outwards to form a plurality of small picture block sets with text information;
the result module is used for inputting the small picture block set into a pre-trained text quality model and/or a text direction model and outputting a text quality judgment result and/or a text direction classification result;
and the voting module is used for constructing a voting mechanism and taking the majority result of the text quality judgment and/or the text direction classification as the text quality judgment result and/or the text direction classification result of the current picture.
9. A computer device comprising a storage and a processor, the storage having stored therein readable instructions which, when executed by the processor, cause the processor to carry out the steps of the picture analysis method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a program file capable of implementing the picture analysis method according to any one of claims 1 to 7 is stored.
CN202110213670.1A 2021-02-25 Picture analysis method, system, computer device and computer readable storage medium Active CN112926564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110213670.1A CN112926564B (en) 2021-02-25 Picture analysis method, system, computer device and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN112926564A true CN112926564A (en) 2021-06-08
CN112926564B CN112926564B (en) 2024-08-02




Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304761A (en) * 2017-09-25 2018-07-20 腾讯科技(深圳)有限公司 Method for text detection, device, storage medium and computer equipment
US20200012876A1 (en) * 2017-09-25 2020-01-09 Tencent Technology (Shenzhen) Company Limited Text detection method, storage medium, and computer device
CN109977723A (en) * 2017-12-22 2019-07-05 苏宁云商集团股份有限公司 Big bill picture character recognition methods
CN109492143A (en) * 2018-09-21 2019-03-19 平安科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN111382740A (en) * 2020-03-13 2020-07-07 深圳前海环融联易信息科技服务有限公司 Text picture analysis method and device, computer equipment and storage medium
CN111950555A (en) * 2020-08-17 2020-11-17 北京字节跳动网络技术有限公司 Text recognition method and device, readable medium and electronic equipment
CN111967545A (en) * 2020-10-26 2020-11-20 北京易真学思教育科技有限公司 Text detection method and device, electronic equipment and computer storage medium
CN112101317A (en) * 2020-11-17 2020-12-18 深圳壹账通智能科技有限公司 Page direction identification method, device, equipment and computer readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963149A (en) * 2021-10-29 2022-01-21 平安科技(深圳)有限公司 Medical bill picture fuzzy judgment method, system, equipment and medium
CN114387254A (en) * 2022-01-12 2022-04-22 中国平安人寿保险股份有限公司 Document quality analysis method and device, computer equipment and storage medium
CN114219876A (en) * 2022-02-18 2022-03-22 阿里巴巴达摩院(杭州)科技有限公司 Text merging method, device, equipment and storage medium
CN116563875A (en) * 2023-07-05 2023-08-08 四川集鲜数智供应链科技有限公司 Intelligent image-text recognition method and system with encryption function
CN116563875B (en) * 2023-07-05 2023-09-08 四川集鲜数智供应链科技有限公司 Intelligent image-text recognition method and system with encryption function


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant