CN112926564A - Picture analysis method, system, computer device and computer-readable storage medium


Info

Publication number
CN112926564A
CN112926564A (application number CN202110213670.1A; granted as CN112926564B)
Authority
CN
China
Prior art keywords
text, picture, result, analysis method, small
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110213670.1A
Other languages
Chinese (zh)
Other versions
CN112926564B (en)
Inventor
何小臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202110213670.1A
Priority claimed from CN202110213670.1A
Publication of CN112926564A
Application granted
Publication of CN112926564B
Active legal status
Anticipated expiration legal-status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06V 10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G06V 10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30168 Image quality inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a picture analysis method, system, computer device and computer-readable storage medium. The picture analysis method preprocesses an acquired picture; detects the preprocessed picture with a text detection model to obtain the coordinates of each text line in the picture; randomly selects a plurality of text lines and, from the center point of each selected line, crops outwards a small rectangular picture block of a preset size, forming a set of small picture blocks that all contain text information; inputs the small picture block set into a text quality model and/or a text direction model to output a text quality judgment and/or a text direction classification for each block; and constructs a voting mechanism, taking the majority text quality judgment and/or text direction classification as the result for the whole picture. The picture analysis method thereby effectively improves the accuracy and robustness of text quality judgment and text direction classification.

Description

Picture analysis method, system, computer device and computer-readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and a system for picture analysis, a computer device, and a computer-readable storage medium.
Background
Current OCR technology can be roughly divided into two categories according to the input source (picture): character recognition in natural scenes, and recognition of photographed or scanned images of paper documents. Text recognition is usually preceded by preprocessing operations, such as judging the quality of the current picture (e.g., its sharpness, completeness, and tilt angle). For photographed or scanned paper documents, most existing solutions for quality judgment and character direction detection/correction process the whole picture without attending to the characters it contains, and therefore suffer from poor accuracy or poor algorithm robustness.
Disclosure of Invention
Based on the above, the invention provides a picture analysis method, a picture analysis system, a computer device and a computer readable storage medium, so as to improve the accuracy and robustness of text quality judgment and text direction classification.
In order to achieve the above object, the present invention provides a picture analysis method, including:
acquiring a picture to be identified, and preprocessing the picture;
detecting the preprocessed picture by using a text detection model trained in advance to obtain the coordinates of each text line in the picture;
randomly selecting a plurality of text lines, determining the center point of each selected text line from its coordinates, and cropping small rectangular picture blocks of a preset size outwards from each center point to form a plurality of small picture block sets with text information;
inputting the small picture block set into a pre-trained text quality model and/or a text direction model, and outputting a text quality judgment result and/or a text direction classification result;
and constructing a voting mechanism, and taking the majority result of the text quality judgment and/or the text direction classification as the text quality judgment result and/or the text direction classification result of the current picture.
Preferably, the step of preprocessing the picture includes:
scaling the picture, wherein the scaled picture has a maximum width of no more than 1600 pixels, a maximum height of no more than 2400 pixels, a minimum width of no less than 600 pixels, and a minimum height of no less than 800 pixels;
and converting the scaled picture into a grayscale image.
Preferably, the step of detecting the preprocessed picture by using the pre-trained text detection model to obtain the coordinates of each text line in the picture includes:
calling a trained text detection model, wherein the text detection model adopts the DBNet algorithm; the preprocessed picture is input into the text detection model, which outputs, for each pixel of the picture, the probability that the pixel belongs to text;
thresholding the pixels of the picture, dividing them into text pixels and non-text pixels according to the probability values to obtain a binary image;
calculating the set of connected domains of the binary image by using a first image processing algorithm;
and inputting the connected domain set into a second image processing algorithm to calculate the minimum circumscribed rectangle of each connected domain, the four vertices of the minimum circumscribed rectangle being the coordinates of the text line.
Preferably, the step of randomly selecting a plurality of text lines, determining a center point of each text line according to coordinates of the selected text line, and intercepting a preset size of a small rectangular picture block from the center point to the outside to form a plurality of small picture block sets with text information includes:
randomly selecting a plurality of text lines;
calculating to obtain the center point and the height of the corresponding text line according to the coordinates of each vertex of the selected text line;
and according to the center point of each text line, cropping outwards small rectangular picture blocks extending no more than one half of the text line height, to form a plurality of small picture block sets with text information.
Preferably, the small rectangular picture blocks are square.
Preferably, the text quality model and the text direction model both use a resnet18 residual network; the text quality judgment result includes clear and unclear, and the text direction classification result includes 0 degrees, 90 degrees, 180 degrees and 270 degrees.
Preferably, after the text quality judgment result and/or the text direction classification result is obtained, it is uploaded to a blockchain, so that the blockchain stores the text quality judgment result and/or the text direction classification result in encrypted form.
In order to achieve the above object, the present invention further provides a picture analysis system, including:
the preprocessing module is used for acquiring a picture to be identified and preprocessing the picture;
the detection module is used for detecting the preprocessed pictures by utilizing the pre-trained text detection model to obtain the coordinates of each text line in the pictures;
the picture block module is used for randomly selecting a plurality of text lines, determining the central point of each text line according to the coordinates of the selected text line, and intercepting a small rectangular picture block with a preset size from the central point outwards to form a plurality of small picture block sets with text information;
the result module is used for inputting the small picture block set into a pre-trained text quality model and/or a text direction model and outputting a text quality judgment result and/or a text direction classification result;
and the voting module is used for constructing a voting mechanism and taking the majority result of the text quality judgment and/or the text direction classification as the text quality judgment result and/or the text direction classification result of the current picture.
To achieve the above object, the present invention further provides a computer device, which includes a memory and a processor, wherein the memory stores readable instructions which, when executed by the processor, cause the processor to execute the steps of the picture analysis method described above.
To achieve the above object, the present invention also provides a computer-readable storage medium storing a program file capable of implementing the picture analysis method as described above.
The invention provides a picture analysis method, system, computer device and computer-readable storage medium. The picture analysis method acquires a picture to be recognized and preprocesses it; detects the preprocessed picture with a pre-trained text detection model to obtain the coordinates of each text line in the picture; randomly selects a plurality of text lines and, from the center point of each selected line, crops outwards a small rectangular picture block of a preset size, forming a set of small picture blocks with text information; inputs the small picture block set into a pre-trained text quality model and/or text direction model to output a text quality judgment and/or text direction classification; and constructs a voting mechanism, taking the majority result as the text quality judgment and/or text direction classification of the current picture. In this way, the picture analysis method uses text line information to assist text quality judgment and text direction classification, and after multiple rounds of algorithm self-testing and comparison experiments, its accuracy and robustness are significantly improved.
Drawings
FIG. 1 is a diagram of an implementation environment of a picture analysis method provided in one embodiment;
FIG. 2 is a block diagram that illustrates the internal architecture of the computing device, in one embodiment;
FIG. 3 is a flow diagram of a method for picture analysis in one embodiment;
FIG. 4 is a schematic diagram of a picture analysis system in one embodiment;
FIG. 5 is a schematic diagram of a computer apparatus in one embodiment;
FIG. 6 is a block diagram of a computer-readable storage medium in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another.
Fig. 1 is a diagram of an implementation environment of a picture analysis method provided in an embodiment, as shown in fig. 1, in the implementation environment, including a computing device 110 and a display device 120.
The computing device 110 may be a computer used by a user, with a picture analysis system installed on it. The user may run the picture analysis method on the computing device 110 and display the result through the display device 120.
It should be noted that the computing device 110 and the display device 120 may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like.
FIG. 2 is a diagram showing an internal configuration of a computer device according to an embodiment. As shown in fig. 2, the computer device includes a processor, a non-volatile storage medium, a memory, and a network interface connected through a system bus. The non-volatile storage medium stores an operating system, a database and computer-readable instructions; the database may store control information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement a picture analysis method. The processor provides the computation and control capability supporting the operation of the whole device. The memory may also store computer-readable instructions that, when executed by the processor, cause the processor to perform the picture analysis method. The network interface is used for connecting and communicating with terminals. Those skilled in the art will appreciate that the architecture shown in fig. 2 is merely a block diagram of structures relevant to the disclosed aspects and does not limit the computing devices to which they apply; a particular computing device may include more or fewer components than shown, combine certain components, or arrange components differently.
As shown in fig. 3, in an embodiment, a picture analysis method is provided, which may be applied to the computing device 110 and the display device 120, and specifically includes the following steps:
Step 31: acquiring a picture to be identified, and preprocessing the picture.
Specifically, the specific step of preprocessing the picture includes:
S311, scaling the picture, wherein the scaled picture has a maximum width of no more than 1600 pixels, a maximum height of no more than 2400 pixels, a minimum width of no less than 600 pixels, and a minimum height of no less than 800 pixels;
specifically, a color picture is obtained, the picture is zoomed at first, and a prior value is obtained according to the resolution condition of the real picture in the zooming process, namely the maximum width of the picture does not exceed 1600 pixels, and the maximum height of the picture does not exceed 2400 pixels; the minimum width is not less than 600 pixels, the minimum height is not less than 800 pixels, and the scaling of the picture is not fixed but basically fixed in the intervals. More specifically, the picture scaling is performed according to the real resolution of the picture to be recognized, if the picture is too small, the picture is enlarged a little, otherwise, the text detection effect is poor, if the picture is too large, the picture is reduced a little, otherwise, the text detection time is too long. For all the pictures to be identified, the width and height of the picture are limited within the interval by scaling, that is, the picture is within the interval, further, the picture scaling is performed according to an interpolation algorithm, and more specifically, the picture scaling is implemented by using an opencv tool, which is a general algorithm used in image processing and computer vision, and the implementation includes: reading pictures, and converting the pictures into arrays; converting array data; displaying the array data in a window; image saving; intercepting an image; data slices for BGR; calculating array pixel values with the same size; fusing pictures; scaling of pictures, etc.
And S312, converting the zoomed picture into a gray scale image.
Specifically, the scaled picture is converted into a grayscale image. In a typical real-world scenario, the picture uploaded by a user is a color picture, but the subsequent algorithms require grayscale input, so the input picture is converted into a grayscale image.
Further, converting the color picture into a grayscale image means converting the 3 channels (RGB) of the picture into 1 channel. There are generally three ways to do this:
(1) The average method, the simplest, averages the values of the 3 RGB channels at each pixel position: I(x, y) = 1/3 · I_R(x, y) + 1/3 · I_G(x, y) + 1/3 · I_B(x, y).
(2) The max-min average method averages the maximum and minimum of the RGB brightness values at each pixel position.
(3) The weighted average method: I(x, y) = 0.3 · I_R(x, y) + 0.59 · I_G(x, y) + 0.11 · I_B(x, y). This is the most popular approach; the weights 0.3, 0.59 and 0.11 are standardized parameters tuned to the human brightness perception system.
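The weighted average method above can be written in a few lines of NumPy. The function name and the assumption that channels arrive in R, G, B order are illustrative; OpenCV loads images as BGR, so in practice one would reorder the channels or call `cv2.cvtColor`.

```python
import numpy as np

# Luma weights from the description, tuned to human brightness perception.
LUMA = np.array([0.3, 0.59, 0.11])

def to_gray(rgb):
    """Weighted-average grayscale: I(x,y) = 0.3*R + 0.59*G + 0.11*B.
    `rgb` is an HxWx3 float array in R, G, B channel order; the result
    is an HxW array."""
    return rgb @ LUMA  # dot product over the channel axis
```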
Step 32: detecting the preprocessed picture by using the pre-trained text detection model to obtain the coordinates of each text line in the picture.
The text detection model can detect the coordinates of each text line in the picture, where the coordinates are those of the four vertices of the text line's minimum circumscribed rectangle; the coordinate origin is the top-left vertex of the picture, with the picture width as the horizontal axis and the picture height as the vertical axis.
Specifically, the step of detecting the preprocessed picture by using the pre-trained text detection model to obtain the coordinates of each text line in the picture includes:
s321, calling a trained text detection model, wherein the text detection model adopts a dbnet algorithm, a preprocessed picture is input into the text detection model, and the text detection model outputs a probability value of a corresponding text in a pixel point of the picture;
specifically, the text detection model may be constructed by using a text detection algorithm based on pixel segmentation, and an attention mechanism may be added to the text detection model. The text detection algorithm based on segmentation may be any one of the algorithms of SENEt, DBNet, PixelLink, etc. In this embodiment, a trained text detection model needs to be called, where the text detection model adopts a dbnet algorithm (Differentiable Binarization Network), and the dbnet algorithm is a text segmentation model designed based on a picture segmentation method. According to the dbnet algorithm, data marked by a user can be used as a training set, and a desired text detection model is obtained through training. Specifically, the result output by the text detection model is the probability (0-1) of each pixel point in the picture, for example, a picture with 100 pixels x100 pixels, and after calculation by dbnet, the probability values corresponding to the 10000 pixel points and belonging to the text are output, that is, how many of the 10000 pixel points correspond to the text in the picture, and how many correspond to the hollow part or the non-text part in the picture.
S322, carrying out thresholding selection on pixel points in the picture, and dividing the pixel points in the picture into text pixel points and non-text pixel points according to the probability value to obtain a binary image;
specifically, a user can set a threshold value, and the 10000 pixel points are divided into text pixel points and non-text pixel points through the probability value, wherein the non-text pixel points can be pixel points belonging to a blank part in a picture, so that a binary image can be formed.
S323, calculating a connected domain set of the binary image by using a first image processing algorithm;
specifically, the first image processing algorithm may be the computer vision processing kit opencv widely used in the industry.
And S324, inputting the connected domain set into a second image processing algorithm, and calculating a minimum circumscribed rectangle of each connected domain, wherein four vertexes of the minimum circumscribed rectangle are coordinates of text lines.
Specifically, the second image processing algorithm calls the findContours function of OpenCV.
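For illustration, here is a dependency-free stand-in for the two OpenCV calls: a 4-connected component search over the binary image that reduces each connected domain to an axis-aligned bounding box. This simplifies `cv2.minAreaRect`, which returns a possibly rotated rectangle; for horizontally typeset text (as in this embodiment) the two coincide.

```python
from collections import deque

def connected_boxes(binary):
    """4-connected components of a binary grid (list of lists of 0/1),
    each reduced to its bounding box (x_min, y_min, x_max, y_max)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                # Breadth-first search over one connected domain.
                q = deque([(x, y)])
                seen[y][x] = True
                x0 = x1 = x
                y0 = y1 = y
                while q:
                    cx, cy = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                                   (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h \
                                and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((nx, ny))
                boxes.append((x0, y0, x1, y1))
    return boxes
```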
It should be noted that the output of the text detection model comprises at least one text line, which may be understood as an outline image of a character or of a line/column of text. In a possible embodiment, the text line may also be a sub-image segmented from the target picture that contains text.
In some embodiments, a text line is a bounding box in units of particular content, where the particular content may be a word, a line of characters, or a single character. In some embodiments, the text detection model may generate different text lines depending on the type of text in the picture to be recognized. For example, when the picture contains English text, the model may frame the English text line by line in units of words, generating a plurality of text lines. For another example, when the picture to be recognized contains Chinese characters, the model may frame the Chinese text in units of lines, in which case the text in each detected text line is a line of Chinese text; or it may frame the Chinese text in units of single characters, in which case the text in each detected text line is one character.
As will be readily understood by those skilled in the art, most of the characters in the pictures are laid out regularly, and the characters are generally arranged in a straight line, such as a horizontal direction, a vertical direction, and an oblique direction. The shape of the text lines obtained in this step is therefore generally quadrangular. Specifically, according to the text typesetting of the picture to be detected, corresponding data can be selected as a training set for model training, and if the text of the picture to be detected is transversely typeset, the data of the text detection model training set are all pictures with the text being transversely typeset. In this embodiment, the characters of the picture to be detected are all horizontally typeset.
Specifically, after the target picture is input into the text detection model, the text line information it outputs describes one or more text lines in the target picture. All text line information may be characterized as D = {p1, p2, p3, ..., pN}, where N is the number of text lines detected in the target picture. When each text line's information consists of its four vertex coordinates, the i-th text line may be characterized as pi = {(x1, y1), (x2, y2), (x3, y3), (x4, y4)}, where (x1, y1) is the top-left corner of the text line, (x2, y2) the top-right corner, (x3, y3) the bottom-right corner, and (x4, y4) the bottom-left corner.
In the process of training the text detection model, the number of training sample pictures used may be 500, 800, 1000, and so on, and specifically may be determined by a developer according to actual conditions.
Step 33: randomly selecting a plurality of text lines, determining the center point of each selected text line from its coordinates, and cropping small rectangular picture blocks of a preset size outwards from each center point to form a plurality of small picture block sets with text information.
Specifically, the small picture blocks may be rectangular or square, but it is necessary to ensure that the sizes of the small picture blocks are consistent.
In a preferred embodiment, the step of randomly selecting a plurality of text lines, determining a center point of each text line according to coordinates of the selected text line, and capturing a preset size of rectangular small picture block outwards to form a plurality of small picture block sets with text information includes:
s331, randomly selecting a plurality of text lines;
s332, calculating to obtain the center point and the height of the corresponding text line according to the coordinates of each vertex of the selected text line;
Once the coordinates of each selected text line are known, the position of its center point can be derived from them: the center point is the center of the rectangle framing the text line, i.e., the intersection of its two diagonals, or equivalently the midpoint in both width and height. A character is most likely to be present at the center point, which makes the subsequent detection results more accurate.
Because each text line detected by the text detection model is rectangular, the coordinates of its four vertices are available, and the width and height of the text line can be computed from them: width = x_max - x_min; height = y_max - y_min.
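The center-point, width, and height computations can be sketched as follows, assuming the vertex order given above (top-left, top-right, bottom-right, bottom-left); the function name is illustrative.

```python
def line_geometry(p):
    """Center point, width, and height of a text line, given its four
    vertex coordinates p = [(x1,y1), (x2,y2), (x3,y3), (x4,y4)]."""
    xs = [x for x, _ in p]
    ys = [y for _, y in p]
    center = (sum(xs) / 4, sum(ys) / 4)  # intersection of the diagonals
    width = max(xs) - min(xs)            # width  = x_max - x_min
    height = max(ys) - min(ys)           # height = y_max - y_min
    return center, width, height
```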
S333, according to the center point of each text line, cut outwards small square picture blocks extending no more than one half of the text line height, to form a plurality of small picture block sets with text information.
This embodiment preferably uses square picture blocks, since squares are convenient to scale later and speed up the whole pipeline. If the randomly selected text lines differ greatly in height, three schemes can be used to cut the square picture blocks. Scheme 1: determine the height of every text line to obtain the minimum text line height h; then cut, from the center point of each line outwards along the width direction, a block of height h (extending no more than h/2 from the center) to form a square picture block. Scheme 2: cut from each text line, outwards along the width direction, a block whose height does not exceed half the text line height, then scale all blocks to the same size afterwards. Scheme 3: scale every text line to the same preset height first, then cut the square picture blocks. If the selected text lines are consistent in height, scheme 2 is used without the subsequent scaling; for example, a region extending 112 pixels from the center point can be cut, forming a 224x224-pixel square, where the 112-pixel length is an experimental value that generally suffices. In this way, text localization yields a set of small picture blocks with text information, one small block per randomly selected text line; for example, if the text detection model detects 100 text lines and 15 are randomly selected, 15 square picture blocks are cut.
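A sketch of the cutting step for the consistent-height case, using the 112-pixel experimental value from above. The border-clamping policy (shifting the window inward so the patch keeps its full size near the picture edge) is an assumption; the description does not say what happens when a center point lies close to a border.

```python
import numpy as np

def crop_square(gray, cx, cy, half=112):
    """Cut a (2*half)x(2*half) square patch centered on (cx, cy) from a
    grayscale image `gray` (HxW array). half=112 yields the 224x224
    patches mentioned in the description."""
    h, w = gray.shape
    # Clamp the top-left corner so the full patch fits inside the image.
    x0 = min(max(cx - half, 0), w - 2 * half)
    y0 = min(max(cy - half, 0), h - 2 * half)
    return gray[y0:y0 + 2 * half, x0:x0 + 2 * half]
```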
Further, the step of cropping outward comprises: using an OpenCV tool, determining the width and height of the small picture block to be cropped; with the coordinates of its four vertices known, the pixels enclosed by those four vertices in the corresponding text line are extracted to form the small picture block.
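The outward cropping can be sketched in a few lines of Python with NumPy slicing (the function name and the clipping at image borders are illustrative assumptions, not specified by the embodiment):

```python
import numpy as np

def crop_square_patch(gray, cx, cy, half_side):
    """Crop a square patch of side 2 * half_side centered on (cx, cy),
    clipped to the image borders."""
    h, w = gray.shape[:2]
    x0, y0 = max(0, int(cx - half_side)), max(0, int(cy - half_side))
    x1, y1 = min(w, int(cx + half_side)), min(h, int(cy + half_side))
    return gray[y0:y1, x0:x1]

# A text line centered at (300, 120) with height 40 gives half_side = 20,
# i.e. the patch side equals the text line height.
gray = np.zeros((800, 600), dtype=np.uint8)
patch = crop_square_patch(gray, 300, 120, 20)
```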
More specifically, steps S32 and S33 avoid the problem of judging the picture from one region (for example, the upper half a user happens to select) while another region, such as the lower half, has a greater influence on the picture quality. The user first detects all text lines of the picture with the text detection model, and then small square picture blocks are randomly selected and cropped from those text lines, which ensures that every selected block contains characters.
Step S34, inputting the small picture block set into a pre-trained text quality model and/or text direction model, and outputting a text quality judgment result and/or a text direction classification result.
Specifically, the text quality model and the text direction model both adopt a resnet18 residual network. The data used by the text quality model is labeled by users and serves for both training and prediction: unclear and clear pictures are selected as training data and annotated with different scores according to their definition. Similarly, the training data for the text direction model is annotated with the direction of the characters in each text line, for example 0 degrees, 90 degrees, 180 degrees, or 270 degrees. After both models converge, the small picture block set is used as the shared input of the two models, yielding a text quality score and/or a text direction classification for each small picture block. The text quality judgment has two possible results, clear and unclear: a score threshold is set, for example 60 points, and a picture the model scores above 60 is judged clear, otherwise unclear. The text direction classification has four possible results: 0 degrees, 90 degrees, 180 degrees, and 270 degrees. 0 degrees means the character direction is correct, with characters upright along the text line height; 90 degrees means the direction is wrong, with characters pointing right along the text line width; 180 degrees means the direction is wrong, with characters pointing downward along the text line height; and 270 degrees means the direction is wrong, with characters pointing left along the text line width.
In this embodiment, the residual network is easy to optimize and can gain accuracy from considerable depth. The residual blocks inside it use skip connections, which alleviate the vanishing-gradient problem caused by increasing depth in deep neural networks. The resnet18 network adds a shortcut between every two convolutional layers; the "18" designates 18 weighted layers, counting convolutional and fully connected layers but excluding pooling and BN layers.
Step S35, constructing a voting mechanism, and taking the majority result of the text quality judgment and/or the text direction classification as the text quality judgment result and/or the text direction classification result of the current picture.
Specifically, under the voting mechanism, the majority result is taken as the final quality judgment and text direction classification of the current picture. Taking the quality judgment as an example: if 15 small square picture blocks are selected and, after model processing, 2 are judged unclear and 13 clear, the picture is judged clear on a majority-rule basis. Taking the text direction classification as an example: whichever angle occurs most often is taken directly as the angle of the characters in the picture's text lines. For instance, if 6 of 10 small picture blocks are classified as 0 degrees and the other 4 as other angles, the text direction of the picture is correct and its characters need no correction.
Further, the voting mechanism is a combination strategy for classification problems in ensemble learning; its basic idea is to select the class output most often across the learners. Classification algorithms come in two kinds: those that directly output class labels and those that output class probabilities. For the text quality judgment, the labels are clear and unclear; for the text direction classification, the labels are 0 degrees, 90 degrees, 180 degrees, and 270 degrees. After the models output their results, each result is labeled and the label with the highest count is taken as the final result.
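The voting step reduces to taking the most common label, for example with Python's collections.Counter (the numbers reuse the examples above):

```python
from collections import Counter

def majority_vote(labels):
    """Return the label that occurs most often among the per-block results."""
    return Counter(labels).most_common(1)[0][0]

# 13 of 15 blocks judged clear -> the picture is judged clear.
quality = majority_vote(["clear"] * 13 + ["unclear"] * 2)

# 6 of 10 blocks at 0 degrees -> the text direction is taken as 0 degrees.
angle = majority_vote([0] * 6 + [90] * 2 + [180] * 2)
```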
After many experimental iterations and tests, a user can verify whether the logic of the present technical scheme is feasible, for example whether the voting mechanism and the trained models are robust and stable. The accuracy of the picture's text quality judgment result is stable at about 97%, and the accuracy of the picture's text direction classification result is stable at about 99%. The technical scheme of the invention therefore effectively improves the robustness of the algorithm.
This scheme serves as preparation for image recognition: it determines whether the picture quality meets the requirements, so that only qualifying pictures proceed to character recognition, and whether the text direction meets the requirements. With this preparation done, character recognition can then be completed accurately and quickly.
In an alternative embodiment, the text quality judgment result and/or the text direction classification result of the current picture may also be uploaded to a blockchain.
Specifically, the corresponding digest information is obtained from the result of the picture analysis method, for example by hashing the result with the SHA-256 algorithm. Uploading the digest information to the blockchain ensures its security, fairness, and transparency for the user, who can download the digest from the blockchain to verify whether the result of the picture analysis method has been tampered with. The blockchain referred to in this example is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each containing the information of a batch of network transactions, used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may comprise a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
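A minimal sketch of computing the digest, assuming ordinary SHA-256 over a canonical JSON serialization of the result (the serialization is our own choice; the embodiment only names the hash algorithm):

```python
import hashlib
import json

def digest_of_result(result):
    """Hash the picture analysis result into its digest (summary) information."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

digest = digest_of_result({"quality": "clear", "direction": 0})
```

Because the keys are sorted before hashing, the digest is independent of the order in which the result fields are assembled.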
The invention provides a picture analysis method, system, computer device, and computer-readable storage medium. The picture analysis method acquires a picture to be recognized and preprocesses it; detects the preprocessed picture with a pre-trained text detection model to obtain the coordinates of each text line in the picture; randomly selects a plurality of text lines, determines the center point of each selected line from its coordinates, and crops rectangular small picture blocks of a preset size outward from the center point to form a plurality of small picture blocks containing text information; inputs the small picture block set into a pre-trained text quality model and/or text direction model and outputs a text quality judgment result and/or a text direction classification result; and constructs a voting mechanism that takes the majority result as the text quality judgment result and/or text direction classification result of the current picture. Using text line information to assist the text quality judgment and text direction classification in this way yields a marked improvement in accuracy and robustness, as confirmed by repeated algorithm self-tests and comparison experiments.
As shown in fig. 4, the present invention further provides a picture analysis system, which can be integrated in the computing device 110, and specifically can include a preprocessing module 20, a detection module 30, a picture block module 40, a result module 50, and a voting module 60.
The preprocessing module 20 is used for acquiring the picture to be recognized and preprocessing it; the detection module 30 is configured to detect the preprocessed picture with a pre-trained text detection model to obtain the coordinates of each text line in the picture; the picture block module 40 is used for randomly selecting a plurality of text lines, determining the center point of each selected text line from its coordinates, and cropping a small rectangular picture block of a preset size outward from the center point to form a plurality of small picture blocks containing text information; the result module 50 is configured to input the small picture block set into a pre-trained text quality model and/or text direction model and output a text quality judgment result and/or a text direction classification result; and the voting module 60 is configured to construct a voting mechanism and take the majority result of the text quality judgment and/or text direction classification as the text quality judgment result and/or text direction classification result of the current picture.
In one embodiment, the preprocessing step of the preprocessing module 20 includes:
zooming the picture, such that the zoomed width is not more than 1600 pixels and not less than 600 pixels, and the zoomed height is not more than 2400 pixels and not less than 800 pixels;
and converting the zoomed picture into a gray scale image.
In one embodiment, the processing steps of the detection module 30 include:
calling the trained text detection model, which adopts the dbnet algorithm; the preprocessed picture is input into the text detection model, which outputs, for each pixel point of the picture, the probability that the pixel corresponds to text;
thresholding the pixel points of the picture, dividing them into text pixel points and non-text pixel points according to the probability values, to obtain a binary image;
calculating a connected domain set of the binary image by using a first image processing algorithm;
and inputting the connected domain set into a second image processing algorithm, and calculating a minimum circumscribed rectangle of each connected domain, wherein four vertexes of the minimum circumscribed rectangle are coordinates of text lines.
In one embodiment, the processing steps of the picture block module 40 include:
randomly selecting a plurality of text lines;
calculating to obtain the center point and the height of the corresponding text line according to the coordinates of each vertex of the selected text line;
and according to the center point of each text line, cropping outward small rectangular picture blocks that extend no more than one half of the text line height, to form a plurality of small picture blocks containing text information.
Further, the small rectangular picture blocks are square.
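Computing the center point and height from the four vertices can be sketched as follows (assuming text lines are wider than tall, so the height is the shorter side of the rectangle):

```python
import numpy as np

def center_and_height(vertices):
    """Return the center point and height of a text line given the four
    vertices of its minimum circumscribed rectangle."""
    pts = np.asarray(vertices, dtype=np.float32)
    center = pts.mean(axis=0)
    sides = [np.linalg.norm(pts[i] - pts[(i + 1) % 4]) for i in range(4)]
    return center, min(sides)  # the shorter side is the text line height

center, height = center_and_height([(10, 40), (90, 40), (90, 60), (10, 60)])
```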
In one embodiment, the text quality model and the text direction model both use a resnet18 residual network; the text quality judgment result is either clear or unclear, and the text direction classification result is one of 0 degrees, 90 degrees, 180 degrees, and 270 degrees.
In an embodiment, the picture analysis system further includes a blockchain module (not shown) configured to upload the text quality judgment result and/or the text direction classification result, once obtained, to a blockchain, so that the blockchain stores them in encrypted form.
The processing steps of the above modules are described in detail in embodiments of the method and will not be described again.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention. As shown in fig. 5, the apparatus 200 includes a processor 201 and a storage 202 coupled to the processor 201.
The storage 202 stores program instructions for implementing the picture analysis method according to any of the above embodiments.
The processor 201 is used to execute the program instructions stored in the storage 202.
The processor 201 may also be referred to as a Central Processing Unit (CPU). The processor 201 may be an integrated circuit chip having signal processing capabilities. The processor 201 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a storage medium according to an embodiment of the invention. The computer-readable storage medium of this embodiment stores a program file 301 capable of implementing the picture analysis method. The program file 301 may be stored in the storage medium in the form of a software product and includes several instructions that enable a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes media capable of storing program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk, as well as terminal devices such as computers, servers, mobile phones, and tablets.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments. Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Claims (10)

1. A picture analysis method, characterized in that the analysis method comprises:
acquiring a picture to be identified, and preprocessing the picture;
detecting the preprocessed picture by using a text detection model trained in advance to obtain the coordinates of each text line in the picture;
randomly selecting a plurality of text lines, determining the center point of each text line according to the coordinates of the selected text line, and intercepting rectangular small picture blocks with preset sizes from the center point outwards to form a plurality of small picture block sets with text information;
inputting the small picture block set into a pre-trained text quality model and/or a text direction model, and outputting a text quality judgment result and/or a text direction classification result;
and constructing a voting mechanism, and taking the majority result of the text quality judgment and/or the text direction classification as the text quality judgment result and/or the text direction classification result of the current picture.
2. The analysis method of claim 1, wherein the step of pre-processing the picture comprises:
zooming the picture, such that the zoomed width is not more than 1600 pixels and not less than 600 pixels, and the zoomed height is not more than 2400 pixels and not less than 800 pixels;
and converting the zoomed picture into a gray scale image.
3. The analysis method as claimed in claim 1, wherein the step of detecting the preprocessed image by using the pre-trained text detection model to obtain the coordinates of each text line in the image comprises:
calling a trained text detection model, wherein the text detection model adopts a dbnet algorithm, a preprocessed picture is input into the text detection model, and the text detection model outputs a probability value of a corresponding text in a pixel point of the picture;
carrying out thresholding selection on pixel points in the picture, and dividing the pixel points in the picture into text pixel points and non-text pixel points according to the probability value to obtain a binary image;
calculating a connected domain set of the binary image by using a first image processing algorithm;
and inputting the connected domain set into a second image processing algorithm, and calculating a minimum circumscribed rectangle of each connected domain, wherein four vertexes of the minimum circumscribed rectangle are coordinates of text lines.
4. The analysis method as claimed in claim 1, wherein the step of randomly selecting a plurality of text lines, determining a center point of each text line according to the coordinates of the selected text line, and intercepting a small rectangular picture block with a preset size from the center point to form a plurality of small picture block sets with text information comprises:
randomly selecting a plurality of text lines;
calculating to obtain the center point and the height of the corresponding text line according to the coordinates of each vertex of the selected text line;
and according to the central point of each text line, outwards intercepting small rectangular picture blocks with the length not exceeding one half of the height of the text line to form a plurality of small picture block sets with text information.
5. The analysis method of claim 4, wherein the rectangular small picture blocks are square.
6. The analysis method as claimed in claim 1, wherein the text quality model and the text direction model both use a resnet18 residual network, the text quality determination result includes clear and unclear, and the text direction classification result includes 0 degrees, 90 degrees, 180 degrees, and 270 degrees.
7. The analysis method according to claim 1, wherein after the text quality determination result and/or the text direction classification result are obtained, the text quality determination result and/or the text direction classification result are uploaded to a blockchain, so that the blockchain encrypts and stores the text quality determination result and/or the text direction classification result.
8. A picture analysis system, the analysis system comprising:
the preprocessing module is used for acquiring a picture to be identified and preprocessing the picture;
the detection module is used for detecting the preprocessed pictures by utilizing the pre-trained text detection model to obtain the coordinates of each text line in the pictures;
the picture block module is used for randomly selecting a plurality of text lines, determining the central point of each text line according to the coordinates of the selected text line, and intercepting a small rectangular picture block with a preset size from the central point outwards to form a plurality of small picture block sets with text information;
the result module is used for inputting the small picture block set into a pre-trained text quality model and/or a text direction model and outputting a text quality judgment result and/or a text direction classification result;
and the voting module is used for constructing a voting mechanism and taking the majority result of the text quality judgment and/or the text direction classification as the text quality judgment result and/or the text direction classification result of the current picture.
9. A computer device comprising a storage and a processor, the storage having stored therein readable instructions which, when executed by the processor, cause the processor to carry out the steps of the picture analysis method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that a program file capable of implementing the picture analysis method according to any one of claims 1 to 7 is stored.
CN202110213670.1A 2021-02-25 Picture analysis method, system, computer device and computer readable storage medium Active CN112926564B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110213670.1A CN112926564B (en) 2021-02-25 Picture analysis method, system, computer device and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN112926564A true CN112926564A (en) 2021-06-08
CN112926564B CN112926564B (en) 2024-08-02




Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304761A (en) * 2017-09-25 2018-07-20 腾讯科技(深圳)有限公司 Method for text detection, device, storage medium and computer equipment
US20200012876A1 (en) * 2017-09-25 2020-01-09 Tencent Technology (Shenzhen) Company Limited Text detection method, storage medium, and computer device
CN109977723A (en) * 2017-12-22 2019-07-05 苏宁云商集团股份有限公司 Big bill picture character recognition methods
CN109492143A (en) * 2018-09-21 2019-03-19 平安科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN111382740A (en) * 2020-03-13 2020-07-07 深圳前海环融联易信息科技服务有限公司 Text picture analysis method and device, computer equipment and storage medium
CN111950555A (en) * 2020-08-17 2020-11-17 北京字节跳动网络技术有限公司 Text recognition method and device, readable medium and electronic equipment
CN111967545A (en) * 2020-10-26 2020-11-20 北京易真学思教育科技有限公司 Text detection method and device, electronic equipment and computer storage medium
CN112101317A (en) * 2020-11-17 2020-12-18 深圳壹账通智能科技有限公司 Page direction identification method, device, equipment and computer readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113963149A (en) * 2021-10-29 2022-01-21 平安科技(深圳)有限公司 Medical bill picture fuzzy judgment method, system, equipment and medium
CN114387254A (en) * 2022-01-12 2022-04-22 中国平安人寿保险股份有限公司 Document quality analysis method and device, computer equipment and storage medium
CN114219876A (en) * 2022-02-18 2022-03-22 阿里巴巴达摩院(杭州)科技有限公司 Text merging method, device, equipment and storage medium
CN116563875A (en) * 2023-07-05 2023-08-08 四川集鲜数智供应链科技有限公司 Intelligent image-text recognition method and system with encryption function
CN116563875B (en) * 2023-07-05 2023-09-08 四川集鲜数智供应链科技有限公司 Intelligent image-text recognition method and system with encryption function


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant