CN113837168A - Image text detection and OCR recognition method, device and storage medium - Google Patents

Image text detection and OCR recognition method, device and storage medium

Info

Publication number
CN113837168A
CN113837168A (application CN202111118174.4A)
Authority
CN
China
Prior art keywords
text
training
image
segmentation
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111118174.4A
Other languages
Chinese (zh)
Inventor
陈坤龙
吴梁斌
章瑶
吕建进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yilianzhong Zhiding Xiamen Technology Co ltd
Original Assignee
Yilianzhong Zhiding Xiamen Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yilianzhong Zhiding Xiamen Technology Co ltd filed Critical Yilianzhong Zhiding Xiamen Technology Co ltd
Priority to CN202111118174.4A priority Critical patent/CN113837168A/en
Publication of CN113837168A publication Critical patent/CN113837168A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of data recognition, in particular to a method, a device and a storage medium for image text detection and OCR recognition, wherein the method comprises the following steps: preprocessing the picture to obtain training data; extracting the preliminary features of the training data to obtain a return result and building a training network according to the return result; the training model calls the training network to train the training data to obtain a plurality of text segmentation instances; and processing the plurality of text segmentation instances by a watershed segmentation method to complete detection and recognition. Through the above steps, post-processing the plurality of text segmentation instances with the watershed segmentation method effectively reduces the algorithm time complexity to O(N), solving the problem that the breadth-first algorithm in the PSENet pipeline, which performs a pixel-by-pixel four-neighbourhood search and merge on each text segmentation instance, drives the time complexity of the detection stage to O(N²), making detection slow and inefficient; the invention thereby improves the image processing speed and the efficiency.

Description

Image text detection and OCR recognition method, device and storage medium
Technical Field
The invention relates to the technical field of data recognition, in particular to a method, a device and a storage medium for image text detection and OCR recognition.
Background
The core idea of deep-learning OCR methods basically adopts deep object detection algorithm strategies. The progressive scale expansion network PSENet is a method based on instance segmentation: image features are extracted with a CNN-based backbone; a network similar to a spatial pyramid then applies a series of feature down-sampling, feature fusion and up-sampling operations to the feature map to obtain a group of text segmentation instances of predefined number; finally, the text instances are region-connected with a breadth-first algorithm.
Patent CN110008950A, "A method for detecting text in shape-robust natural scenes", published 2019.07.12, discloses a method for detecting text in shape-robust natural scenes, comprising the following steps: step 1, preprocessing training pictures in a text data set; step 2, building a PSENet progressive scale growth network, and using it to complete feature extraction, feature fusion and segmentation prediction of a training picture, obtaining segmentation results at a plurality of prediction scales; step 3, performing supervised training on the PSENet progressive scale growth network built in step 2 to obtain a detector model; step 4, detecting the picture to be detected; and step 5, obtaining the final detection result with a scale growth algorithm.
However, for images with many text detection targets and with misaligned, overlapping text regions, the breadth-first algorithm in the PSENet pipeline performs a pixel-by-pixel four-neighbourhood search and merge on each text segmentation instance, which may drive the algorithm time complexity of the detection stage to O(N²); detection is slow and efficiency is low.
Disclosure of Invention
In order to solve the problem that the breadth-first algorithm in the PSENet pipeline, performing a pixel-by-pixel four-neighbourhood search and merge on each text segmentation instance, drives the algorithm time complexity of the detection stage to O(N²), making detection slow and efficiency low:
The invention provides an image text detection and OCR recognition method, which comprises the following steps:
preprocessing the picture to obtain training data;
extracting the preliminary features of the training data to obtain a return result and building a training network according to the return result;
the training model calls the training network to train the training data to obtain a plurality of text segmentation examples;
and processing a plurality of text segmentation examples by a watershed segmentation method to finish detection and identification.
Further, in a preferred embodiment, a text region of the picture is labeled, and the picture labeled with the text region is an original text coordinate tag; and processing the original text coordinate labels to generate a plurality of text segmentation kernels with similar shapes, the same central points and different sizes as training data of the training network.
Further, in a preferred embodiment, the training network is a PSENet forward network;
and extracting the preliminary features of the training data by loading a feature extraction model to obtain a return result, inputting the return result into a PSENet forward network, and constructing the PSENet forward network by using a feature space pyramid network according to a top-down mode.
Further, in a preferred embodiment, the training model invoking the training network to train the training data to obtain a plurality of text segmentation instances includes the following steps:
training preparation: setting a hyper-parameter, selecting an optimizer, and setting a mode for reading the training data into the training model;
training process: calling a PSENet forward network, calculating the current loss situation through comparison with a real label and a loss function, calculating and updating network parameter gradient by adopting an optimizer, carrying out iterative training until the ideal precision is reached, and carrying out persistence on the model;
and outputting a plurality of text segmentation examples after training is completed.
Further, in a preferred embodiment, a dice coefficient is used to define the loss function; samples with poor detection results are screened out according to the loss of training data fed into the model, and the screened samples are extracted, combined and trained with stochastic gradient descent.
Further, in a preferred embodiment, processing a plurality of the text segmentation instances by a watershed segmentation method to determine a final text line region and a final background region, includes the following steps:
acquiring a foreground image mark, a background image mark and an uncertain region;
and operating a watershed segmentation algorithm to process the uncertain area to obtain a final text line area and a final background area.
Further, in a preferred embodiment, the obtaining of the foreground image mark, the background image mark and the uncertain region comprises the following steps:
marking pixels inside the minimum text segmentation example as a foreground area, and setting the pixel value of the area to be 255;
marking pixels outside the maximum text segmentation instance as a background region and setting the pixel value of the region to 128;
the region between the minimum text segmentation instance and the maximum text segmentation instance is taken as an uncertain region, and the pixel value of the region is set to 0.
Further, in a preferred embodiment, the step of operating the watershed segmentation algorithm to process the uncertain region to obtain the final text line region and the final background region comprises the following steps:
sorting the pixels in the gradient image of the uncertain region to obtain a geodesic distance threshold for the watershed segmentation algorithm, and marking the minimum of the uncertain region as the lowest point;
continuously increasing the geodesic distance, screening out pixels smaller than the geodesic distance value, and if the distance from the screened pixels to the lowest point is smaller than a geodesic distance threshold value, submerging; otherwise, taking the gray value of the screened pixel as a local threshold, namely constructing a dam and completing the classification of the text region and the non-text region of the local region;
and the geodesic distance is continuously increased until the maximum value of the gray value, so that the separation of the text region from the background is completed, and the classification attribution judgment of all pixels is completed.
The invention also provides an image text detection and OCR recognition device, which comprises
a preprocessing module: used for preprocessing the picture to obtain training data;
a training network building module: used for extracting the preliminary features of the training data to obtain a return result and building a training network according to the return result;
a training module: used for the training model to call the training network to train the training data to obtain a plurality of text segmentation instances;
a processing module: used for processing a plurality of text segmentation instances through a watershed segmentation algorithm to complete detection and recognition.
The invention also provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement any one of the image text detection and OCR recognition methods described above.
Compared with the prior art, the image text detection and OCR recognition method provided by the invention, through the above steps and by replacing the breadth-first search (BFS) algorithm of the original PSENet with the watershed segmentation method for post-processing the plurality of text segmentation instances, effectively reduces the algorithm time complexity to O(N). It solves the problem that the breadth-first algorithm in the PSENet pipeline, performing a pixel-by-pixel four-neighbourhood search and merge on each text segmentation instance, drives the algorithm time complexity of the detection stage to O(N²), making detection slow and efficiency low, thereby improving the image processing speed and the efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a step diagram of an image text detection and OCR recognition method provided by the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc., indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Specific examples are given below:
In the invention, a medical bill image is taken as the example of an image with many text detection targets and with misaligned, overlapping text regions. Because a large amount of medical bill data has been accumulated, training can start directly from scratch rather than fine-tuning parameters migrated from a pre-trained model; the method can therefore be used for model training in a "train from scratch" mode.
An image text detection and OCR recognition method comprises the following steps:
preprocessing the picture to obtain training data;
extracting the preliminary features of the training data to obtain a return result and building a training network according to the return result;
the training model calls the training network to train the training data to obtain a plurality of text segmentation examples;
and processing a plurality of text segmentation examples by a watershed segmentation method to finish detection and identification.
Compared with the prior art, the image text detection and OCR recognition method provided by the invention, through the above steps and by replacing the breadth-first search (BFS) algorithm of the original PSENet with the watershed segmentation method for post-processing the plurality of text segmentation instances, effectively reduces the algorithm time complexity to O(N). It solves the problem that the breadth-first algorithm in the PSENet pipeline, performing a pixel-by-pixel four-neighbourhood search and merge on each text segmentation instance, drives the algorithm time complexity of the detection stage to O(N²), making detection slow and efficiency low, thereby improving the image processing speed and the efficiency.
Specifically, a picture is preprocessed to obtain training data. The picture may be one shot in a natural scene; its text regions are labeled, and the picture with the labeled text regions constitutes the original text coordinate labels. A text region is a region containing text; the labeling may be done manually or by computer, and the labels are polygon coordinates, for example the coordinates of the four corner points of a rectangular box;
according to the requirement of progressive scale expansion, the original text coordinate labels are processed by the Vatti clipping algorithm to generate a plurality of text segmentation kernels with similar shapes, the same center point and different sizes as training data for the training network.
Specifically, shrinking the original text coordinate labels by the Vatti clipping algorithm to obtain the text segmentation kernels comprises the following steps:
calculating the reduction distance according to the area and the perimeter of the maximum text segmentation kernel and the reduction ratio:

d_i = Area × (1 − r_i²) / Perimeter

In implementation, the original text coordinate label is shrunk to obtain a plurality of text segmentation kernels p_1, p_2, …, p_n, wherein the largest text segmentation kernel (i.e. the original kernel) is p_1; any text segmentation kernel p_i has a reduction ratio r_i and a relative distance d_i with respect to the largest kernel p_1; Area and Perimeter are the area and the perimeter of the maximum text segmentation kernel respectively;
calculating the reduction ratio according to the number of text segmentation kernels and the reduction scale:

r_i = 1 − (1 − m) × (i − 1) / (n − 1)

wherein m is the reduction scale with range (0, 1), and n is the number of text segmentation instances, i.e. the number of text segmentation kernels; both m and n are hyper-parameters of the PSENet algorithm;

calculating the shrunk labels of the original text coordinate labels through the reduction formulas above (the reduction distance and reduction ratio formulas) to obtain a plurality of text segmentation kernels, which serve as the original input training data of the training network.
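Under the indexing above (p_1 the original, largest kernel), the two shrink formulas can be sketched in a few lines of Python; the helper names are illustrative, not part of the patent:

```python
def shrink_ratios(m, n):
    """Reduction ratio r_i for kernels p_1 (original) .. p_n (smallest).

    m: reduction scale in (0, 1); n: number of kernels.
    r_1 = 1 (no shrink), r_n = m (maximum shrink)."""
    return [1.0 - (1.0 - m) * (i - 1) / (n - 1) for i in range(1, n + 1)]

def shrink_distances(area, perimeter, ratios):
    """Vatti-clipping offset d_i = Area * (1 - r_i^2) / Perimeter."""
    return [area * (1.0 - r * r) / perimeter for r in ratios]
```

With the hyper-parameters used later (m = 0.5, n = 6) this yields ratios 1.0, 0.9, …, 0.5, and a zero offset for the original kernel.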
Specifically, in the step of extracting the preliminary features of the training data to obtain a return result, and building a training network according to the return result:
the training network is a PSENet forward network; the feature extraction model may be, but is not limited to, ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, VGG-16, VGG-19, ShuffleNet or MobileNet. Preferably, the ResNet-152 model is selected: ResNet-152 has a deeper structure, can extract more effective features and gives better precision;
the preliminary features of the training data are extracted by loading a ResNet-152 model in PyTorch to obtain a return result, the return result is input into the PSENet forward network, and the PSENet forward network is built with a feature spatial pyramid network in a top-down manner. The process by which the ResNet-152 model extracts the preliminary features of the training data and returns a result is prior art and is not described again here.
Specifically, inputting the returned results [c2, c3, c4, c5] into the PSENet forward network and constructing the PSENet training network with a feature spatial pyramid network in a top-down manner comprises the following steps:
(1) p5 toplayer processing:
c5 → p5: 3×3 convolution, BN processing, ReLU activation function;
(2) p4 upsampling:
c4 → c4l: 2×2 convolution, BN processing, ReLU activation function;
[p5, c4l] → p4: bilinear_interpolation(p5) + c4l;
(3) p4 smoothing:
p4 → p4: original-size convolution, BN processing, ReLU activation function;
(4) p3 upsampling and smoothing:
c3 → c3l: 1×1 convolution, BN processing, ReLU activation function;
[p4, c3l] → p3: bilinear_interpolation(p4) + c3l;
the smoothing is the same as for p4;
(5) p2 upsampling and smoothing:
c2 → c2l: original-size convolution, BN processing, ReLU activation function;
[p3, c2l] → p2: bilinear_interpolation(p3) + c2l;
the smoothing is the same as for p4;
(6) upsampling and merging:
based on the size of p2, bilinearly interpolate p3–p5 to the size of p2, then merge the p2–p5 vectors with a concatenate operation; this completes the construction of the PSENet forward network.
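The top-down merge of steps (1)–(6) can be sketched as follows. This is a simplified sketch under assumptions: the 3×3/2×2/1×1 convolutions, BN and ReLU are replaced by identity lateral connections, so only the bilinear-interpolate-and-add pattern and the final merge survive; the function names are illustrative:

```python
import numpy as np

def bilinear_resize(x, out_h, out_w):
    """Bilinearly interpolate a 2-D feature map to (out_h, out_w)."""
    h, w = x.shape
    ys = np.linspace(0.0, h - 1.0, out_h)
    xs = np.linspace(0.0, w - 1.0, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = x[np.ix_(y0, x0)] * (1 - wx) + x[np.ix_(y0, x1)] * wx
    bot = x[np.ix_(y1, x0)] * (1 - wx) + x[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def build_pyramid(c2, c3, c4, c5):
    """Top-down merge: p_k = bilinear(p_{k+1}) + lateral(c_k), then stack."""
    p5 = c5                                   # step (1): toplayer (conv omitted)
    p4 = bilinear_resize(p5, *c4.shape) + c4  # step (2): [p5, c4l] -> p4
    p3 = bilinear_resize(p4, *c3.shape) + c3  # step (4): [p4, c3l] -> p3
    p2 = bilinear_resize(p3, *c2.shape) + c2  # step (5): [p3, c2l] -> p2
    # step (6): resize p3-p5 to the size of p2 and merge
    return np.stack([p2] + [bilinear_resize(p, *p2.shape) for p in (p3, p4, p5)])
```

On all-ones feature maps the lateral additions accumulate, so the p2 plane of the merged output equals 4 everywhere, which is a quick sanity check of the add-then-upsample chain.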
Specifically, the training model calls a training network to train training data to obtain a plurality of text segmentation examples, and the method comprises the following steps:
training preparation: setting a hyper-parameter, selecting an optimizer, and setting a mode of reading training data into a training model;
The hyper-parameters comprise the learning rate and its decay schedule, the number of segmentation instances, the batch_size and the number of epochs. The optimizer may be, but is not limited to, SGD or Adam; Adam is selected here, since Adam can dynamically adjust the learning rate, among other advantages. The training data are read into the training model batch by batch through a generator function;
training process: calling a PSENet forward network, calculating the current loss situation through comparison between a model prediction result and a real label and a loss function, calculating and updating network parameter gradient by adopting an optimizer, carrying out iterative training until the ideal precision is reached, and carrying out persistence on the model:
Specifically, training proceeds in units of epochs, and each epoch trains all data once, sub-batch by sub-batch (ignoring boundary issues). Each batch of data is fed into the model, the PSENet forward network is called, the prediction is compared with the real labels, the current loss is calculated by the loss function, the Adam optimizer calculates and updates the network parameter gradients, and training iterates until the desired precision is reached, whereupon the model is persisted. Through continuous model iteration, each prediction is compared with the real labels; when the prediction is essentially consistent with the real labels, for example when the prediction precision reaches 95%, the model parameters at that moment are saved, i.e. persisted.
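As an illustration of the loop just described (a toy stand-in, not the PSENet training code: the forward network is replaced by a one-parameter model y = w·x, the optimizer by plain gradient descent, and persistence by `pickle`; all names are illustrative):

```python
import io
import pickle

def train_scalar(xs, ys, epochs=200, batch_size=2, lr=0.05, target_loss=1e-4):
    """Fit y = w*x batch by batch; persist w once the loss is small enough."""
    w = 0.0
    for epoch in range(epochs):
        for start in range(0, len(xs), batch_size):       # one sub-batch per step
            xb = xs[start:start + batch_size]
            yb = ys[start:start + batch_size]
            preds = [w * x for x in xb]                    # "forward network"
            grad = sum(2 * (p - y) * x
                       for p, y, x in zip(preds, yb, xb)) / len(xb)
            w -= lr * grad                                 # "optimizer step"
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if loss <= target_loss:                            # "ideal precision reached"
            buf = io.BytesIO()
            pickle.dump(w, buf)                            # persist the model
            return w, buf.getvalue()
    return w, None
```

On data generated from y = 2x, the loop converges to w ≈ 2 within a few epochs and the persisted bytes round-trip through `pickle.load`.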
In the text detection of medical bills, the negative sample area is far larger than the positive sample area; the loss function is therefore defined with the dice coefficient, samples with poor detection results are screened out according to the loss of the training data fed into the model, and the screened samples are extracted, combined and trained with stochastic gradient descent. The loss function is specifically:
D(S_i, G_i) = 2 × Σ_{x,y} (S_{i,x,y} × G_{i,x,y}) / (Σ_{x,y} S_{i,x,y}² + Σ_{x,y} G_{i,x,y}²)
wherein S_{x,y} is the value of the predicted pixel and G_{x,y} is the value of the pixel in the real label.
The loss function is defined as L = λ × L_C + (1 − λ) × L_S, wherein L_C is the classification loss of the text regions and L_S is the loss over the shrunk text instances, with

L_C = 1 − D(S_n × M, G_n × M)

L_S = 1 − (Σ_{i=1}^{n−1} D(S_i × W, G_i × W)) / (n − 1)

W_{x,y} = 1 if S_{n,x,y} ≥ 0.5, and 0 otherwise;
M is a 0/1 byte mask generated by an online hard example mining (OHEM) algorithm; samples with poor detection results are screened out according to the loss of the training data fed into the model, and the screened samples are then extracted, combined and trained with Adam.
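The dice coefficient and combined loss above can be written out directly as a sketch (plain Python over flat pixel lists; λ defaults to an assumed 0.7, and the OHEM mask M is supplied by the caller):

```python
def dice(S, G, eps=1e-6):
    """D(S, G) = 2*sum(S*G) / (sum(S^2) + sum(G^2)) over flat pixel lists."""
    num = 2.0 * sum(s * g for s, g in zip(S, G))
    den = sum(s * s for s in S) + sum(g * g for g in G) + eps
    return num / den

def psenet_loss(S_kernels, G_kernels, M, lam=0.7):
    """L = lam*L_C + (1-lam)*L_S: L_C on the complete map S_n under the
    OHEM mask M, L_S averaged over the shrunk kernels under the mask W."""
    Sn, Gn = S_kernels[-1], G_kernels[-1]
    L_C = 1.0 - dice([s * m for s, m in zip(Sn, M)],
                     [g * m for g, m in zip(Gn, M)])
    W = [1.0 if s >= 0.5 else 0.0 for s in Sn]   # mask from the complete map
    shrunk = list(zip(S_kernels[:-1], G_kernels[:-1]))
    L_S = 1.0 - sum(dice([s * w for s, w in zip(Si, W)],
                         [g * w for g, w in zip(Gi, W)])
                    for Si, Gi in shrunk) / len(shrunk)
    return lam * L_C + (1.0 - lam) * L_S
```

A perfect prediction drives both terms, and hence the total loss, to (numerically) zero, while fully disjoint maps give a dice coefficient of zero.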
Specifically, the method for processing a plurality of text segmentation examples by a watershed segmentation method to determine a final text line region and a final background region comprises the following steps:
firstly, obtaining a foreground image mark, specifically marking pixels inside a minimum text segmentation example as a foreground area, and setting the pixel value of the area to be 255; acquiring a background image mark, specifically, marking a pixel outside a maximum text segmentation example as a background area, and setting a pixel value of the area as 128; and acquiring an uncertain region, specifically, taking a region between the minimum text segmentation example and the maximum text segmentation example as the uncertain region, and setting the pixel value of the region to be 0.
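A minimal sketch of the three marker levels (255 foreground, 128 background, 0 uncertain, as in the text; the boolean kernel masks and function name are illustrative):

```python
def build_markers(min_kernel, max_kernel):
    """min_kernel / max_kernel: flat boolean masks of the smallest and
    largest text segmentation instances (min_kernel implies max_kernel).
    Returns the watershed marker value for each pixel."""
    markers = []
    for inside_min, inside_max in zip(min_kernel, max_kernel):
        if inside_min:
            markers.append(255)   # confident foreground (inside min kernel)
        elif not inside_max:
            markers.append(128)   # confident background (outside max kernel)
        else:
            markers.append(0)     # uncertain ring between the two kernels
    return markers
```

The uncertain pixels (value 0) are exactly the ones the watershed algorithm must later assign to text or background.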
Secondly, operating a watershed segmentation algorithm to process the uncertain region to obtain a final text line region and a final background region, specifically comprising the following steps:
sorting the pixels in the gradient image of the uncertain region to obtain a geodesic distance threshold for the watershed segmentation algorithm, and marking the minimum of the uncertain region as the lowest point; specifically, the geodesic distance threshold of the watershed algorithm is obtained by running the OTSU algorithm;
continuously increasing the geodesic distance, screening out pixels smaller than the geodesic distance value, if the distance from the pixels to the lowest point is smaller than a geodesic distance threshold value, submerging, otherwise, taking the gray value of the pixels as a local threshold value, namely constructing a dam, and completing the classification of text regions and non-text regions of the local region;
The geodesic distance is continuously increased until the maximum gray value is reached and all regions meet on the watershed lines, thereby completing the separation of the text regions from the background and the classification attribution of all pixels, and yielding the final text line regions and the final background region.
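The flooding just described can be illustrated with a toy watershed-by-immersion over plain Python lists (a greatly simplified sketch: production code would use an optimized implementation such as OpenCV's `cv2.watershed`; here labels > 0 are seed markers, 0 is the uncertain region, and −1 marks the dams where two basins meet):

```python
import heapq

def watershed(gray, markers):
    """Toy watershed-by-immersion on a 2-D list `gray` (a gradient image).
    Pixels are flooded in increasing gray order from the seed markers;
    where two differently labeled basins meet, a dam (-1) is built."""
    h, w = len(gray), len(gray[0])
    labels = [row[:] for row in markers]
    nbrs = ((1, 0), (-1, 0), (0, 1), (0, -1))   # four-neighbourhood
    heap = []
    for y in range(h):
        for x in range(w):
            if labels[y][x] > 0:
                heapq.heappush(heap, (gray[y][x], y, x))
    while heap:
        _, y, x = heapq.heappop(heap)
        for dy, dx in nbrs:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == 0:
                seen = set()                     # labels adjacent to this pixel
                for dy2, dx2 in nbrs:
                    my, mx = ny + dy2, nx + dx2
                    if 0 <= my < h and 0 <= mx < w and labels[my][mx] > 0:
                        seen.add(labels[my][mx])
                if len(seen) > 1:
                    labels[ny][nx] = -1          # two basins meet: build a dam
                else:
                    labels[ny][nx] = labels[y][x]
                    heapq.heappush(heap, (gray[ny][nx], ny, nx))
    return labels
```

On a one-row "valley" with a ridge in the middle and a seed at each end, the two floods meet at the ridge and leave a dam there, which mirrors the text-region / background separation described above.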
According to the content of the invention, in implementation, m is set to 0.5 and n to 6, ResNet-152 is selected as the feature extraction network, the batch size is set to 3, the training data are read into the model through a generator function batch by batch, and the Adam optimizer is adopted. During training, the input image dimensions are [B, 3, H, W], corresponding respectively to the batch_size, the number of image channels, and the height and width of the image;
setting the number of text segmentation examples to be 6, carrying out image down-sampling, feature fusion and image up-sampling on the batch training image feature graph, and outputting a batch with the same size as the original image, namely [ B,6, H, W ]]For each text line of each image, 6 text segmentation results S are generated1,S2,…,S6
Test experiments were carried out on medical bills (including outpatient invoices and hospitalization invoices), with 1000 pictures of each type; the graphics card of the test equipment is a Tesla V100 with 32 GB of video memory. In the experiments, all pictures were limited to 1000 pixels on the shortest side.
Under the same conditions, the original BFS algorithm of PSENet is replaced by the watershed segmentation algorithm: the minimum segmentation result S_6, which separates all text regions, is used as the watershed confidence foreground marker image; the maximum segmentation result S_1 is inverted and used as the watershed confidence background marker image; and S_2, S_3, …, S_5 are treated as the uncertain region of the algorithm:
For the original PSENet algorithm, the accuracy reaches 92.37% and the FPS (the number of pictures processed by the model per second, including data pre-processing and post-processing) reaches 11; the accuracy of the method of the invention reaches 92.51% and the FPS reaches 48. Obviously, with precision preserved, the processing speed of the method is more than 4 times that of the original PSENet algorithm. Compared with the prior art, the image text detection and OCR recognition method provided by the invention, through the above steps and by replacing the breadth-first search (BFS) algorithm of the original PSENet with the watershed segmentation method for post-processing the plurality of text segmentation instances, effectively reduces the algorithm time complexity to O(N). It solves the problem that the breadth-first algorithm in the PSENet pipeline, performing a pixel-by-pixel four-neighbourhood search and merge on each text segmentation instance, drives the algorithm time complexity of the detection stage to O(N²), making detection slow and efficiency low, thereby improving the image processing speed and the efficiency.
The invention also provides an image text detection and OCR recognition system, comprising a preprocessing module, used for preprocessing the picture to obtain training data; a training network building module, used for extracting the preliminary features of the training data to obtain a return result and building a training network according to the return result; a training module, used for the training model to call the training network to train the training data to obtain a plurality of text segmentation instances; and a processing module, used for processing a plurality of text segmentation instances through a watershed segmentation algorithm to complete detection and recognition. The image text detection and OCR recognition system provided by the invention improves the image processing speed and the efficiency.
The present invention also provides a computer readable storage medium having stored thereon computer instructions, which when executed by a processor, implement an image text detection and OCR recognition method as described in any of the above.
In specific implementation, the computer-readable storage medium is a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like; the computer-readable storage medium may also include a combination of the above kinds of memory.
Although terms such as training data, preliminary features, training networks, training models, text segmentation instances, watershed segmentation, etc. are used more often herein, the possibility of using other terms is not excluded. These terms are used merely to more conveniently describe and explain the nature of the present invention; they are to be construed as being without limitation to any additional limitations that may be imposed by the spirit of the present invention.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An image text detection and OCR recognition method, characterized in that the method comprises the following steps:
preprocessing the image to obtain training data;
extracting preliminary features from the training data to obtain a returned result, and building a training network according to the returned result;
calling, by a training model, the training network to train the training data and obtain a plurality of text segmentation instances;
processing the plurality of text segmentation instances with a watershed segmentation method to complete detection and recognition.
2. The image text detection and OCR recognition method according to claim 1, characterized in that: a text region of the image is labelled, the image labelled with the text region serving as the original text coordinate label; and the original text coordinate label is processed to generate a plurality of text segmentation kernels with similar shapes, identical centre points and different sizes as training data for the training network.
3. The image text detection and OCR recognition method according to claim 1, characterized in that: the training network is a PSENet forward network;
preliminary features are extracted from the training data by loading a feature extraction model to obtain a returned result, the returned result is input into the PSENet forward network, and the PSENet forward network is constructed with a feature pyramid network in a top-down manner.
4. The image text detection and OCR recognition method according to claim 3, characterized in that the training model calls the training network to train the training data and obtain a plurality of text segmentation instances, comprising the following steps:
training preparation: setting hyper-parameters, selecting an optimizer, and setting the mode in which the training data are read into the training model;
training process: calling the PSENet forward network, computing the current loss by comparing the output with the real labels through a loss function, computing and updating the network parameter gradients with the optimizer, iterating the training until the desired precision is reached, and persisting the model;
outputting a plurality of text segmentation instances after training is completed.
5. The image text detection and OCR recognition method according to claim 4, characterized in that: a loss function is defined using the dice coefficient; samples with poor detection results are screened out according to the loss of the training data fed into the model; and the screened samples with poor detection results are extracted, combined and trained with stochastic gradient descent.
6. The image text detection and OCR recognition method according to claim 1, characterized in that the plurality of text segmentation instances are processed with a watershed segmentation method to determine the final text line region and the final background region, comprising the following steps:
acquiring a foreground image marker, a background image marker and an uncertain region;
running a watershed segmentation algorithm to process the uncertain region and obtain the final text line region and the final background region.
7. The image text detection and OCR recognition method according to claim 6, characterized in that acquiring the foreground image marker, the background image marker and the uncertain region comprises the following steps:
marking the pixels inside the smallest text segmentation instance as the foreground region and setting the pixel value of this region to 255;
marking the pixels outside the largest text segmentation instance as the background region and setting the pixel value of this region to 128;
taking the region between the smallest and the largest text segmentation instance as the uncertain region and setting the pixel value of this region to 0.
8. The image text detection and OCR recognition method according to claim 6, characterized in that running the watershed segmentation algorithm to process the uncertain region and obtain the final text line region and the final background region comprises the following steps:
sorting the pixels in the gradient image of the uncertain region to obtain the geodesic distance threshold of the watershed segmentation algorithm, and marking the minimum of the uncertain region as the lowest point;
continuously increasing the geodesic distance and screening out the pixels whose value is smaller than the current geodesic distance; if the distance from a screened pixel to the lowest point is smaller than the geodesic distance threshold, the pixel is submerged; otherwise, the grey value of the screened pixel is taken as a local threshold, i.e. a dam is built, completing the classification of the local region into text and non-text;
continuing to increase the geodesic distance up to the maximum grey value, thereby completing the separation of the text region from the background and the classification of all pixels.
9. An image text detection and OCR recognition device, characterized in that it comprises:
a preprocessing module, used for preprocessing the image to obtain training data;
a training-network building module, used for extracting preliminary features from the training data to obtain a returned result and building a training network according to the returned result;
a training module, in which a training model calls the training network to train the training data and obtain a plurality of text segmentation instances;
a processing module, which processes the plurality of text segmentation instances with a watershed segmentation algorithm to complete detection and recognition.
10. A computer-readable storage medium, characterized in that: the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the image text detection and OCR recognition method according to any one of claims 1-8.
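As a hedged illustration of the dice-coefficient loss named in claim 5, the sketch below uses the common definition dice = 2|A∩B| / (|A| + |B|); the claim does not fix an exact formulation, and a real training loop would compute this on network output tensors rather than NumPy arrays.

```python
import numpy as np

def dice_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """1 - dice coefficient between a predicted mask and a ground-truth mask.

    pred holds probabilities in [0, 1]; target is a binary mask.
    dice = 2 * sum(pred * target) / (sum(pred) + sum(target));
    eps guards against division by zero on empty masks.
    """
    inter = float((pred * target).sum())
    denom = float(pred.sum() + target.sum())
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

pred = np.array([[1.0, 1.0], [0.0, 0.0]])
target = np.array([[1.0, 0.0], [0.0, 0.0]])
# intersection = 1, |pred| + |target| = 3, so dice ≈ 2/3 and loss ≈ 1/3
print(round(dice_loss(pred, target), 3))
```

A higher loss indicates a worse overlap, which is what allows the poorly detected samples of claim 5 to be screened out and re-fed to the optimizer.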
CN202111118174.4A 2021-09-22 2021-09-22 Image text detection and OCR recognition method, device and storage medium Pending CN113837168A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111118174.4A CN113837168A (en) 2021-09-22 2021-09-22 Image text detection and OCR recognition method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111118174.4A CN113837168A (en) 2021-09-22 2021-09-22 Image text detection and OCR recognition method, device and storage medium

Publications (1)

Publication Number Publication Date
CN113837168A true CN113837168A (en) 2021-12-24

Family

ID=78969694

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111118174.4A Pending CN113837168A (en) 2021-09-22 2021-09-22 Image text detection and OCR recognition method, device and storage medium

Country Status (1)

Country Link
CN (1) CN113837168A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630755A (en) * 2023-04-10 2023-08-22 雄安创新研究院 Method, system and storage medium for detecting text position in scene image
CN116863482A (en) * 2023-09-05 2023-10-10 华立科技股份有限公司 Mutual inductor detection method, device, equipment and storage medium
CN116935394A (en) * 2023-07-27 2023-10-24 南京邮电大学 Train carriage number positioning method based on PSENT region segmentation

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011128070A (en) * 2009-12-18 2011-06-30 Hitachi High-Technologies Corp Image processing device, measuring/testing system, and program
CN102725773A (en) * 2009-12-02 2012-10-10 惠普发展公司,有限责任合伙企业 System and method of foreground-background segmentation of digitized images
US20150078648A1 (en) * 2013-09-13 2015-03-19 National Cheng Kung University Cell image segmentation method and a nuclear-to-cytoplasmic ratio evaluation method using the same
CN110008950A (en) * 2019-03-13 2019-07-12 南京大学 The method of text detection in the natural scene of a kind of pair of shape robust
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110766008A (en) * 2019-10-29 2020-02-07 北京华宇信息技术有限公司 Text detection method facing any direction and shape
CN111145209A (en) * 2019-12-26 2020-05-12 北京推想科技有限公司 Medical image segmentation method, device, equipment and storage medium
CN111738256A (en) * 2020-06-02 2020-10-02 上海交通大学 Composite material CT image segmentation method based on improved watershed algorithm
CN111798480A (en) * 2020-07-23 2020-10-20 北京思图场景数据科技服务有限公司 Character detection method and device based on single character and character connection relation prediction
US20210034700A1 (en) * 2019-07-29 2021-02-04 Intuit Inc. Region proposal networks for automated bounding box detection and text segmentation


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WENHAI WANG等: "Shape Robust Text Detection with Progressive Scale Expansion Network", 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), pages 9328 - 9337 *
程序员阿德: "图像分割的经典算法：分水岭算法" [A classic algorithm for image segmentation: the watershed algorithm], pages 1 - 7, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/67741538?utm_id=0>, Zhihu *
运动小爽: "使用watershed作为psenet的后处理" [Using watershed as post-processing for PSENet], page 1, Retrieved from the Internet <URL:https://www.jianshu.com/p/ed750a1c488c?utm_campaign=maleskine&utm_content=note&utm_medium=seo_notes&utm_source=recommendation>, Jianshu *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630755A (en) * 2023-04-10 2023-08-22 雄安创新研究院 Method, system and storage medium for detecting text position in scene image
CN116630755B (en) * 2023-04-10 2024-04-02 雄安创新研究院 Method, system and storage medium for detecting text position in scene image
CN116935394A (en) * 2023-07-27 2023-10-24 南京邮电大学 Train carriage number positioning method based on PSENT region segmentation
CN116935394B (en) * 2023-07-27 2024-01-02 南京邮电大学 Train carriage number positioning method based on PSENT region segmentation
CN116863482A (en) * 2023-09-05 2023-10-10 华立科技股份有限公司 Mutual inductor detection method, device, equipment and storage medium
CN116863482B (en) * 2023-09-05 2023-12-19 华立科技股份有限公司 Mutual inductor detection method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
EP3620979B1 (en) Learning method, learning device for detecting object using edge image and testing method, testing device using the same
CN110428428B (en) Image semantic segmentation method, electronic equipment and readable storage medium
CN113837168A (en) Image text detection and OCR recognition method, device and storage medium
CN109376681B (en) Multi-person posture estimation method and system
Abdollahi et al. Improving road semantic segmentation using generative adversarial network
CN113111871B (en) Training method and device of text recognition model, text recognition method and device
CN108510504B (en) Image segmentation method and device
CN108009554A (en) A kind of image processing method and device
CN113076871A (en) Fish shoal automatic detection method based on target shielding compensation
CN116645592B (en) Crack detection method based on image processing and storage medium
CN112991280B (en) Visual detection method, visual detection system and electronic equipment
CN112861915A (en) Anchor-frame-free non-cooperative target detection method based on high-level semantic features
CN114821778A (en) Underwater fish body posture dynamic recognition method and device
CN114445715A (en) Crop disease identification method based on convolutional neural network
CN116311310A (en) Universal form identification method and device combining semantic segmentation and sequence prediction
CN112991281B (en) Visual detection method, system, electronic equipment and medium
CN112241736A (en) Text detection method and device
CN111967408B (en) Low-resolution pedestrian re-identification method and system based on prediction-recovery-identification
CN113191237A (en) Improved YOLOv 3-based fruit tree image small target detection method and device
CN111738069A (en) Face detection method and device, electronic equipment and storage medium
Samudrala et al. Semantic Segmentation in Medical Image Based on Hybrid Dlinknet and Unet
CN114724175A (en) Pedestrian image detection network, detection method, training method, electronic device, and medium
CN114359739A (en) Target identification method and device
CN111161250B (en) Method and device for detecting dense houses by using multi-scale remote sensing images
CN114648628A (en) Apple maturity detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination