CN113065547A - Character supervision information-based weak supervision text detection method - Google Patents

Character supervision information-based weak supervision text detection method

Info

Publication number
CN113065547A
Authority
CN
China
Prior art keywords
character
text
network
supervision
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110262361.3A
Other languages
Chinese (zh)
Inventor
刘义江
陈蕾
侯栋梁
池建昆
范辉
阎鹏飞
魏明磊
李云超
姜琳琳
辛锐
陈曦
杨青
沈静文
吴彦巧
姜敬
檀小亚
师孜晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiongan New Area Power Supply Company State Grid Hebei Electric Power Co
State Grid Hebei Electric Power Co Ltd
Original Assignee
Xiongan New Area Power Supply Company State Grid Hebei Electric Power Co
State Grid Hebei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiongan New Area Power Supply Company State Grid Hebei Electric Power Co, State Grid Hebei Electric Power Co Ltd
Priority to CN202110262361.3A
Publication of CN113065547A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/155 Segmentation; Edge detection involving morphological operators
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention discloses a weakly supervised text detection method based on character supervision information, relating to the field of text detection. The method comprises the following steps: extracting features with a backbone network; upsampling the extracted features; generating character-level labels; outputting a character region probability map and a text center line; obtaining connected regions with high response values and then expanding them to obtain complete character boundaries; and traversing the text center line, connecting all the points in each text region, and smoothing to obtain the final detection region. The method can be applied to text detection in a variety of scenes, and the character detection results allow the position of each character to be located accurately, yielding higher detection precision. The weakly supervised learning scheme lets the whole network iterate continuously and finally reach good convergence.

Description

Character supervision information-based weak supervision text detection method
Technical Field
The invention belongs to the field of text detection, and particularly relates to a weak supervision text detection method based on character supervision information.
Background
As a key step in OCR technology, text detection has long attracted the attention of researchers. Its purpose is to accurately locate the characters in a picture and output their specific coordinates for a subsequent recognition model. It is already widely applied in fields such as autonomous driving and image retrieval. Traditional text detection technology mainly targets printed matter: a scanning device converts an optical document into an image file, the image is converted into a character dot-matrix format, and subsequent algorithms then edit and process it. With the times, however, the objects to be processed have gradually evolved into text in natural scenes, where the environment is more complex and the fonts are more variable. For such real scenes, the earlier methods have severe limitations.
For text detection in natural scenes, existing detection techniques mainly use regression or segmentation methods that take the word as the basic unit and directly predict the region of the whole word. These methods handle tightly spaced text well; however, in many practical application scenes the spacing between the characters of a word is large, and on a word basis it is then difficult to obtain complete text boundary information, which degrades the overall detection result. This patent mainly addresses text detection in such complex scenes.
Disclosure of Invention
The invention provides a weakly supervised text detection method based on character supervision information, intended to solve the prior-art problems of detecting text against complex backgrounds and in variable fonts in natural scenes.
The invention adopts the following technical scheme:
the technical scheme of the invention mainly comprises two parts: the first part is a process of taking characters as learning targets and extracting word central line features, and the second part is a process of combining single characters and word central line based post-processing into a complete word. In the first part, ResNet34 added with a cavity convolution layer is adopted for feature extraction, a reverse U-shaped structure is utilized for semantic information enhancement, a feature map of each character area and a feature map of a word center line are obtained, the fact that most data sets are not labeled at a character level is considered, a weak supervision mode is introduced, character information is continuously generated in the training process in an iteration mode, and meanwhile confidence level setting is added to mark the quality of a weak supervision generated result. In the second part, the character feature map is used to restore complete characters, the word central lines are used to connect characters belonging to the same word, and finally the boundary is smoothed to obtain the final text region.
A weak supervision text detection method based on character supervision information comprises the following steps:
S100: extracting features with a backbone network;
S200: upsampling the extracted features through an upsampling network;
S300: generating character-level labels for the obtained upsampled features with a watershed algorithm in a weakly supervised manner;
S400: outputting a character region probability map and a text center line after the features fused by the upsampling network pass through four convolutional layers;
S500: after the character probability map is obtained, extracting connected regions with high response values using OpenCV, then expanding each region with the Vatti algorithm to obtain complete character boundaries;
S600: traversing the text center line, treating the characters it passes through as the same text, taking the upper-left, upper-right, lower-right and lower-left points on each character boundary, and finally sorting, connecting and smoothing all the points in each text region to obtain the final detection region.
Further, the backbone network is a ResNet34 network.
Further, three convolutional layers are embedded as a block to replace the third layer of the ResNet34 network; each convolutional layer replaces the standard convolution with a dilated (atrous) convolution kernel, with the dilation rates set to 1, 2 and 3 respectively.
Further, an extra layer is added to the ResNet34 network for further feature extraction.
Furthermore, the upsampling network consists of four blocks; each block convolves the extracted features twice and then upsamples. The output of each block is added element-wise to the output of the corresponding backbone block and then fed into the next block.
Further, weak supervision generates the character labels as follows: the corresponding word region is cropped according to the provided coordinate information, the position of each character is then obtained with a watershed algorithm, and the result is fed into the network as annotation information to participate in training.
Further, after the character result is generated, a confidence value is computed to measure whether this generated result is credible; the calculation formula is:
$$ s_{conf}(w) = \frac{l_c(w) - \min\bigl(l_c(w),\ \lvert l_c(w) - l(w) \rvert\bigr)}{l_c(w)} $$
where l(w) denotes the number of characters predicted in the word w and l_c(w) denotes the number of characters the word w contains according to the real label; when the predicted character count equals that of the original word, the result is considered completely credible.
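For illustration, this confidence can be computed directly from the two character counts. A minimal Python sketch follows; the standalone function form is ours, and it normalizes by the true count l_c(w), consistent with the property that matching counts give full confidence:

```python
def word_confidence(num_pred_chars: int, num_true_chars: int) -> float:
    """Confidence of a weakly generated character label for one word.

    num_pred_chars: l(w), characters found by the watershed step.
    num_true_chars: l_c(w), characters in the word's real label.
    Returns 1.0 when the counts agree, decaying toward 0 as they diverge.
    """
    diff = abs(num_true_chars - num_pred_chars)
    return (num_true_chars - min(num_true_chars, diff)) / num_true_chars

# Example: a 5-character word split into 4 regions by the watershed step.
print(word_confidence(4, 5))  # 0.8
```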
Further, before the feature extraction of S100, the method also comprises S90: a picture resizing step, in which the pictures are adjusted to a uniform size; pictures whose size does not meet the requirement are processed with bilinear interpolation and/or data augmentation.
Further, the data augmentation includes: random rotation by a certain angle, random changes to the image brightness, and random adjustment of the picture saturation.
Further, before the picture resizing step S90, the method also comprises S80: a weakly supervised training-label preparation step, which generates a region probability distribution map for each character and a text center line.
(1) After a picture is input, features are extracted by the backbone network. In this method we chose ResNet34 as the backbone, balancing runtime against final accuracy. To enlarge the receptive field of the network while retaining as much detail as possible, we replaced the convolutional layers of the third stage of ResNet34: three convolution layers are rebuilt as a new block that replaces the block in the original third stage, and each convolution layer substitutes a dilated convolution kernel for the standard convolution, with dilation rates of 1, 2 and 3 respectively. The dilated kernels further increase the network's ability to extract large-scale features. In addition, we add an extra layer for further feature extraction.
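A minimal PyTorch sketch of such a dilated-convolution block is given below; the channel width (256) and the BatchNorm/ReLU pairing are illustrative assumptions, not values fixed by the patent:

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """Three 3x3 convolutions with dilation rates 1, 2 and 3, standing in
    for the replaced third stage of ResNet34. With padding equal to the
    dilation rate, each 3x3 convolution preserves the spatial size."""
    def __init__(self, channels: int = 256):
        super().__init__()
        layers = []
        for rate in (1, 2, 3):
            layers += [
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=rate, dilation=rate, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            ]
        self.body = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)
```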
(2) After feature extraction, an upsampling module is added. This module fuses the spatial information of the high-resolution maps with the semantic information of the low-resolution maps, improving the generalization ability of the whole network. The upsampling network consists of four blocks; each block convolves the features twice and then upsamples. The output of each block is added element-wise to the output of the corresponding backbone block and fed into the next block.
(3) Weakly supervised learning. Because character-level annotation is too costly, existing real data sets almost all carry word-level annotation only, so this method adopts a weakly supervised scheme and iteratively generates character-level labels during training. The generation process first crops the corresponding word region according to the provided coordinates and feeds it into the network, then obtains the position of each character with a watershed algorithm, and finally sends the generated result back to the network as a label for training. After the character result is generated, a confidence value is computed to measure whether the watershed result is credible.
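A sketch of such watershed-based character splitting with OpenCV is shown below; the Otsu binarization, the distance-transform seeding and the threshold values are illustrative assumptions about how the watershed step could be realized:

```python
import cv2
import numpy as np

def split_word_into_chars(word_crop: np.ndarray) -> np.ndarray:
    """Split a cropped word image (8-bit BGR) into character regions
    with the watershed algorithm; labels > 1 mark individual characters."""
    gray = cv2.cvtColor(word_crop, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    sure_bg = cv2.dilate(binary, kernel, iterations=3)
    # Seeds for each character from the distance transform.
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, cv2.THRESH_BINARY)
    sure_fg = sure_fg.astype(np.uint8)
    unknown = cv2.subtract(sure_bg, sure_fg)
    _, markers = cv2.connectedComponents(sure_fg)
    markers += 1                 # background becomes 1, characters 2..N
    markers[unknown == 255] = 0  # 0 = region the watershed must decide
    return cv2.watershed(word_crop, markers)
```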
(4) After the features fused by the upsampling network pass through four convolutional layers, a character region probability map and a text center line are output. The generated character results are sent back to the network, and steps (2) to (5) are repeated until the network converges.
(5) After the character probability map is obtained, connected regions with high response values are extracted with OpenCV, and each region is expanded with the Vatti algorithm to obtain the complete character boundary. The text center line is then traversed, and the characters it passes through are treated as the same text. The upper-left, upper-right, lower-right and lower-left points are taken on each character boundary, and finally all the points in each text region are sorted, connected and smoothed to obtain the final detection region.
The invention has the following positive effects:
a weak supervision text detection method based on character supervision information comprises the following steps:
S100: extracting features with a backbone network;
S200: upsampling the extracted features through an upsampling network;
S300: generating character-level labels for the obtained upsampled features with a watershed algorithm in a weakly supervised manner;
S400: outputting a character region probability map and a text center line after the features fused by the upsampling network pass through four convolutional layers;
S500: after the character probability map is obtained, extracting connected regions with high response values using OpenCV, then expanding each region with the Vatti algorithm to obtain complete character boundaries;
S600: traversing the text center line, treating the characters it passes through as the same text, taking the upper-left, upper-right, lower-right and lower-left points on each character boundary, and finally sorting, connecting and smoothing all the points in each text region to obtain the final detection region.
The method can be applied to text detection in a variety of scenes, and the character detection results allow the position of each character to be located accurately, yielding higher detection precision. The weakly supervised learning scheme lets the whole network iterate continuously and finally reach good convergence. Using the text center line as a learning target reduces the difficulty of network training; the network not only performs well on horizontal text but also detects inclined and curved text well. In addition, the network generalizes well: after training in one scene it can be used directly in other scenes, and it is also very effective on text that is hard to detect under weak illumination.
Drawings
Fig. 1 is a structural diagram of a backbone network ResNet34 according to an embodiment of the present invention;
FIG. 2 is a block diagram of an upsampling module in accordance with an embodiment of the present invention;
FIG. 3 is a diagram illustrating a process of transforming a two-dimensional Gaussian distribution into a quadrilateral frame by perspective transformation according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 1-3, a weak supervised text detection method based on character supervision information includes the following steps:
S100: extracting features with a backbone network;
S200: upsampling the extracted features through an upsampling network;
S300: generating character-level labels for the obtained upsampled features with a watershed algorithm in a weakly supervised manner;
S400: outputting a character region probability map and a text center line after the features fused by the upsampling network pass through four convolutional layers;
S500: after the character probability map is obtained, extracting connected regions with high response values using OpenCV, then expanding each region with the Vatti algorithm to obtain complete character boundaries;
S600: traversing the text center line, treating the characters it passes through as the same text, taking the upper-left, upper-right, lower-right and lower-left points on each character boundary, and finally sorting, connecting and smoothing all the points in each text region to obtain the final detection region.
Further, the backbone network is a ResNet34 network.
Further, three convolutional layers are embedded as a block to replace the third layer of the ResNet34 network; each convolutional layer replaces the standard convolution with a dilated (atrous) convolution kernel, with the dilation rates set to 1, 2 and 3 respectively.
Further, an extra layer is added to the ResNet34 network for further feature extraction.
Furthermore, the upsampling network consists of four blocks; each block convolves the extracted features twice and then upsamples. The output of each block is added element-wise to the output of the corresponding backbone block and then fed into the next block.
Further, weak supervision generates the character labels as follows: the corresponding word region is cropped according to the provided coordinate information, the position of each character is then obtained with a watershed algorithm, and the result is fed into the network as annotation information to participate in training.
Furthermore, a confidence value is computed after the character result is predicted, measuring whether this generated result is credible; the calculation formula is:
$$ s_{conf}(w) = \frac{l_c(w) - \min\bigl(l_c(w),\ \lvert l_c(w) - l(w) \rvert\bigr)}{l_c(w)} $$
where l(w) denotes the number of characters predicted in the word w and l_c(w) denotes the number of characters the word w contains according to the real label; when the predicted character count equals that of the original word, the result is considered completely credible.
Further, before the feature extraction of S100, the method also comprises S90: a picture resizing step, in which the pictures are adjusted to a uniform size; pictures whose size does not meet the requirement are processed with bilinear interpolation and/or data augmentation.
Further, the data augmentation includes: random rotation by a certain angle, random changes to the image brightness, and random adjustment of the picture saturation.
Further, before the picture resizing step S90, the method also comprises S80: a weakly supervised training-label preparation step, which generates a region probability distribution map for each character and a text center line.
The technical scheme of the invention mainly comprises two parts. The first part takes characters as the learning target and extracts word center-line features; the second part is the post-processing that, guided by the word center line, combines single characters into a complete word. In the first part, a ResNet34 augmented with dilated convolution layers performs feature extraction, and an inverted U-shaped structure enhances the semantic information, yielding a feature map of each character region and a feature map of the word center line. Considering that most data sets carry no character-level annotation, a weakly supervised scheme is introduced: character information is generated continuously during training by a watershed algorithm, and a confidence value is attached to mark the quality of each weakly generated result. In the second part, the character feature map is used to restore complete characters, the word center lines connect characters belonging to the same word, and finally the boundary is smoothed to obtain the final text region.
The text detection method comprises the following main steps:
(1) After a picture is input, features are extracted by the backbone network. In this method we chose ResNet34 as the backbone, balancing runtime against final accuracy. To enlarge the receptive field of the network while retaining as much detail as possible, we replaced the convolutional layers of the third stage of ResNet34: three convolution layers are rebuilt and embedded into the third stage as a block, and each convolution layer substitutes a dilated convolution kernel for the standard convolution, with dilation rates of 1, 2 and 3 respectively. The dilated kernels further increase the network's ability to extract large-scale features. In addition, we add an extra layer for further feature extraction. The adjusted network is shown in Fig. 1.
(2) After feature extraction, an upsampling module is added. This module fuses the spatial information of the high-resolution maps with the semantic information of the low-resolution maps, improving the generalization ability of the whole network. The upsampling network consists of four blocks; each block convolves the features twice and then upsamples. The output of each block is added element-wise to the output of the corresponding backbone block and fed into the next block. Its structure is shown in Fig. 2.
(3) Weakly supervised learning. Because character-level annotation is too costly, existing real data sets almost all carry word-level annotation only, so this method adopts a weakly supervised scheme and iteratively generates character-level labels during training. The generation process first crops the corresponding word region according to the provided coordinates and feeds it into the network, then obtains the position of each character with a watershed algorithm, and finally sends the generated result back to the network as a label for training. After the character result is generated, a confidence value is computed to measure whether the watershed result is credible; the calculation formula is as follows:
$$ s_{conf}(w) = \frac{l_c(w) - \min\bigl(l_c(w),\ \lvert l_c(w) - l(w) \rvert\bigr)}{l_c(w)} $$
where l(w) denotes the number of characters predicted in the word w and l_c(w) denotes the number of characters the word w contains according to the real label. When the predicted character count equals that of the original word, we consider the result completely credible.
(4) After the features fused by the upsampling network pass through four convolutional layers, a character region probability map and a text center line are output. The generated character results are sent back to the network, and steps (2) to (5) are repeated until the network converges.
(5) After the character probability map is obtained, connected regions with high response values are extracted with OpenCV, and each region is expanded with the Vatti algorithm to obtain the complete character boundary. The text center line is then traversed, and the characters it passes through are treated as the same text. The upper-left, upper-right, lower-right and lower-left points are taken on each character boundary, and finally all the points in each text region are sorted, connected and smoothed to obtain the final detection region.
The following is a specific embodiment of the invention: a weakly supervised text detection method based on character supervision information, with the following concrete process:
Preparation of weakly supervised training labels:
1. The label comprises a probability distribution map and a text center line. For each image we need to generate a region probability distribution map for every character, taking into account that the center and the edges inside a piece of text differ. The probability can be expressed with a continuous two-dimensional Gaussian distribution: pixel points at the center of a character receive higher scores and pixel points at the character edge receive lower ones, so that the position information of the pixels is fully exploited. However, since character shapes are generally irregular, the two-dimensional Gaussian distribution must be mapped onto the quadrilateral box through a perspective transformation, as shown in Fig. 3.
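A sketch of this warping with OpenCV follows; the canonical patch size and the sigma ratio are illustrative assumptions:

```python
import cv2
import numpy as np

def add_gaussian_region(heatmap: np.ndarray, quad: np.ndarray,
                        size: int = 64, sigma_ratio: float = 0.25) -> None:
    """Warp a canonical 2D Gaussian onto a character quadrilateral and
    accumulate it into `heatmap` (float32, H x W). `quad` holds the four
    corners in top-left, top-right, bottom-right, bottom-left order."""
    ax = np.arange(size, dtype=np.float32) - (size - 1) / 2
    xx, yy = np.meshgrid(ax, ax)
    sigma = sigma_ratio * size
    gauss = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    src = np.float32([[0, 0], [size - 1, 0],
                      [size - 1, size - 1], [0, size - 1]])
    M = cv2.getPerspectiveTransform(src, quad.astype(np.float32))
    h, w = heatmap.shape
    warped = cv2.warpPerspective(gauss, M, (w, h))
    np.maximum(heatmap, warped, out=heatmap)  # keep the strongest response

heatmap = np.zeros((800, 800), np.float32)
quad = np.array([[100, 100], [150, 105], [148, 160], [98, 155]])
add_gaussian_region(heatmap, quad)
```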
The center line is generated from the ground-truth label provided in advance. Ten points are uniformly sampled on each of the upper and lower sides of the ground-truth box, the midpoint of each vertical pair is computed in turn, and the polyline connecting these midpoints together with the midpoints of the two side edges (12 points in total) is used as the text center line.
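A sketch of this center-line construction follows; since the text leaves it open whether the ten samples per side include the corner points, this version samples interior points and adds the two side-edge midpoints to reach the stated 12:

```python
import numpy as np

def text_center_line(quad: np.ndarray) -> np.ndarray:
    """Center line of a quadrilateral ground-truth box given as
    [top-left, top-right, bottom-right, bottom-left]; returns 12 points."""
    tl, tr, br, bl = [np.asarray(p, np.float64) for p in quad]
    t = (np.arange(1, 11) / 11.0)[:, None]  # 10 interior positions
    top = (1 - t) * tl + t * tr             # samples on the upper side
    bottom = (1 - t) * bl + t * br          # samples on the lower side
    mids = (top + bottom) / 2.0             # 10 pairwise midpoints
    left_mid, right_mid = (tl + bl) / 2.0, (tr + br) / 2.0
    return np.vstack([left_mid, mids, right_mid])
```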
2. Scene text picture preprocessing
During training the picture size is fixed to 800 x 800, and pictures whose size does not meet the requirement are processed by bilinear interpolation. The data augmentation used by the method includes: random rotation by a certain angle, random changes to the image brightness, and random adjustment of the image saturation.
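A sketch of this preprocessing and augmentation with OpenCV is given below; the rotation range and the brightness/saturation jitter factors are illustrative assumptions, as the text does not specify them:

```python
import random
import cv2
import numpy as np

def preprocess(img: np.ndarray, train: bool = True) -> np.ndarray:
    """Resize a BGR image to the fixed 800 x 800 training size with
    bilinear interpolation, then apply the named augmentations."""
    img = cv2.resize(img, (800, 800), interpolation=cv2.INTER_LINEAR)
    if train:
        angle = random.uniform(-15, 15)  # random rotation
        M = cv2.getRotationMatrix2D((400, 400), angle, 1.0)
        img = cv2.warpAffine(img, M, (800, 800))
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
        hsv[..., 2] *= random.uniform(0.7, 1.3)  # brightness (V channel)
        hsv[..., 1] *= random.uniform(0.7, 1.3)  # saturation (S channel)
        img = cv2.cvtColor(np.clip(hsv, 0, 255).astype(np.uint8),
                           cv2.COLOR_HSV2BGR)
    return img
```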
3. Character-level scene text picture feature extraction based on weak supervision
The tensor obtained after picture preprocessing is fed into ResNet34 for feature extraction, where the third layer of the original ResNet34 is replaced by a block composed of dilated convolutions. In addition, the method adds an extra layer to enhance feature extraction.
4. Feature semantic information enhancement based on up-sampling module
The ResNet34 network extracts spatial features, while semantic information during training helps recognize text of different sizes. Therefore four upsampling modules are added for feature fusion. After the features enter a module, a 1 x 1 convolution increases the channel dimension and a 3 x 3 convolution then processes the features, with a regularization operation attached to each convolution to prevent overfitting. Finally, an upsampling operation enlarges the feature map, which is added to the output of ResNet34 and fed into the next block.
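A minimal PyTorch sketch of one such fusion block follows; the channel sizes, the 2x upsampling factor, and reading the "regularization operation" as BatchNorm are all assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpBlock(nn.Module):
    """One block of the upsampling network: 1x1 conv raises the channel
    count, 3x3 conv refines, each followed by BatchNorm; the enlarged map
    is then added element-wise to the matching backbone output."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))
        self.conv2 = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.conv2(self.conv1(x))
        x = F.interpolate(x, scale_factor=2, mode="bilinear",
                          align_corners=False)
        return x + skip  # skip must match x in channels and spatial size
```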
5. Character post-processing
After the model converges, the prediction result for each final character region and the text center line are output through a deconvolution module. The center of each character is taken from the Gaussian heat map and expanded with the Vatti algorithm to obtain the complete character region and its boundary coordinate points. Then, using the center-line information, the characters belonging to the same center line (the same text) are recorded in the same set. Based on this character set, four vertices are taken from each character boundary, all vertices are finally ordered clockwise, and after smoothing the corrected text boundary is obtained as the final detection result.
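A sketch of this post-processing with OpenCV and pyclipper (a Python binding of the Vatti clipping algorithm) is given below; the binarization threshold and the area/perimeter offset heuristic are assumptions, not values from the patent:

```python
import cv2
import numpy as np
import pyclipper

def recover_char_boxes(region_map: np.ndarray, thresh: float = 0.6,
                       expand_ratio: float = 1.5):
    """Threshold the character probability map, take connected components,
    and expand each component's contour with the Vatti algorithm."""
    binary = (region_map > thresh).astype(np.uint8)
    num, labels = cv2.connectedComponents(binary)
    boxes = []
    for i in range(1, num):  # label 0 is the background
        mask = (labels == i).astype(np.uint8)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            continue
        poly = contours[0].reshape(-1, 2)
        if len(poly) < 3:
            continue
        # Offset distance from an area/perimeter heuristic.
        area = cv2.contourArea(poly)
        length = cv2.arcLength(poly.reshape(-1, 1, 2), True)
        offset = area * (expand_ratio - 1.0) / max(length, 1e-6)
        pco = pyclipper.PyclipperOffset()
        pco.AddPath(poly.tolist(), pyclipper.JT_ROUND,
                    pyclipper.ET_CLOSEDPOLYGON)
        expanded = pco.Execute(offset)
        if expanded:
            boxes.append(np.array(expanded[0]))
    return boxes
```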
6. Model training
The optimization objective in the model training process is as follows:
$$ L = \sum_{p} S_c(p)\,\bigl\lVert S_r(p) - S_r^*(p) \bigr\rVert_2^2 $$
where S_c(p) denotes the confidence at pixel p, and S_r(p) and S_r*(p) denote the predicted probability value and the generated ground-truth probability value, respectively. SGD is chosen as the optimizer to compute gradients and perform back-propagation. The training batch size is set to 10, for a total of 800 epochs.
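A minimal sketch of this weighted objective and optimizer setup follows; the learning rate and momentum are not given in the text and are illustrative assumptions:

```python
import torch

def weighted_region_loss(pred: torch.Tensor, target: torch.Tensor,
                         conf: torch.Tensor) -> torch.Tensor:
    """sum_p S_c(p) * ||S_r(p) - S_r*(p)||^2, with `conf` holding the
    per-pixel confidence propagated from the word each pixel belongs to."""
    return (conf * (pred - target) ** 2).sum()

# model: backbone + upsampling network + prediction head (defined elsewhere).
# Batch size (10) and epoch count (800) follow the text; lr/momentum do not.
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# for epoch in range(800):
#     for images, targets, conf in loader:  # batches of size 10
#         loss = weighted_region_loss(model(images), targets, conf)
#         optimizer.zero_grad(); loss.backward(); optimizer.step()
```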
7. Model application
After the training process, several models are obtained, and the optimal one (the one with the smallest objective function value) is selected for application. No data augmentation is needed at application time; the image only has to be resized to 800 x 800 and normalized to serve as model input. With the parameters of the whole network model fixed, the detection result for the text content of an image is obtained after feature extraction by the neural network followed by post-processing.
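A sketch of this application step is given below; the ImageNet mean/std normalization and the two-map output signature of the model are assumptions, since the text only says the image is resized and normalized:

```python
import cv2
import numpy as np
import torch

def run_inference(model: torch.nn.Module, image_bgr: np.ndarray):
    """Resize a BGR image to 800 x 800, normalize it, and run the fixed
    network to obtain the character region map and the text center line."""
    img = cv2.resize(image_bgr, (800, 800), interpolation=cv2.INTER_LINEAR)
    x = torch.from_numpy(img[..., ::-1].copy()).float() / 255.0  # BGR -> RGB
    mean = torch.tensor([0.485, 0.456, 0.406])
    std = torch.tensor([0.229, 0.224, 0.225])
    x = ((x - mean) / std).permute(2, 0, 1).unsqueeze(0)  # NCHW batch of 1
    model.eval()
    with torch.no_grad():
        region_map, center_line = model(x)
    return region_map, center_line
```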
The above embodiments are merely preferred examples of the present invention and do not exhaust its possible implementations. Obvious modifications of the above would occur to those of ordinary skill in the art without taking the modified invention beyond the spirit and scope of the present invention.

Claims (10)

1. A weak supervision text detection method based on character supervision information is characterized by comprising the following steps:
S100: extracting features with a backbone network; S200: upsampling the extracted features through an upsampling network;
S300: generating character-level labels for the obtained upsampled features with a watershed algorithm in a weakly supervised manner;
S400: outputting a character region probability map and a text center line after the features fused by the upsampling network pass through four convolutional layers;
S500: after the character probability map is obtained, extracting connected regions with high response values using OpenCV, then expanding each region with the Vatti algorithm to obtain complete character boundaries;
S600: traversing the text center line, treating the characters it passes through as the same text, taking the upper-left, upper-right, lower-right and lower-left points on each character boundary, and finally sorting, connecting and smoothing all the points in each text region to obtain the final detection region.
2. The method of claim 1, wherein the backbone network is a ResNet34 network.
3. The method of claim 2, wherein three convolutional layers are embedded as a block to replace the third layer of the ResNet34 network, each convolutional layer replacing the standard convolution with a dilated convolution kernel, with the dilation rates set to 1, 2 and 3 respectively.
4. The weakly supervised text detection method based on character supervision information as claimed in claim 3, wherein an extra layer is added to the ResNet34 network for further feature extraction.
5. The weakly supervised text detection method based on character supervision information as claimed in claim 4, wherein the upsampling network consists of four blocks, each block convolving the extracted features twice and then upsampling; the output of each block is added element-wise to the output of the corresponding backbone block and then fed into the next block.
6. The weakly supervised text detection method based on character supervision information as claimed in claim 5, wherein the weak supervision generates the character labels by: cropping the corresponding word region according to the provided coordinate information, then obtaining the position of each character with a watershed algorithm and feeding it into the network as annotation information to participate in training.
7. The method as claimed in claim 6, wherein a confidence value is computed after the character result is generated, the confidence value measuring whether this generated result is credible, with the calculation formula:
$$ s_{conf}(w) = \frac{l_c(w) - \min\bigl(l_c(w),\ \lvert l_c(w) - l(w) \rvert\bigr)}{l_c(w)} $$
wherein l(w) denotes the number of characters predicted in the word w and l_c(w) denotes the number of characters the word w contains according to the real label, the result being considered completely credible when the predicted character count equals that of the original word.
8. The character supervision information-based weakly supervised text detection method as recited in claim 7, further comprising, before the feature extraction of S100, S90: a picture resizing step, in which the pictures are adjusted to a uniform size, pictures whose size does not meet the requirement being processed with bilinear interpolation and/or data augmentation.
9. The weakly supervised text detection method based on character supervision information as recited in claim 8, wherein the data augmentation comprises: random rotation by a certain angle, random changes to the image brightness, and random adjustment of the picture saturation.
10. The weakly supervised text detection method based on character supervision information as recited in claim 9, further comprising, before the picture resizing step S90, a weakly supervised training-label preparation step S80, by which a region probability distribution map for each character and a text center line are generated.
CN202110262361.3A 2021-03-10 2021-03-10 Character supervision information-based weak supervision text detection method Pending CN113065547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110262361.3A CN113065547A (en) 2021-03-10 2021-03-10 Character supervision information-based weak supervision text detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110262361.3A CN113065547A (en) 2021-03-10 2021-03-10 Character supervision information-based weak supervision text detection method

Publications (1)

Publication Number Publication Date
CN113065547A true CN113065547A (en) 2021-07-02

Family

ID=76560288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110262361.3A Pending CN113065547A (en) 2021-03-10 2021-03-10 Character supervision information-based weak supervision text detection method

Country Status (1)

Country Link
CN (1) CN113065547A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110147786A (en) * 2019-04-11 2019-08-20 北京百度网讯科技有限公司 For text filed method, apparatus, equipment and the medium in detection image
CN111553346A (en) * 2020-04-26 2020-08-18 佛山市南海区广工大数控装备协同创新研究院 Scene text detection method based on character region perception
CN111798480A (en) * 2020-07-23 2020-10-20 北京思图场景数据科技服务有限公司 Character detection method and device based on single character and character connection relation prediction
WO2020215236A1 (en) * 2019-04-24 2020-10-29 哈尔滨工业大学(深圳) Image semantic segmentation method and system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210702)