CN111242120A - Character detection method and system - Google Patents

Character detection method and system

Info

Publication number
CN111242120A
CN111242120A (application CN202010008296.7A)
Authority
CN
China
Prior art keywords
network
texture information
suggestion
contour point
cutting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010008296.7A
Other languages
Chinese (zh)
Other versions
CN111242120B (en)
Inventor
张勇东 (Zhang Yongdong)
王裕鑫 (Wang Yuxin)
谢洪涛 (Xie Hongtao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Research Institute
University of Science and Technology of China USTC
Original Assignee
Beijing Zhongke Research Institute
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Research Institute and University of Science and Technology of China (USTC)
Priority to CN202010008296.7A
Publication of CN111242120A
Application granted
Publication of CN111242120B
Active legal status
Anticipated expiration of legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193 Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Abstract

A text detection method and system. The method includes: extracting features from an input image to obtain a feature map; predicting proposal boxes with an adaptive region proposal network; cropping the feature map with each proposal box to obtain a cropped feature map; modeling text texture information on the cropped feature map along two orthogonal directions to obtain a contour-point heat map for each direction; and screening the contour points in the heat maps to obtain a contour point set from which the text in the input image is reconstructed. The adaptive region proposal network adapts to changes in text scale to generate proposal boxes that match the text regions, and modeling text texture information along orthogonal directions suppresses false-positive contour points, improving the accuracy of detecting scene text of arbitrary shape.

Description

Character detection method and system
Technical Field
The present disclosure relates to the field of text recognition technologies, and in particular, to a text detection method and system.
Background
Natural scene text detection means locating text regions against a complex background and marking them with bounding boxes. Its results are widely used in fields such as autonomous driving and robotics. Text detection in natural scenes faces difficulties such as low resolution, complex backgrounds, and widely varying font sizes, so traditional text detection techniques perform poorly in practice.
With the development of deep learning, natural-scene text detection based on deep learning has improved markedly. Although such techniques can detect text of arbitrary shape, their results contain many false-positive detections and are affected by the diversity of text scales, so their detection accuracy still needs improvement.
Disclosure of Invention
Technical problem to be solved
In view of this, the present disclosure provides a text detection method and system capable of improving the accuracy of detecting scene text of arbitrary shape.
(II) technical scheme
The present disclosure provides a text detection method, including: extracting features from an input image to obtain a feature map; predicting a proposal box with an adaptive region proposal network; cropping the feature map with the proposal box to obtain a cropped feature map; modeling text texture information on the cropped feature map along two orthogonal directions to obtain a contour-point heat map for each direction; and screening the contour points in the heat maps to obtain a contour point set from which the text in the input image is reconstructed.
Optionally, predicting a proposal box with the adaptive region proposal network includes: performing local offset prediction on the points of a preset anchor box with the adaptive region proposal network to obtain corresponding predicted points; and determining the proposal box from the predicted points.
Optionally, the two orthogonal directions are the horizontal direction and the vertical direction, and modeling text texture information on the cropped feature map along the two orthogonal directions includes: building a first text texture information model of the cropped feature map in the horizontal direction from a first convolution kernel; and building a second text texture information model of the cropped feature map in the vertical direction from a second convolution kernel.
Optionally, the size of the first convolution kernel is 1 × k and the size of the second convolution kernel is k × 1, where k is not greater than the size of the cropped feature map; in the present disclosure k is 3.
Optionally, the method further includes: adjusting the proposal box with a fine-tuning network according to the cropped feature map to obtain an adjusted proposal box; cropping the feature map with the adjusted proposal box to obtain an adjusted cropped feature map; and upsampling the adjusted cropped feature map to obtain an upsampled feature map.
Optionally, modeling text texture information on the cropped feature map along two orthogonal directions includes: modeling text texture information on the upsampled feature map along the two orthogonal directions.
Optionally, modeling text texture information on the cropped feature map along two orthogonal directions includes:
modeling text texture information on the cropped feature map with a text texture information perception network in each of the two orthogonal directions;
before feature extraction on the input image, the method further includes:
training the adaptive region proposal network, the text texture information perception networks, and the fine-tuning network by stochastic gradient descent according to a loss function, where the loss function is:
L = L_Arpn + λ_Hcp·L_Hcp + λ_Vcp·L_Vcp + λ_boxclass·L_boxclass + λ_boxreg·L_boxreg
where L is the loss function, L_Arpn is the loss function of the adaptive region proposal network, L_Hcp is the loss function of the text texture information perception network in one orthogonal direction, L_Vcp is the loss function of the text texture information perception network in the other orthogonal direction, L_boxclass and L_boxreg are the loss functions of the fine-tuning network, λ_Hcp is the balance parameter of the text texture information perception network in the one orthogonal direction, λ_Vcp is the balance parameter of the text texture information perception network in the other orthogonal direction, and λ_boxclass and λ_boxreg are the balance parameters of the fine-tuning network.
Optionally, screening the contour-point heat maps to obtain the contour point set includes: filtering background pixels in the contour-point heat maps by non-maximum suppression; and screening the contour-point heat maps against a preset threshold to obtain the contour point set.
Optionally, screening the contour-point heat maps against a preset threshold to obtain the contour point set includes: selecting the pixels whose response values in the contour-point heat maps of both orthogonal directions exceed the preset threshold to form the contour point set.
Another aspect of the present disclosure provides a text detection system, including: an extraction module for extracting features from an input image to obtain a feature map; a prediction module for predicting a proposal box with an adaptive region proposal network; a cropping module for cropping the feature map with the proposal box to obtain a cropped feature map; a modeling module for modeling text texture information on the cropped feature map along two orthogonal directions to obtain a contour-point heat map for each direction; and a screening module for screening the contour points in the heat maps to obtain a contour point set from which the text in the input image is reconstructed.
(III) advantageous effects
With the text detection method and system of the present disclosure, the adaptive region proposal network is designed to better adapt to changes in text scale, and modeling text texture information along orthogonal directions suppresses false-positive contour points. The problems of text scale variation and false-positive prediction are thereby effectively addressed, and the accuracy of detecting scene text of arbitrary shape is improved.
Drawings
Fig. 1 schematically shows a flowchart of a text detection method provided by an embodiment of the present disclosure;
Fig. 2 schematically shows the prediction of a proposal (crop) box in a text detection method provided by an embodiment of the present disclosure;
Fig. 3 schematically shows the modeling of text texture information in a text detection method provided by an embodiment of the present disclosure;
Fig. 4 schematically shows a block diagram of a text detection system provided by an embodiment of the present disclosure;
Fig. 5 schematically shows a fine-tuning network provided by an embodiment of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
Fig. 1 schematically shows a flowchart of a text detection method provided in an embodiment of the present disclosure.
The method shown in fig. 1 will be described in detail below with reference to figs. 2 and 3. As shown in fig. 1, the text detection method includes operations S110 to S150.
In operation S110, feature extraction is performed on the input image to obtain a feature map.
In this embodiment, a deep neural network (DNN) is used for text detection. The deep neural network includes a ResNet50 feature-extraction network, an adaptive region proposal network, a fine-tuning network, a text texture information perception network in the horizontal direction, and a text texture information perception network in the vertical direction.
The deep neural network should be trained before operation S110, for example end-to-end with stochastic gradient descent (SGD), where the overall loss function L of the deep neural network is:
L = L_Arpn + λ_Hcp·L_Hcp + λ_Vcp·L_Vcp + λ_boxclass·L_boxclass + λ_boxreg·L_boxreg
where L_Arpn is the loss function of the adaptive region proposal network, L_Hcp is the loss function of the text texture information perception network in one orthogonal direction (e.g., the horizontal direction), L_Vcp is the loss function of the text texture information perception network in the other orthogonal direction (e.g., the vertical direction), L_boxclass and L_boxreg are the loss functions of the fine-tuning network, λ_Hcp and λ_Vcp are the balance parameters of the text texture information perception networks in the two directions, and λ_boxclass and λ_boxreg are the balance parameters of the fine-tuning network.
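For illustration, the following Python (PyTorch-style) sketch, which is not part of the original disclosure, combines the sub-losses exactly as in the formula above. The default balance-parameter values are placeholders; the disclosure does not specify them.

    def total_loss(l_arpn, l_hcp, l_vcp, l_boxclass, l_boxreg,
                   lam_hcp=1.0, lam_vcp=1.0, lam_boxclass=1.0, lam_boxreg=1.0):
        # L = L_Arpn + λ_Hcp·L_Hcp + λ_Vcp·L_Vcp + λ_boxclass·L_boxclass + λ_boxreg·L_boxreg
        return (l_arpn
                + lam_hcp * l_hcp
                + lam_vcp * l_vcp
                + lam_boxclass * l_boxclass
                + lam_boxreg * l_boxreg)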
Further, the loss function L_Arpn of the adaptive region proposal network is:
L_Arpn = L_Arpnclass + L_Arpnreg
[the classification and regression loss formulas are given as equation images in the original]
where L_Arpnclass is the classification loss function, L_Arpnreg is the regression loss function, p_i is the predicted probability that a preset anchor box is a target box (i.e., a proposal box), L_cls is the cross-entropy loss function, N_pos is the number of positive anchor boxes, Intersection is the intersection of the anchor box and the target box, Union is their union, and the ground-truth label p_i* is 1 when the intersection-over-union (IoU) of the anchor box and the target box exceeds 0.5 and 0 otherwise.
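Because the classification and regression formulas survive only as equation images, the following sketch assumes common forms: binary cross-entropy over all anchors for L_Arpnclass, and an IoU-based term (1 − IoU, averaged over positive anchors) for L_Arpnreg, with the label p_i* derived from the 0.5 IoU rule stated above.

    import torch
    import torch.nn.functional as F

    def arpn_loss(cls_logits, ious):
        # cls_logits: (N,) predicted logits that each anchor box is a target box (p_i).
        # ious:       (N,) IoU between each anchor box and its matched target box.
        labels = (ious > 0.5).float()       # p_i* = 1 when IoU > 0.5, else 0
        l_class = F.binary_cross_entropy_with_logits(cls_logits, labels)
        n_pos = labels.sum().clamp(min=1.0)
        l_reg = ((1.0 - ious) * labels).sum() / n_pos   # assumed IoU-based regression term
        return l_class + l_reg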
The loss function L_Hcp of the text texture information perception network in the horizontal direction and the loss function L_Vcp of the text texture information perception network in the vertical direction are:
[the loss formula is given as an equation image in the original]
where y_i is the contour-point label, q_i is the contour-point prediction, N_neg is the number of predicted background pixels, and N_pos is the number of predicted contour points.
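The exact formula is likewise an equation image; the sketch below assumes a class-balanced cross-entropy consistent with the symbols y_i, q_i, N_pos, and N_neg defined above.

    import torch

    def contour_point_loss(q, y, eps=1e-6):
        # q: predicted contour-point probabilities in (0, 1); y: binary contour-point labels.
        pos, neg = y > 0.5, y <= 0.5
        n_pos = pos.sum().clamp(min=1)      # number of predicted contour points
        n_neg = neg.sum().clamp(min=1)      # number of predicted background pixels
        l_pos = -torch.log(q[pos] + eps).sum() / n_pos
        l_neg = -torch.log(1.0 - q[neg] + eps).sum() / n_neg
        return l_pos + l_neg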
The loss functions L_boxclass and L_boxreg of the fine-tuning network are:
[the classification and regression loss formulas are given as equation images in the original]
where p_i1 is the probability in the box branch that an anchor box is a target box, L_cls is the cross-entropy loss function, N_pos1 is the number of predicted boxes in the box branch that correctly match a label, the label p_i1* is 1 when the intersection-over-union of the anchor box and the target box in the box branch exceeds 0.5 and 0 otherwise, N_reg is the number of boxes in the box branch that need fine-tuning, t_i are the parameters of the predicted box, t_i* are the parameters of the label box, and Smooth_l1 is the smooth-L1 function.
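Again the formulas are given only as images; the sketch below assumes a cross-entropy classification term and a smooth-L1 regression term, matching the symbol definitions above.

    import torch
    import torch.nn.functional as F

    def finetune_losses(box_logits, ious, t_pred, t_star):
        # box_logits: (N,) box-branch classification logits (p_i1).
        # ious:       (N,) IoU between box-branch anchor boxes and their target boxes.
        # t_pred:     (M, 4) predicted box parameters t_i for the boxes to refine.
        # t_star:     (M, 4) label box parameters t_i*.
        labels = (ious > 0.5).float()                 # p_i1* per the 0.5 IoU rule
        l_boxclass = F.binary_cross_entropy_with_logits(box_logits, labels)
        l_boxreg = F.smooth_l1_loss(t_pred, t_star)   # Smooth-L1 over the N_reg boxes
        return l_boxclass, l_boxreg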
During training, the initial learning rate is 0.0025; when the number of training iterations reaches 120000 and again at 160000, the learning rate is multiplied by 0.1. This embodiment trains for 180000 iterations in total, at which point the overall loss function L of the deep neural network meets the requirement, and the trained deep neural network can be used for text detection.
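A minimal PyTorch sketch of this schedule, assuming iteration-wise stepping; momentum and weight decay are unspecified in the disclosure and left at their defaults, and the model is a stand-in.

    import torch

    model = torch.nn.Linear(8, 8)  # stand-in for the detection network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.0025)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[120000, 160000], gamma=0.1)  # x0.1 at 120k and 160k

    for step in range(180000):                  # 180000 training iterations in total
        optimizer.zero_grad()
        # loss = total_loss(...); loss.backward()   # forward/backward pass (omitted)
        optimizer.step()
        scheduler.step()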
According to the embodiment of the disclosure, feature extraction is performed on the input image by the ResNet50 feature-extraction network to obtain the feature map.
In operation S120, a proposal box is predicted using the adaptive region proposal network.
According to an embodiment of the present disclosure, operation S120 includes sub-operation S120A and sub-operation S120B.
In sub-operation S120A, local offset prediction is performed on the points of a preset anchor box using the adaptive region proposal network to obtain the corresponding predicted points:
[the prediction formula is given as an equation image in the original]
where n is the number of points in the preset anchor box, x_l' and y_l' are the abscissa and ordinate of the l-th predicted point, x_l and y_l are the abscissa and ordinate of the l-th point of the preset anchor box, ω_c is the length of the preset anchor box, h_c is the width of the preset anchor box, and Δx_l and Δy_l are the abscissa and ordinate offsets for the l-th point of the preset anchor box output by the adaptive region proposal network.
Referring to fig. 2, the number n of points in the preset anchor box is set to 9, representing the center point and eight boundary points (upper-left, upper-middle, upper-right, middle-right, lower-right, lower-middle, lower-left, and middle-left).
In sub-operation S120B, the proposal box is determined from the predicted points. Specifically, the predicted points giving the four extreme coordinates (the minimum abscissa, minimum ordinate, maximum abscissa, and maximum ordinate) are obtained by minimum-maximum screening, as shown in fig. 2, and the proposal box position is represented by these four extreme coordinates:
proposal = (x_min, y_min, x_max, y_max) = (min_l x_l', min_l y_l', max_l x_l', max_l y_l')
In this embodiment, one or more proposal boxes are obtained; predicting a plurality of proposal boxes can further improve the text detection accuracy.
In operation S130, the feature map is cropped using the proposal box to obtain a cropped feature map.
In this embodiment, when there are multiple proposal boxes, the feature map is cropped with each proposal box to obtain multiple cropped feature maps, which are then normalized to the same size.
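The disclosure does not name the cropping operator; RoIAlign is one common way to crop proposals from a feature map at a fixed output size and is assumed in this sketch. The shapes and output size are illustrative.

    import torch
    from torchvision.ops import roi_align

    features = torch.randn(1, 256, 64, 64)               # (batch, C, H, W) feature map
    proposals = torch.tensor([[0., 4., 4., 40., 20.]])   # (batch index, x1, y1, x2, y2)
    # spatial_scale=1.0 assumes the boxes are already in feature-map coordinates.
    crops = roi_align(features, proposals, output_size=(7, 7), spatial_scale=1.0)
    print(crops.shape)  # torch.Size([1, 256, 7, 7]): same-size cropped feature maps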
According to an embodiment of the present disclosure, after operation S130 the text detection method further includes: adjusting the proposal box with a fine-tuning network according to the cropped feature map to obtain an adjusted proposal box; and cropping the feature map with the adjusted proposal box to obtain an adjusted cropped feature map.
Referring to fig. 5, the fine-tuning network operates on the cropped feature map and outputs the adjustment parameters for the proposal box, which are used to adjust it:
[the adjustment formula is given as an equation image in the original]
where x and y are the abscissa and ordinate of the center point of the adjusted proposal box, w and h are the width and height of the adjusted proposal box, x_c and y_c are the abscissa and ordinate of the center point of the proposal box before adjustment, w_c and h_c are its width and height before adjustment (all computable from the extreme coordinates of the proposal box), and t_1, t_2, t_3, t_4 are the adjustment parameters output by the fine-tuning network.
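The adjustment equation is only an image in the original; the sketch below assumes the standard Faster R-CNN box parameterization, which is consistent with the symbols x_c, y_c, w_c, h_c, and t_1 through t_4.

    import torch

    def decode_box(xc, yc, wc, hc, t):
        # t: tensor of shape (4,) holding the fine-tuning network outputs (t1..t4).
        t1, t2, t3, t4 = t
        x = xc + wc * t1          # adjusted center abscissa
        y = yc + hc * t2          # adjusted center ordinate
        w = wc * torch.exp(t3)    # adjusted width
        h = hc * torch.exp(t4)    # adjusted height
        return x, y, w, h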
Further, the text detection method also includes upsampling the adjusted cropped feature map to obtain an upsampled feature map, whose size is larger than that of the adjusted cropped feature map.
In operation S140, text texture information is modeled on the cropped feature map along two orthogonal directions to obtain a contour-point heat map for each direction.
Specifically, text texture information is modeled on the upsampled feature map along the two orthogonal directions, yielding a contour-point heat map for each orthogonal direction.
Referring to fig. 3, the two orthogonal directions are the horizontal direction and the vertical direction, and operation S140 includes sub-operations S140A and S140B.
In sub-operation S140A, a first text texture information model of the cropped feature map in the horizontal direction is built from the first convolution kernel. Specifically, the first convolution kernel is slid over the upsampled feature map to build the first text texture information model in the horizontal direction. The first convolution kernel has size 1 × k, where k is greater than 0 and not greater than the size of the cropped feature map; for example, k = 3.
In sub-operation S140B, a second text texture information model of the cropped feature map in the vertical direction is built from the second convolution kernel. Specifically, the second convolution kernel is slid over the upsampled feature map to build the second text texture information model in the vertical direction. The second convolution kernel has size k × 1.
Further, the first and second text texture information models are normalized by a Sigmoid function to obtain the contour-point heat map Hmap in the horizontal direction and the contour-point heat map Vmap in the vertical direction.
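The two orthogonal branches can be sketched as a small PyTorch module. The patent fixes only the 1 × k and k × 1 kernel shapes (with k = 3) and the Sigmoid normalization; the input channel count and the single-convolution depth are assumptions.

    import torch
    import torch.nn as nn

    class OrthogonalTexture(nn.Module):
        def __init__(self, in_ch=256, k=3):
            super().__init__()
            self.horizontal = nn.Conv2d(in_ch, 1, kernel_size=(1, k), padding=(0, k // 2))
            self.vertical = nn.Conv2d(in_ch, 1, kernel_size=(k, 1), padding=(k // 2, 0))

        def forward(self, feat):                          # feat: upsampled feature map
            hmap = torch.sigmoid(self.horizontal(feat))   # horizontal contour-point heat map
            vmap = torch.sigmoid(self.vertical(feat))     # vertical contour-point heat map
            return hmap, vmap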
In operation S150, the contour points in the contour-point heat maps are screened to obtain a contour point set, from which the text in the input image is reconstructed.
In this embodiment, a contour-point regression algorithm screens the heat maps for pixels that have high response values in both heat maps simultaneously, and these pixels form the contour point set.
According to an embodiment of the present disclosure, operation S150 includes sub-operation S150A and sub-operation S150B.
In sub-operation S150A, background pixels in the contour-point heat maps are filtered out by non-maximum suppression. Specifically, for example, the horizontal heat map is processed with a 1 × 3 sliding window and the vertical heat map with a 3 × 1 sliding window; the largest pixel in the current window is kept and the remaining pixels are suppressed.
In sub-operation S150B, the contour-point heat maps are screened against a preset threshold to obtain the contour point set. Specifically, each pixel position in the heat maps after non-maximum suppression is traversed, and the pixels whose response values in both the horizontal and vertical heat maps exceed the preset threshold are selected to form the contour point set. The preset threshold is, for example, 0.5.
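A sketch of sub-operations S150A and S150B, implementing the window-maximum suppression with max pooling; taking the logical AND of the two thresholded heat maps is an assumption consistent with the description above.

    import torch
    import torch.nn.functional as F

    def screen_contour_points(hmap, vmap, thresh=0.5):
        # hmap, vmap: (1, 1, H, W) contour-point heat maps.
        keep_h = hmap == F.max_pool2d(hmap, kernel_size=(1, 3), stride=1, padding=(0, 1))
        keep_v = vmap == F.max_pool2d(vmap, kernel_size=(3, 1), stride=1, padding=(1, 0))
        mask = keep_h & keep_v & (hmap > thresh) & (vmap > thresh)
        ys, xs = torch.nonzero(mask[0, 0], as_tuple=True)
        return torch.stack([xs, ys], dim=1)   # contour point set as (x, y) pixel coordinates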
Finally, the text region in the input image is reconstructed from the screened contour point set, thereby detecting the text in the input image.
The text detection method of this embodiment was used to detect large amounts of text in scenes of arbitrary shape, and the results show very good detection performance. For example, on the ICDAR2015 dataset the recall, precision, and F-measure of the method are 86.1%, 87.6%, and 86.9%, respectively, at 3.5 FPS; on the Total-Text dataset they are 83.9%, 86.9%, and 85.4% at 3.8 FPS; and on the CTW1500 dataset they are 84.1%, 83.7%, and 83.9% at 4.5 FPS.
Fig. 4 schematically shows a block diagram of a text detection system provided in an embodiment of the present disclosure.
The embodiment of the disclosure also provides a text detection system. The text detection system 400 includes an extraction module 410, a prediction module 420, a cropping module 430, a modeling module 440, and a screening module 450.
The extraction module 410 may perform operation S110, for example, extracting features from the input image to obtain a feature map.
The prediction module 420 may perform operation S120, for example, predicting a proposal box with the adaptive region proposal network.
The cropping module 430 may perform operation S130, for example, cropping the feature map with the proposal box to obtain a cropped feature map.
The modeling module 440 may perform operation S140, for example, modeling text texture information on the cropped feature map along two orthogonal directions to obtain a contour-point heat map for each direction.
The screening module 450 may perform operation S150, for example, screening the contour points in the heat maps to obtain a contour point set from which the text in the input image is reconstructed.
For details of this embodiment, refer to the text detection method of the embodiment shown in figs. 1 to 3.
To sum up, the text detection method and system in the embodiments of the present disclosure extract features from an input image to obtain a feature map; predict proposal boxes with an adaptive region proposal network; crop the feature map with the proposal boxes to obtain cropped feature maps; adjust the proposal boxes with a fine-tuning network according to the cropped feature maps and crop the feature map again with the adjusted proposal boxes; model text texture information on the adjusted cropped feature maps along two orthogonal directions to obtain a contour-point heat map for each direction; and screen the contour points in the heat maps to obtain a contour point set from which the text in the input image is reconstructed. The adaptive region proposal network is designed to better adapt to changes in text scale, and modeling text texture information along orthogonal directions suppresses false-positive contour points. The problems of text scale variation and false-positive prediction are thereby effectively addressed, and the accuracy of detecting scene text of arbitrary shape is improved.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (10)

1. A text detection method, comprising:
performing feature extraction on an input image to obtain a feature map;
predicting a proposal box with an adaptive region proposal network;
cropping the feature map with the proposal box to obtain a cropped feature map;
modeling text texture information on the cropped feature map along two orthogonal directions to obtain a contour-point heat map for each orthogonal direction; and
screening the contour points in the contour-point heat maps to obtain a contour point set, from which the text in the input image is reconstructed.
2. The method of claim 1, wherein predicting a proposal box with the adaptive region proposal network comprises:
performing local offset prediction on the points of a preset anchor box with the adaptive region proposal network to obtain corresponding predicted points; and
determining the proposal box from the predicted points.
3. The method of claim 1, wherein the two orthogonal directions are a horizontal direction and a vertical direction, and modeling text texture information on the cropped feature map along the two orthogonal directions comprises:
building a first text texture information model of the cropped feature map in the horizontal direction from a first convolution kernel; and
building a second text texture information model of the cropped feature map in the vertical direction from a second convolution kernel.
4. The method of claim 3, wherein the first convolution kernel has size 1 × k and the second convolution kernel has size k × 1, k being not greater than the size of the cropped feature map.
5. The method of claim 1, further comprising:
adjusting the proposal box with a fine-tuning network according to the cropped feature map to obtain an adjusted proposal box;
cropping the feature map with the adjusted proposal box to obtain an adjusted cropped feature map; and
upsampling the adjusted cropped feature map to obtain an upsampled feature map.
6. The method of claim 5, wherein modeling text texture information on the cropped feature map along two orthogonal directions comprises:
modeling text texture information on the upsampled feature map along the two orthogonal directions.
7. The method of claim 5, wherein modeling text texture information on the cropped feature map along two orthogonal directions comprises:
modeling text texture information on the cropped feature map with a text texture information perception network in each of the two orthogonal directions;
and wherein, before feature extraction on the input image, the method further comprises:
training the adaptive region proposal network, the text texture information perception networks, and the fine-tuning network by stochastic gradient descent according to a loss function, the loss function being:
L = L_Arpn + λ_Hcp·L_Hcp + λ_Vcp·L_Vcp + λ_boxclass·L_boxclass + λ_boxreg·L_boxreg
where L is the loss function, L_Arpn is the loss function of the adaptive region proposal network, L_Hcp is the loss function of the text texture information perception network in one orthogonal direction, L_Vcp is the loss function of the text texture information perception network in the other orthogonal direction, L_boxclass and L_boxreg are the loss functions of the fine-tuning network, λ_Hcp is the balance parameter of the text texture information perception network in the one orthogonal direction, λ_Vcp is the balance parameter of the text texture information perception network in the other orthogonal direction, and λ_boxclass and λ_boxreg are the balance parameters of the fine-tuning network.
8. The method of claim 1, wherein screening the contour-point heat maps to obtain a contour point set comprises:
filtering background pixels in the contour-point heat maps by non-maximum suppression; and
screening the contour-point heat maps against a preset threshold to obtain the contour point set.
9. The method of claim 8, wherein screening the contour-point heat maps against a preset threshold to obtain the contour point set comprises:
selecting the pixels whose response values in the contour-point heat maps of both orthogonal directions exceed the preset threshold to form the contour point set.
10. A text detection system, comprising:
an extraction module for performing feature extraction on an input image to obtain a feature map;
a prediction module for predicting a proposal box with an adaptive region proposal network;
a cropping module for cropping the feature map with the proposal box to obtain a cropped feature map;
a modeling module for modeling text texture information on the cropped feature map along two orthogonal directions to obtain a contour-point heat map for each orthogonal direction; and
a screening module for screening the contour points in the contour-point heat maps to obtain a contour point set, from which the text in the input image is reconstructed.
CN202010008296.7A 2020-01-03 2020-01-03 Character detection method and system Active CN111242120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010008296.7A CN111242120B (en) 2020-01-03 2020-01-03 Character detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010008296.7A CN111242120B (en) 2020-01-03 2020-01-03 Character detection method and system

Publications (2)

Publication Number Publication Date
CN111242120A 2020-06-05
CN111242120B (en) 2022-07-29

Family

ID=70868604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010008296.7A Active CN111242120B (en) 2020-01-03 2020-01-03 Character detection method and system

Country Status (1)

Country Link
CN (1) CN111242120B (en)



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
US20180349722A1 (en) * 2017-06-05 2018-12-06 Intuit Inc. Detecting font size in a digital image
CN109447078A (en) * 2018-10-23 2019-03-08 四川大学 A kind of detection recognition method of natural scene image sensitivity text
CN109670494A (en) * 2018-12-13 2019-04-23 深源恒际科技有限公司 A kind of Method for text detection and system of subsidiary recognition confidence
CN109886077A (en) * 2018-12-28 2019-06-14 北京旷视科技有限公司 Image-recognizing method, device, computer equipment and storage medium
CN110059685A (en) * 2019-04-26 2019-07-26 腾讯科技(深圳)有限公司 Word area detection method, apparatus and storage medium
CN110263877A (en) * 2019-06-27 2019-09-20 中国科学技术大学 Scene character detecting method
CN110363252A (en) * 2019-07-24 2019-10-22 山东大学 It is intended to scene text detection end to end and recognition methods and system
CN110598698A (en) * 2019-08-29 2019-12-20 华中科技大学 Natural scene text detection method and system based on adaptive regional suggestion network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHUHUI XUE et al.: "MSR: Multi-Scale Shape Regression for Scene Text Detection", arXiv:1901.02596v1 *
ZHUOTAO TIAN et al.: "Learning Shape-Aware Embedding for Scene Text Detection", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
李飞 (Li Fei): "Research and implementation of an image text recognition system based on mobile terminals" (基于移动终端的图像文字识别系统的研究及实现), China Masters' Theses Full-text Database, Information Science and Technology *
郝学智 (Hao Xuezhi): "Digit and character recognition in complex backgrounds based on machine vision" (基于机器视觉的复杂背景下的数字字符识别), China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783801A (en) * 2020-07-17 2020-10-16 上海明波通信技术股份有限公司 Object contour extraction method and system and object contour prediction method and system
CN111783801B (en) * 2020-07-17 2024-04-23 上海明波通信技术股份有限公司 Object contour extraction method and system and object contour prediction method and system
CN111914843A (en) * 2020-08-20 2020-11-10 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Character detection method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN111242120B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN109711316B (en) Pedestrian re-identification method, device, equipment and storage medium
CN108509978B (en) Multi-class target detection method and model based on CNN (CNN) multi-level feature fusion
US10846524B2 (en) Table layout determination using a machine learning system
CN109902600B (en) Road area detection method
CN112232349A (en) Model training method, image segmentation method and device
CN110135446B (en) Text detection method and computer storage medium
CN109191498B (en) Target detection method and system based on dynamic memory and motion perception
CN114627052A (en) Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN111832453B (en) Unmanned scene real-time semantic segmentation method based on two-way deep neural network
CN113361432B (en) Video character end-to-end detection and identification method based on deep learning
CN111723841A (en) Text detection method and device, electronic equipment and storage medium
CN110633633B (en) Remote sensing image road extraction method based on self-adaptive threshold
CN111209858A (en) Real-time license plate detection method based on deep convolutional neural network
CN111079864A (en) Short video classification method and system based on optimized video key frame extraction
CN114241277A (en) Attention-guided multi-feature fusion disguised target detection method, device, equipment and medium
CN115359366A (en) Remote sensing image target detection method based on parameter optimization
CN116645592A (en) Crack detection method based on image processing and storage medium
CN110751157B (en) Image significance segmentation and image significance model training method and device
CN113591831A (en) Font identification method and system based on deep learning and storage medium
CN115641632A (en) Face counterfeiting detection method based on separation three-dimensional convolution neural network
CN115810149A (en) High-resolution remote sensing image building extraction method based on superpixel and image convolution
CN111242120B (en) Character detection method and system
CN114998373A (en) Improved U-Net cloud picture segmentation method based on multi-scale loss function
CN108710881B (en) Neural network model, candidate target area generation method and model training method
CN111931572B (en) Target detection method for remote sensing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant