CN111242120B - Character detection method and system - Google Patents

Character detection method and system

Publication number: CN111242120B (application CN202010008296.7A; earlier publication CN111242120A)
Authority: CN (China)
Language: Chinese (zh)
Prior art keywords: texture information, suggestion, network, cutting, orthogonal directions
Legal status: Active (granted)
Inventors: 张勇东 (Zhang Yongdong), 王裕鑫 (Wang Yuxin), 谢洪涛 (Xie Hongtao)
Assignees: Beijing Zhongke Research Institute; University of Science and Technology of China (USTC)
Filed: 2020-01-03
Published as CN111242120A: 2020-06-05; granted as CN111242120B: 2022-07-29

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193 Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words


Abstract

A character detection method and system. The method includes: performing feature extraction on an input image to obtain a feature image; predicting with an adaptive region proposal network to obtain a suggestion box; cropping the feature image with the suggestion box to obtain a cropped feature map; modeling the character texture information of the cropped feature map in two orthogonal directions to obtain a contour point heatmap for each orthogonal direction; and screening the contour points in the heatmaps to obtain a contour point set from which the characters in the input image are reconstructed. The adaptive region proposal network adapts to changes in character scale and generates suggestion boxes that match character regions, while modeling character texture information in orthogonal directions suppresses false-positive contour points, improving the accuracy of detecting scene text of arbitrary shape.

Description

Character detection method and system
Technical Field
The present disclosure relates to the field of text recognition technologies, and in particular to a character detection method and system.
Background
Natural scene character detection means detecting character regions against a complex background and marking each region with a bounding box. The results of natural scene character detection are widely used in fields such as autonomous driving and robotics. Character detection in natural scenes faces difficulties such as low resolution, complex backgrounds, and variable font sizes, so traditional character detection techniques perform poorly in practice.
With the development of deep learning, natural scene character detection has improved remarkably. Although deep-learning-based detectors can detect characters of arbitrary shape, their results contain many false-positive detections and are affected by the diversity of character scales, so their detection accuracy still needs to be improved.
Disclosure of Invention
Technical problem to be solved
In view of this, the present disclosure provides a character detection method and system capable of improving the accuracy of detecting scene text of arbitrary shape.
(II) technical scheme
The present disclosure provides a character detection method, including: performing feature extraction on an input image to obtain a feature image; predicting with an adaptive region proposal network to obtain a suggestion box; cropping the feature image with the suggestion box to obtain a cropped feature map; modeling the character texture information of the cropped feature map in two orthogonal directions to obtain a contour point heatmap for each orthogonal direction; and screening the contour points in the heatmaps to obtain a contour point set from which the characters in the input image are reconstructed.
Optionally, predicting with the adaptive region proposal network to obtain a suggestion box includes: performing local offset prediction on the points of a preset anchor box using the adaptive region proposal network to obtain corresponding predicted points; and determining the suggestion box from the predicted points.
Optionally, the two orthogonal directions are the horizontal direction and the vertical direction, and modeling the character texture information of the cropped feature map in the two orthogonal directions includes: building a first character texture information model of the cropped feature map in the horizontal direction according to a first convolution kernel; and building a second character texture information model of the cropped feature map in the vertical direction according to a second convolution kernel.
Optionally, the size of the first convolution kernel is 1 × k and the size of the second convolution kernel is k × 1, where k is not greater than the size of the cropped feature map; in the present disclosure k is 3.
Optionally, the method further includes: adjusting the suggestion box with a fine-tuning network according to the cropped feature map to obtain an adjusted suggestion box; cropping the feature image with the adjusted suggestion box to obtain an adjusted cropped feature map; and upsampling the adjusted cropped feature map to obtain an upsampled feature map.
Optionally, modeling the character texture information of the cropped feature map in two orthogonal directions includes: modeling the character texture information of the upsampled feature map in the two orthogonal directions.
Optionally, modeling the character texture information of the cropped feature map in two orthogonal directions includes:
modeling the character texture information of the cropped feature map with a character texture information perception network for each of the two orthogonal directions;
before performing feature extraction on the input image, the method further includes:
training the adaptive region proposal network, the character texture information perception networks, and the fine-tuning network with a stochastic gradient descent method according to a loss function, the loss function being:
L = L_Arpn + λ_Hcp L_Hcp + λ_Vcp L_Vcp + λ_boxclass L_boxclass + λ_boxreg L_boxreg
where L is the overall loss function, L_Arpn is the loss function of the adaptive region proposal network, L_Hcp is the loss function of the character texture information perception network in one orthogonal direction, L_Vcp is the loss function of the character texture information perception network in the other orthogonal direction, L_boxclass and L_boxreg are the loss functions of the fine-tuning network, λ_Hcp and λ_Vcp are the balance parameters of the character texture information perception networks in the two orthogonal directions, and λ_boxclass and λ_boxreg are the balance parameters of the fine-tuning network.
Optionally, screening the contour point heatmaps to obtain the contour point set includes: filtering background pixels in the contour point heatmaps with non-maximum suppression; and screening the contour point heatmaps against a preset threshold to obtain the contour point set.
Optionally, screening the contour point heatmaps against the preset threshold to obtain the contour point set includes: selecting the pixels whose response values in the contour point heatmaps of both orthogonal directions are greater than the preset threshold to form the contour point set.
Another aspect of the present disclosure provides a character detection system, including: an extraction module for performing feature extraction on an input image to obtain a feature image; a prediction module for predicting with an adaptive region proposal network to obtain a suggestion box; a cropping module for cropping the feature image with the suggestion box to obtain a cropped feature map; a modeling module for modeling the character texture information of the cropped feature map in two orthogonal directions to obtain a contour point heatmap for each orthogonal direction; and a screening module for screening the contour points in the heatmaps to obtain a contour point set from which the characters in the input image are reconstructed.
(III) advantageous effects
In the character detection method and system of the present disclosure, the adaptive region proposal network is designed to better adapt to changes in character scale, and modeling character texture information in orthogonal directions suppresses false-positive contour points. The problems of character scale variation and false-positive prediction are thereby effectively addressed, and the accuracy of detecting scene text of arbitrary shape is improved.
Drawings
Fig. 1 schematically shows a flowchart of the character detection method provided in an embodiment of the present disclosure;
Fig. 2 schematically shows the prediction of a suggestion box in the character detection method provided in an embodiment of the present disclosure;
Fig. 3 schematically shows the modeling of character texture information in the character detection method provided in an embodiment of the present disclosure;
Fig. 4 schematically shows a block diagram of the character detection system provided in an embodiment of the present disclosure;
Fig. 5 schematically shows the fine-tuning network provided in an embodiment of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
Fig. 1 schematically shows a flowchart of the character detection method provided in an embodiment of the present disclosure.
The method shown in Fig. 1 will be described in detail below with reference to Figs. 2 and 3. As shown in Fig. 1, the character detection method includes operations S110 to S150.
In operation S110, feature extraction is performed on the input image to obtain a feature image.
In this embodiment, a deep neural network (DNN) is used for character detection. The deep neural network includes a ResNet50 feature extraction network, an adaptive region proposal network, a fine-tuning network, a character texture information perception network in the horizontal direction, and a character texture information perception network in the vertical direction.
The deep neural network should be trained before operation S110. Specifically, end-to-end training is performed, for example, with stochastic gradient descent (SGD), where the overall loss function L of the deep neural network is:
L = L_Arpn + λ_Hcp L_Hcp + λ_Vcp L_Vcp + λ_boxclass L_boxclass + λ_boxreg L_boxreg
where L_Arpn is the loss function of the adaptive region proposal network, L_Hcp is the loss function of the character texture information perception network in one orthogonal direction (e.g., the horizontal direction), L_Vcp is the loss function of the character texture information perception network in the other orthogonal direction (e.g., the vertical direction), L_boxclass and L_boxreg are the loss functions of the fine-tuning network, λ_Hcp and λ_Vcp are the balance parameters of the character texture information perception networks in the two directions, and λ_boxclass and λ_boxreg are the balance parameters of the fine-tuning network.
Further, the loss function L_Arpn of the adaptive region proposal network is:
L_Arpn = L_Arpnclass + L_Arpnreg
L_Arpnclass = Σ_i L_cls(p_i, p_i*)
L_Arpnreg = (1/N_pos) Σ_i p_i* (1 - Intersection_i/Union_i)
where L_Arpnclass is the classification loss function, L_Arpnreg is the regression loss function, p_i is the predicted probability that the i-th preset anchor box is a target box (i.e., a suggestion box), L_cls is the cross-entropy loss function, N_pos is the number of positive anchor boxes, Intersection is the intersection of the anchor box and the target box, Union is the union of the anchor box and the target box, and p_i* is 1 when the intersection-over-union of the anchor box and the target box is greater than 0.5 and 0 otherwise.
The loss function L_Hcp of the character texture information perception network in the horizontal direction and the loss function L_Vcp of the character texture information perception network in the vertical direction are:
L_Hcp = L_Vcp = -(1/(N_pos + N_neg)) Σ_i [ y_i log(q_i) + (1 - y_i) log(1 - q_i) ]
where y_i is the label of the i-th contour point, q_i is the prediction for the i-th contour point, N_neg is the number of predicted background pixels, and N_pos is the number of predicted contour points.
The loss functions L_boxclass and L_boxreg of the fine-tuning network are:
L_boxclass = (1/N_pos1) Σ_i L_cls(p_i1, p_i1*)
L_boxreg = (1/N_reg) Σ_i Smooth_l1(t_i - t_i*)
where p_i1 is the probability that an anchor box in the box branch is a target box, L_cls is the cross-entropy loss function, N_pos1 is the number of prediction boxes in the box branch correctly matched with their labels, p_i1* is 1 when the intersection-over-union of the anchor box and the target box in the box branch is greater than 0.5 and 0 otherwise, N_reg is the number of boxes in the box branch that need to be fine-tuned, t_i are the parameters of the predicted box, t_i* are the parameters of the label box, and Smooth_l1 is the smooth-L1 function.
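As an illustration, the following is a minimal PyTorch-style sketch of these loss terms under the definitions above; the function names, tensor shapes, and exact normalizations are assumptions for readability, not taken from the patent.

    import torch.nn.functional as F

    def arpn_loss(p, iou):
        # p: predicted probabilities that each preset anchor box is a target
        # box; iou: intersection-over-union of each anchor box with its
        # target box. p_star is 1 for positive anchors (IoU > 0.5), else 0.
        p_star = (iou > 0.5).float()
        l_class = F.binary_cross_entropy(p, p_star, reduction='sum')
        n_pos = p_star.sum().clamp(min=1)
        l_reg = ((1.0 - iou) * p_star).sum() / n_pos  # IoU-based regression
        return l_class + l_reg

    def contour_point_loss(q, y):
        # Binary cross-entropy between a predicted heatmap q and the contour
        # point labels y, averaged over contour and background pixels.
        return F.binary_cross_entropy(q, y)

    def finetune_loss(p1, p1_star, t, t_star):
        # Box classification (cross-entropy over the box branch) and box
        # regression (smooth L1 over the boxes that need fine-tuning).
        return F.binary_cross_entropy(p1, p1_star), F.smooth_l1_loss(t, t_star)

    def total_loss(l_arpn, l_hcp, l_vcp, l_boxclass, l_boxreg,
                   lam_hcp=1.0, lam_vcp=1.0, lam_boxclass=1.0, lam_boxreg=1.0):
        # Weighted sum forming the overall loss L; the balance parameters are
        # not specified in the text, so the defaults here are placeholders.
        return (l_arpn + lam_hcp * l_hcp + lam_vcp * l_vcp
                + lam_boxclass * l_boxclass + lam_boxreg * l_boxreg)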
During training of the deep neural network, the initial learning rate is set to 0.0025; when the number of training iterations reaches 120,000 and again 160,000, the learning rate is reduced to 0.1 times its current value. In this embodiment the network is trained for 180,000 iterations, at which point the overall loss function L meets the requirement, and the trained deep neural network can then be used for character detection.
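A hypothetical training loop matching this schedule might look as follows; PyTorch is an assumption, and model (returning the overall loss L) and the infinite data loader are placeholders.

    import torch

    optimizer = torch.optim.SGD(model.parameters(), lr=0.0025)  # initial rate
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[120000, 160000], gamma=0.1)      # decay to 0.1x

    for step, (image, targets) in zip(range(180000), loader):   # 180,000 iterations
        loss = model(image, targets)  # overall loss L defined above
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()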
According to the embodiment of the disclosure, feature extraction is performed on an input image by using a ResNet50 feature extraction network, so that a feature image is obtained.
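For illustration, such a backbone can be sketched with torchvision's ResNet50; the truncation point and the input size below are assumptions.

    import torch
    import torchvision

    # Use ResNet50 up to its last convolutional stage as the feature extractor.
    backbone = torchvision.models.resnet50(weights="IMAGENET1K_V1")
    extractor = torch.nn.Sequential(*list(backbone.children())[:-2])

    image = torch.randn(1, 3, 800, 800)  # placeholder input image
    feature_image = extractor(image)     # feature image, here (1, 2048, 25, 25)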
In operation S120, prediction is performed with the adaptive region proposal network to obtain a suggestion box.
According to an embodiment of the present disclosure, operation S120 includes sub-operation S120A and sub-operation S120B.
Sub-operation S120A: perform local offset prediction on the points of a preset anchor box with the adaptive region proposal network to obtain corresponding predicted points. Specifically, the obtained predicted points are:
(x_l', y_l') = (x_l + ω_c Δx_l, y_l + h_c Δy_l), l = 1, ..., n
where n is the number of points in the preset anchor box, x_l' and y_l' are the abscissa and ordinate of the l-th predicted point, x_l and y_l are the abscissa and ordinate of the l-th point in the preset anchor box, ω_c and h_c are the length and width of the preset anchor box, and Δx_l and Δy_l are the abscissa and ordinate offsets of the l-th point in the preset anchor box output by the adaptive region proposal network.
Referring to Fig. 2, the number n of points in the preset anchor box is set to 9, representing the center point and eight boundary points (the upper-left, upper-middle, upper-right, middle-right, lower-right, lower-middle, lower-left, and middle-left points).
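A sketch of this offset decoding follows; the names and shapes are illustrative.

    import torch

    def predict_points(anchor_points, anchor_wh, offsets):
        # anchor_points: (n, 2) preset anchor box points (x_l, y_l);
        # offsets: (n, 2) network-predicted offsets (dx_l, dy_l);
        # anchor_wh: (w_c, h_c) of the preset anchor box. Each offset is
        # scaled by the anchor box size, as in the formula above.
        return anchor_points + offsets * torch.tensor(anchor_wh)

    # Example with n = 9 points (the center plus eight boundary points).
    points = predict_points(torch.rand(9, 2) * 100, (100.0, 40.0),
                            torch.randn(9, 2) * 0.1)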
Sub-operation S120B: determine the suggestion box from the predicted points. Specifically, the predicted points corresponding to the four extreme coordinates (the minimum abscissa, the minimum ordinate, the maximum abscissa, and the maximum ordinate) are obtained by maximum/minimum screening and determine the suggestion box, as shown in Fig. 2. The suggestion box (proposal) position is represented by these four extreme coordinates:
proposal = (x_min, y_min, x_max, y_max), with x_min = min_l x_l', y_min = min_l y_l', x_max = max_l x_l', y_max = max_l y_l'.
in this embodiment, the number of the obtained suggestion boxes is one or more. And a plurality of suggestion boxes are obtained through prediction, so that the character detection precision can be further improved.
In operation S130, the feature image is cropped with the suggestion box to obtain a cropped feature map.
In this embodiment, when there are multiple suggestion boxes, the feature image is cropped with each suggestion box to obtain multiple cropped feature maps, which are then normalized to the same size.
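The patent does not name the cropping operator; RoIAlign is one standard operator that crops each box from a feature map and resizes the crops to a common size, so the following sketch uses it as an assumption (the 14 × 14 output size is likewise illustrative).

    import torch
    from torchvision.ops import roi_align

    # feature_image: (1, C, H, W); proposals: (k, 4) suggestion boxes given in
    # feature-map coordinates. roi_align expects a batch index per box.
    proposals = proposal.unsqueeze(0)  # (1, 4), from the sketch above
    boxes = torch.cat([torch.zeros(len(proposals), 1), proposals], dim=1)
    crops = roi_align(feature_image, boxes, output_size=(14, 14))  # (k, C, 14, 14)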
According to an embodiment of the present disclosure, after operation S130 the character detection method further includes: adjusting the suggestion box with a fine-tuning network according to the cropped feature map to obtain an adjusted suggestion box; and cropping the feature image with the adjusted suggestion box to obtain an adjusted cropped feature map.
Referring to Fig. 5, the fine-tuning network operates on the cropped feature map and outputs the adjustment parameters of the suggestion box, which are used to adjust it. The adjusted suggestion box is:
x = x_c + w_c t_1, y = y_c + h_c t_2, w = w_c exp(t_3), h = h_c exp(t_4)
where x and y are the abscissa and ordinate of the center point of the adjusted suggestion box, w and h are its width and height, x_c and y_c are the abscissa and ordinate of the center point of the suggestion box before adjustment, w_c and h_c are its width and height (x_c, y_c, w_c, and h_c can be computed from the extreme coordinates of the suggestion box (proposal)), and t_1, t_2, t_3, and t_4 are the adjustment parameters output by the fine-tuning network.
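A sketch of this decoding follows; the exponential form for the width and height matches the standard R-CNN box parameterization and is an assumption here.

    import torch

    def refine_box(box, t):
        # box: suggestion box (x_min, y_min, x_max, y_max); t: adjustment
        # parameters (t1, t2, t3, t4) output by the fine-tuning network.
        x_c, y_c = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        w_c, h_c = box[2] - box[0], box[3] - box[1]
        x = x_c + w_c * t[0]       # shift the center point
        y = y_c + h_c * t[1]
        w = w_c * torch.exp(t[2])  # rescale width and height
        h = h_c * torch.exp(t[3])
        return torch.stack([x - w / 2, y - h / 2, x + w / 2, y + h / 2])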
Further, the character detection method also includes: upsampling the adjusted cropped feature map to obtain an upsampled feature map, whose size is larger than that of the adjusted cropped feature map.
In operation S140, character texture information modeling is performed on the cropped feature map in two orthogonal directions to obtain a contour point heatmap for each orthogonal direction.
Specifically, the character texture information of the adjusted, upsampled feature map is modeled in the two orthogonal directions, yielding a contour point heatmap for each direction.
Referring to Fig. 3, the two orthogonal directions are the horizontal direction and the vertical direction, and operation S140 includes sub-operations S140A and S140B.
Sub-operation S140A: build a first character texture information model of the cropped feature map in the horizontal direction according to the first convolution kernel. Specifically, the first character texture information model of the adjusted, upsampled feature map is built by sliding the kernel in the horizontal direction. The first convolution kernel has size 1 × k, where k is greater than 0 and not greater than the size of the cropped feature map; for example, k = 3.
Sub-operation S140B: build a second character texture information model of the cropped feature map in the vertical direction according to the second convolution kernel. Specifically, the second character texture information model of the adjusted, upsampled feature map is built by sliding the kernel in the vertical direction. The size of the second convolution kernel is k × 1.
Further, the first and second character texture information models are normalized with a Sigmoid function to obtain the contour point heatmap Hmap in the horizontal direction and the contour point heatmap Vmap in the vertical direction.
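A minimal sketch of such a pair of directional heads, assuming each is a single 1 × k or k × 1 convolution (the module and channel names are illustrative):

    import torch
    import torch.nn as nn

    class TextureHeads(nn.Module):
        # A 1xk convolution models character texture along the horizontal
        # direction and a kx1 convolution along the vertical direction; a
        # Sigmoid normalizes the responses into the heatmaps Hmap and Vmap.
        def __init__(self, channels, k=3):
            super().__init__()
            self.horizontal = nn.Conv2d(channels, 1, (1, k), padding=(0, k // 2))
            self.vertical = nn.Conv2d(channels, 1, (k, 1), padding=(k // 2, 0))

        def forward(self, feat):
            hmap = torch.sigmoid(self.horizontal(feat))
            vmap = torch.sigmoid(self.vertical(feat))
            return hmap, vmap

    heads = TextureHeads(channels=256)
    hmap, vmap = heads(torch.randn(1, 256, 56, 56))  # e.g. an upsampled crop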
In operation S150, the contour points in the contour point heatmaps are screened to obtain a contour point set, from which the characters in the input image are reconstructed.
In this embodiment, the contour point heatmaps are screened so that the pixels that simultaneously have high response values in both heatmaps are retained, forming the contour point set.
According to an embodiment of the present disclosure, operation S150 includes sub-operation S150A and sub-operation S150B.
Sub-operation S150A: filter the background pixels in the contour point heatmaps with non-maximum suppression. Specifically, the horizontal contour point heatmap is processed with, for example, a 1 × 3 sliding window and the vertical contour point heatmap with a 3 × 1 sliding window; in each window only the largest pixel is output and the remaining pixels are suppressed.
Sub-operation S150B: screen the contour point heatmaps against a preset threshold to obtain the contour point set. Specifically, every pixel position in the non-maximum-suppressed heatmaps is traversed, and the pixels whose response values in the heatmaps of both the horizontal and the vertical direction are greater than the preset threshold are selected to form the contour point set. The preset threshold is, for example, 0.5.
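A sketch of this screening step follows; implementing the directional non-maximum suppression with max pooling is an assumption.

    import torch
    import torch.nn.functional as F

    def contour_point_set(hmap, vmap, threshold=0.5):
        # Keep a pixel only if it is the maximum of its 1x3 (horizontal) or
        # 3x1 (vertical) sliding window, then keep positions whose response
        # exceeds the threshold in BOTH heatmaps. hmap, vmap: (1, 1, H, W).
        h_keep = hmap * (hmap == F.max_pool2d(hmap, (1, 3), 1, (0, 1)))
        v_keep = vmap * (vmap == F.max_pool2d(vmap, (3, 1), 1, (1, 0)))
        mask = (h_keep > threshold) & (v_keep > threshold)
        return mask.squeeze().nonzero()  # (row, column) contour coordinates

    contour_points = contour_point_set(hmap, vmap)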
Finally, the character regions in the input image are reconstructed from the screened contour point set, thereby detecting the characters in the input image.
In the embodiments of the present disclosure, the character detection method was applied to a large number of arbitrarily shaped scene texts, and the results show very good detection performance. For example, on the ICDAR2015 dataset the recall, precision, and F-measure of the method are 86.1%, 87.6%, and 86.9%, respectively, at 3.5 FPS; on the Total-Text dataset they are 83.9%, 86.9%, and 85.4%, at 3.8 FPS; and on the CTW1500 dataset they are 84.1%, 83.7%, and 83.9%, at 4.5 FPS.
Fig. 4 schematically shows a block diagram of the character detection system provided in an embodiment of the present disclosure.
An embodiment of the present disclosure also provides a character detection system. The character detection system 400 includes an extraction module 410, a prediction module 420, a cropping module 430, a modeling module 440, and a screening module 450.
The extraction module 410 may perform operation S110, for example, performing feature extraction on the input image to obtain the feature image.
The prediction module 420 may perform operation S120, for example, predicting with the adaptive region proposal network to obtain a suggestion box.
The cropping module 430 may perform operation S130, for example, cropping the feature image with the suggestion box to obtain a cropped feature map.
The modeling module 440 may perform operation S140, for example, modeling the character texture information of the cropped feature map in two orthogonal directions to obtain a contour point heatmap for each orthogonal direction.
The screening module 450 may perform operation S150, for example, screening the contour points in the heatmaps to obtain a contour point set from which the characters in the input image are reconstructed.
For details, refer to the character detection method in the embodiments shown in Figs. 1-3.
To sum up, the character detection method and system in the embodiments of the present disclosure perform feature extraction on an input image to obtain a feature image; predict with an adaptive region proposal network to obtain a suggestion box; crop the feature image with the suggestion box to obtain a cropped feature map; adjust the suggestion box with a fine-tuning network according to the cropped feature map and crop the feature image again with the adjusted suggestion box to obtain an adjusted cropped feature map; model the character texture information of the adjusted cropped feature map in two orthogonal directions to obtain a contour point heatmap for each orthogonal direction; and screen the contour points in the heatmaps to obtain a contour point set from which the characters in the input image are reconstructed. The adaptive region proposal network is designed to better adapt to changes in character scale, and modeling character texture information in the orthogonal directions suppresses false-positive contour points. The problems of character scale variation and false-positive prediction are thereby effectively addressed, and the accuracy of detecting scene text of arbitrary shape is improved.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (6)

1. A character detection method, comprising:
performing feature extraction on an input image to obtain a feature image;
predicting with an adaptive region proposal network to obtain a suggestion box, which specifically comprises: performing local offset prediction on the points of a preset anchor box with the adaptive region proposal network to obtain corresponding predicted points; and determining the suggestion box from the predicted points;
cropping the feature image with the suggestion box to obtain a cropped feature map;
modeling character texture information of the cropped feature map in two orthogonal directions to obtain a contour point heatmap for each orthogonal direction, wherein the two orthogonal directions are the horizontal direction and the vertical direction, and modeling the character texture information of the cropped feature map in the two orthogonal directions comprises: building a first character texture information model of the cropped feature map in the horizontal direction according to a first convolution kernel; and building a second character texture information model of the cropped feature map in the vertical direction according to a second convolution kernel; and
filtering background pixels in the contour point heatmaps with non-maximum suppression, and screening the pixels whose response values in the contour point heatmaps of both orthogonal directions are greater than a preset threshold to obtain a contour point set, from which characters in the input image are reconstructed.
2. The method of claim 1, wherein the size of the first convolution kernel is 1 × k and the size of the second convolution kernel is k × 1, k being not greater than the size of the cropped feature map.
3. The method of claim 1, further comprising:
adjusting the suggestion box with a fine-tuning network according to the cropped feature map to obtain an adjusted suggestion box;
cropping the feature image with the adjusted suggestion box to obtain an adjusted cropped feature map; and
upsampling the adjusted cropped feature map to obtain an upsampled feature map.
4. The method of claim 3, wherein modeling the character texture information of the cropped feature map in two orthogonal directions comprises:
modeling the character texture information of the upsampled feature map in the two orthogonal directions.
5. The method of claim 3, wherein modeling the character texture information of the cropped feature map in two orthogonal directions comprises:
modeling the character texture information of the cropped feature map with a character texture information perception network for each of the two orthogonal directions;
and wherein, before performing feature extraction on the input image, the method further comprises:
training the adaptive region proposal network, the character texture information perception networks, and the fine-tuning network with a stochastic gradient descent method according to a loss function, the loss function being:
L = L_Arpn + λ_Hcp L_Hcp + λ_Vcp L_Vcp + λ_boxclass L_boxclass + λ_boxreg L_boxreg
wherein L is the loss function, L_Arpn is the loss function of the adaptive region proposal network, L_Hcp is the loss function of the character texture information perception network in one orthogonal direction, L_Vcp is the loss function of the character texture information perception network in the other orthogonal direction, L_boxclass and L_boxreg are the loss functions of the fine-tuning network, λ_Hcp is the balance parameter of the character texture information perception network in the one orthogonal direction, λ_Vcp is the balance parameter of the character texture information perception network in the other orthogonal direction, and λ_boxclass and λ_boxreg are the balance parameters of the fine-tuning network.
6. A character detection system, comprising:
an extraction module for performing feature extraction on an input image to obtain a feature image;
a prediction module for predicting with an adaptive region proposal network to obtain a suggestion box, which specifically comprises: performing local offset prediction on the points of a preset anchor box with the adaptive region proposal network to obtain corresponding predicted points; and determining the suggestion box from the predicted points;
a cropping module for cropping the feature image with the suggestion box to obtain a cropped feature map;
a modeling module for modeling character texture information of the cropped feature map in two orthogonal directions to obtain a contour point heatmap for each orthogonal direction, wherein the two orthogonal directions are the horizontal direction and the vertical direction, and modeling the character texture information of the cropped feature map in the two orthogonal directions comprises: building a first character texture information model of the cropped feature map in the horizontal direction according to a first convolution kernel; and building a second character texture information model of the cropped feature map in the vertical direction according to a second convolution kernel; and
a screening module for filtering background pixels in the contour point heatmaps with non-maximum suppression and screening the pixels whose response values in the contour point heatmaps of both orthogonal directions are greater than a preset threshold to obtain a contour point set, from which characters in the input image are reconstructed.
CN202010008296.7A 2020-01-03 2020-01-03 Character detection method and system Active CN111242120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010008296.7A CN111242120B (en) 2020-01-03 2020-01-03 Character detection method and system


Publications (2)

Publication Number Publication Date
CN111242120A CN111242120A (en) 2020-06-05
CN111242120B true CN111242120B (en) 2022-07-29

Family

ID=70868604

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010008296.7A Active CN111242120B (en) 2020-01-03 2020-01-03 Character detection method and system

Country Status (1)

Country Link
CN (1) CN111242120B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783801B (en) * 2020-07-17 2024-04-23 上海明波通信技术股份有限公司 Object contour extraction method and system and object contour prediction method and system
CN111914843B (en) * 2020-08-20 2021-04-16 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Character detection method, system, equipment and storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10354161B2 (en) * 2017-06-05 2019-07-16 Intuit, Inc. Detecting font size in a digital image

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN109447078A (en) * 2018-10-23 2019-03-08 四川大学 A kind of detection recognition method of natural scene image sensitivity text
CN109670494A (en) * 2018-12-13 2019-04-23 深源恒际科技有限公司 A kind of Method for text detection and system of subsidiary recognition confidence
CN109886077A (en) * 2018-12-28 2019-06-14 北京旷视科技有限公司 Image-recognizing method, device, computer equipment and storage medium
CN110059685A (en) * 2019-04-26 2019-07-26 腾讯科技(深圳)有限公司 Word area detection method, apparatus and storage medium
CN110263877A (en) * 2019-06-27 2019-09-20 中国科学技术大学 Scene character detecting method
CN110363252A (en) * 2019-07-24 2019-10-22 山东大学 It is intended to scene text detection end to end and recognition methods and system
CN110598698A (en) * 2019-08-29 2019-12-20 华中科技大学 Natural scene text detection method and system based on adaptive regional suggestion network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Learning Shape-Aware Embedding for Scene Text Detection; Zhuotao Tian et al.; 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019-12-31; 4234-4243 *
MSR: Multi-Scale Shape Regression for Scene Text Detection; Chuhui Xue et al.; arXiv:1901.02596v1; 2019-01-09; 1-9 *
Machine-vision-based digital character recognition against complex backgrounds (基于机器视觉的复杂背景下的数字字符识别); Hao Xuezhi (郝学智); China Master's Theses Full-text Database, Information Science and Technology; 2018-01-31; vol. 2019, no. 01; I138-2538 *
Research and implementation of a mobile-terminal-based image text recognition system (基于移动终端的图像文字识别系统的研究及实现); Li Fei (李飞); China Master's Theses Full-text Database, Information Science and Technology; 2015-12-15; vol. 2015, no. 12; I138-797 *

Also Published As

Publication number Publication date
CN111242120A (en) 2020-06-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant