CN111695377A - Text detection method and device and computer equipment

Info

Publication number
CN111695377A
Authority
CN
China
Prior art keywords
image
neural network
text region
text
information
Prior art date
Legal status
Granted
Application number
CN201910188639.XA
Other languages
Chinese (zh)
Other versions
CN111695377B (en)
Inventor
王杰
李明键
钮毅
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910188639.XA
Publication of CN111695377A
Application granted
Publication of CN111695377B
Legal status: Active

Classifications

    • G06V 30/413 - Document analysis: classification of content, e.g. text, photographs or tables
    • G06F 18/2414 - Pattern recognition: smoothing the distance, e.g. radial basis function networks [RBFN]
    • G06F 18/25 - Pattern recognition: fusion techniques
    • G06N 3/045 - Neural networks: combinations of networks
    • G06N 3/08 - Neural networks: learning methods
    • G06V 10/464 - Image features: salient features, e.g. SIFT, using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • G06V 30/10 - Character recognition
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application provides a text detection method and device and computer equipment. The text detection method provided by the application comprises the following steps: acquiring specified information from an image to be detected containing a text, inputting the specified information into a pre-established target neural network for constructing a spatial relationship between the text in the image and an attention target, outputting spatial information by the target neural network, and correcting a candidate text region according to the spatial information to obtain a finally selected text region in the image to be detected. The specified information comprises a feature vector of the candidate text region located from the image to be detected, and the attention target comprises at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image. The text detection method and device and the computer equipment can accurately locate the text region in the image to be detected.

Description

Text detection method and device and computer equipment
Technical Field
The present application relates to the field of image detection, and in particular, to a text detection method and apparatus, and a computer device.
Background
With the widespread use of image acquisition devices, image detection techniques based on image content are receiving more and more attention. Among the contents contained in an image, text information is the easiest to understand, so text recognition technology has attracted extensive attention.
Text recognition mainly comprises text detection and character recognition. Text detection refers to locating a text region in an image to be detected; character recognition refers to recognizing the text region and outputting the text information. In a text detection method disclosed in the related art, a large number of anchor points are laid out, anchor points close to the text are screened out by a correlation algorithm, and the offsets between the anchor points and the text are regressed to obtain a text region. This method performs text detection with a fixed receptive field only, and its accuracy is low.
Disclosure of Invention
In view of this, the present application provides a text detection method, a text detection apparatus, and a computer device, so as to improve the accuracy of text detection.
A first aspect of the present application provides a text detection method, including:
acquiring specified information from an image to be detected containing a text; the specified information comprises a feature vector of a candidate text region located from the image to be detected;
inputting the specified information into a pre-established target neural network for constructing a spatial relationship between a text in the image and an attention target, and outputting spatial information by the target neural network; wherein the attention target includes at least one of a text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image;
and correcting the candidate text region according to the spatial information to obtain a finally selected text region in the image to be detected.
A second aspect of the present application provides a text detection apparatus comprising an element generation module, a spatial relationship modeling module, and a text detection module, wherein,
the element generation module is used for acquiring specified information from an image to be detected containing a text; the specified information comprises a feature vector of a candidate text region located from the image to be detected;
the spatial relationship modeling module is used for inputting the specified information into a pre-established target neural network for constructing a spatial relationship between a text in the image and an attention target, and the target neural network outputs spatial information; wherein the attention target includes at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image;
and the text detection module is used for correcting the candidate text region according to the spatial information to obtain a finally selected text region in the image to be detected.
A third aspect of the present application provides a computer storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods provided by the first aspect of the present application.
A fourth aspect of the present application provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods provided in the first aspect of the present application when executing the program.
The application provides a text detection method and device and computer equipment. Specified information is acquired from an image to be detected containing a text; the specified information is input into a pre-established target neural network for constructing a spatial relationship between the text in the image and an attention target; the target neural network outputs spatial information; and a candidate text region is corrected according to the spatial information to obtain a finally selected text region in the image to be detected. The specified information comprises a feature vector of the candidate text region located from the image to be detected, and the attention target comprises at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image. In this way, the spatial relationship between the text and the attention target is fully considered, and the spatial information is fully utilized to locate the finally selected text region, so that missed detection and false detection can be avoided and the accuracy of text detection can be improved.
Drawings
Fig. 1 is a flowchart of a first embodiment of a text detection method provided in the present application;
fig. 2 is a flowchart of a second embodiment of a text detection method provided in the present application;
fig. 3 is a flowchart of a third embodiment of a text detection method provided in the present application;
fig. 4 is a flowchart of a fourth embodiment of a text detection method provided in the present application;
fig. 5 is a hardware structure diagram of a computer device in which a text detection apparatus according to an exemplary embodiment of the present application is located;
fig. 6 is a schematic structural diagram of a first embodiment of a text detection apparatus provided in the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
The application provides a text detection method, a text detection device and computer equipment, and aims to provide a text detection method with high accuracy.
In the following, specific examples are given to describe the technical solutions of the present application in detail. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a flowchart of a first embodiment of a text detection method provided in the present application. Referring to fig. 1, the method provided in this embodiment may include:
s101, acquiring specified information from an image to be detected containing a text, wherein the specified information comprises a feature vector of a candidate text region positioned from the image to be detected.
Specifically, the specific implementation process of this step may include:
(1) extracting the features of the image to be detected to obtain a feature map;
(2) acquiring the specified information from the feature map.
Specifically, the feature extraction may be performed on the image to be detected by using a conventional method. For example, a Scale-invariant Feature Transform (SIFT) algorithm is used to extract features of an image to be detected. Of course, a neural network may also be used to perform feature extraction on the image to be detected, for example, in an embodiment, a specific implementation process of this step may include:
inputting the image to be detected into a neural network for feature extraction, and extracting the features of the image to be detected by a specified layer in the neural network; the designated layer comprises a convolutional layer, or the designated layer comprises a convolutional layer and at least one of a pooling layer and a fully-connected layer; and determining the output result of the specified layer as the feature map.
Specifically, the neural network for feature extraction may include a convolutional layer, and the convolutional layer is configured to perform filtering processing on the input image to be detected. In this case, the filtering result output by the convolutional layer is the extracted feature map. In addition, the neural network for feature extraction may further include a pooling layer and/or a fully connected layer. For example, in an embodiment, the neural network for feature extraction includes a convolutional layer, a pooling layer, and a fully connected layer, where the convolutional layer is configured to perform filtering processing on the input image to be detected, the pooling layer is configured to compress the filtering result, and the fully connected layer is configured to aggregate the compressed result. In this case, the aggregation result output by the fully connected layer is the extracted feature map.
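As a rough illustration of the above, the following is a minimal PyTorch sketch of such a feature-extraction network. The layer sizes, the use of max pooling for compression, and a 1 × 1 convolution standing in for the aggregation step are assumptions for illustration only; the patent does not fix a concrete architecture.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the feature-extraction network: a convolutional layer filters
    the input image, a pooling layer compresses the filtering result, and an
    aggregation step produces the final feature map."""

    def __init__(self, in_channels=3, feat_channels=256):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, feat_channels, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # A 1x1 convolution stands in for the aggregation step so that the
        # output remains a spatial feature map (an assumption of this sketch).
        self.aggregate = nn.Conv2d(feat_channels, feat_channels, kernel_size=1)

    def forward(self, image):                 # image: (N, 3, H, W)
        x = torch.relu(self.conv(image))      # filtering
        x = self.pool(x)                      # compression
        return self.aggregate(x)              # aggregated feature map
```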
Further, in an embodiment, the specified information includes a feature vector of a candidate text region located from the image to be detected; in another embodiment, the specified information additionally includes at least one of a feature vector of a specified target that is located from the image to be detected and has a spatial relationship with the candidate text region, and attribute information of the image to be detected. The following description takes as an example the case where the specified information includes all three: the feature vector of the candidate text region, the feature vector of the specified target, and the attribute information of the image to be detected.
The specified target may be a hidden variable or a user-defined target related to the task. For example, if the vehicle carrying a license plate is not annotated, it can be treated as a hidden variable; if the vehicle is annotated, it can be treated as a user-defined specified target and used to assist license plate detection.
In addition, the attribute information of the image to be detected may include a deformation attribute, a color attribute, a font attribute, a texture attribute, a perspective attribute, and the like of the image to be detected. In the present embodiment, this is not limited. The following description will take the example that the attribute information of the image to be detected includes the deformation attribute of the image to be detected. It should be noted that, when the attribute information of the image to be detected is the deformation attribute of the image to be detected, the deformation attribute may be represented by the rotation angle θ of each pixel point in the image to be detected.
Specifically, in an embodiment, the process of obtaining the feature vector of the candidate text region from the feature map may include:
(1) inputting the feature map into a neural network for information extraction, carrying out convolution processing on the feature map by a first convolution layer in the neural network to obtain a convolution processing result, processing the convolution processing result by a softmax layer of the neural network, and outputting the probability that each pixel point in the image to be detected belongs to a text; and carrying out convolution processing on the characteristic graph by a second convolution layer in the neural network, and outputting the deviation of each pixel point distance text in the image to be detected.
For example, in one embodiment, the size of the image to be detected is 9 × 9, the dimension of the obtained feature map is 9 × 9 × 256, and the first convolution layer of the neural network used for information extraction has a 1 × 1 kernel with 2 output channels, so the dimension of the convolution processing result output by the first convolution layer is 9 × 9 × 2. Further, after the convolution processing result is processed by the softmax layer, the dimension of the output processing result is 9 × 9 × 1, representing the probability that each pixel point in the image to be detected belongs to the text. For another example, in this embodiment, the second convolution layer in the neural network for information extraction has a 1 × 1 kernel with 8 output channels, so after the second convolution layer performs convolution processing on the feature map, the dimension of the output convolution processing result is 9 × 9 × 8, representing the deviation of each pixel point in the image to be detected from the text. In this example, the deviation of a pixel from the text is characterized by the deviations of the pixel from the four corner points of the text (corresponding to the 8 channels of the convolution processing result).
(2) locating the candidate text region in the image to be detected according to the probability that each pixel point in the image to be detected belongs to the text and the deviation of each pixel point in the image to be detected from the text, to obtain the feature vector of the candidate text region.
Specifically, based on the probability that each pixel point in the image to be detected belongs to the text and the deviation of each pixel point in the image to be detected from the text, the candidate text region in the image to be detected can be positioned, and then the feature vector of the candidate text region is obtained. Further, the feature vector of the candidate text region includes a coordinate of a center point of the candidate text region, a width value, a height value, an angle, a feature vector corresponding to the candidate text region in the feature map, and a confidence of the candidate text region (where the confidence of the candidate text region may be equal to an average value of probabilities that target pixel points located in the candidate text region in the image to be detected belong to the text). It should be noted that, for the specific implementation principle and implementation process related to locating the candidate text region in the image to be detected according to the probability that each pixel point in the image to be detected belongs to the text and the deviation of each pixel point in the image to be detected from the text, to obtain the feature vector of the candidate text region, reference may be made to the description in the related art, and details are not repeated here.
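As a rough sketch of steps (1) and (2) above, the following module (continuing the imports of the previous sketch) implements the two convolutional branches, assuming 1 × 1 kernels and the 9 × 9 example dimensions; decoding candidate regions from these two outputs is sketched later in the first embodiment.

```python
class InformationExtractor(nn.Module):
    """Sketch of the information-extraction network: a first 1x1 convolution
    followed by softmax yields the per-pixel text probability; a second 1x1
    convolution yields the 8-channel deviations to the four text corners."""

    def __init__(self, feat_channels=256):
        super().__init__()
        self.cls_conv = nn.Conv2d(feat_channels, 2, kernel_size=1)  # first convolution layer
        self.reg_conv = nn.Conv2d(feat_channels, 8, kernel_size=1)  # second convolution layer

    def forward(self, feature_map):                     # (N, 256, 9, 9)
        logits = self.cls_conv(feature_map)             # (N, 2, 9, 9)
        text_prob = torch.softmax(logits, dim=1)[:, 1]  # (N, 9, 9): softmax layer output
        deviations = self.reg_conv(feature_map)         # (N, 8, 9, 9): corner deviations
        return text_prob, deviations
```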
In addition, in another embodiment, the process of obtaining a feature vector of a specific target having a spatial relationship with the candidate text region from the feature map may include:
(1) inputting the feature map into a neural network for information extraction, performing convolution processing on the feature map by a first convolution layer in the neural network to obtain a convolution processing result, processing the convolution processing result by a softmax layer of the neural network, and outputting the probability that each pixel point in the image to be detected belongs to the specified target; and carrying out convolution processing on the characteristic graph by a second convolution layer in the neural network, and outputting the deviation of each pixel point in the image to be detected from the specified target.
(2) locating the specified target in the image to be detected according to the probability that each pixel point in the image to be detected belongs to the specified target and the deviation of each pixel point in the image to be detected from the specified target, to obtain the feature vector of the specified target.
Specifically, the specific implementation process and implementation principle of obtaining the feature vector of the designated target are similar to the specific implementation process and implementation principle of obtaining the feature vector of the candidate text region, and are not described herein again.
Further, in another embodiment, when the attribute information of the image to be detected is deformation information of the image to be detected, and the deformation information is represented by a rotation angle θ of each pixel point in the image to be detected, the process of extracting the attribute information of the image to be detected from the feature map may include:
(1) inputting the feature map into a neural network for information extraction, performing convolution processing on the feature map by a convolution layer in the neural network to obtain a convolution processing result, performing normalization processing on the convolution processing result by a softmax layer in the neural network to obtain a normalization result, and converting, by a bias layer of the neural network, the normalization result into the rotation angle θ of each pixel point in the image to be detected.
Specifically, suppose the size of the image to be detected is 9 × 9, the dimension of the obtained feature map is 9 × 9 × 256, and the convolution layer of the neural network used for information extraction has a 1 × 1 kernel with 2 output channels; then, after the convolution processing, the dimension of the obtained convolution processing result is 9 × 9 × 2. The softmax layer normalizes the convolution processing result to obtain a normalization result with a dimension of 9 × 9 × 1. Further, after the bias layer converts the normalization result, the dimension of the obtained conversion result is 9 × 9 × 1, representing the rotation angle θ of each pixel point in the image to be detected.
It should be noted that the process of acquiring other attribute information is similar to the above process, and is not described herein again.
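Continuing the sketch, an attribute branch along these lines could produce the per-pixel rotation angle θ. The affine mapping performed by the bias layer (here from [0, 1] to [-π/2, π/2]) is an assumption; the patent only states that the normalization result is converted into θ.

```python
import math

class AngleExtractor(nn.Module):
    """Sketch of the attribute branch: convolution -> softmax -> bias layer
    converting the normalization result into a per-pixel rotation angle."""

    def __init__(self, feat_channels=256):
        super().__init__()
        self.conv = nn.Conv2d(feat_channels, 2, kernel_size=1)

    def forward(self, feature_map):                                      # (N, 256, 9, 9)
        normalized = torch.softmax(self.conv(feature_map), dim=1)[:, 1]  # (N, 9, 9) in [0, 1]
        theta = (normalized - 0.5) * math.pi   # "bias" mapping to [-pi/2, pi/2] (assumed)
        return theta
```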
S102, inputting the specified information into a pre-established target neural network for constructing a spatial relationship between a text in an image and an attention target, and outputting spatial information by the target neural network; wherein the attention target includes at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image.
S103, correcting the candidate text region according to the spatial information to obtain a finally selected text region in the image to be detected.
Specifically, the spatial information can be obtained by inputting the above specified information into the target neural network.
For example, in an embodiment, when the specified information includes the feature vector of the candidate text region, the specific implementation of step S102 may include:
(1) inputting the specified information into a first neural network in the target neural network, and processing the specified information by the first neural network to obtain the confidence of the candidate text region and a position probability map of the suspected text region in the image to be detected.
Specifically, the first neural network is used for constructing the spatial relationship between texts. Its input is the feature vector of a candidate text region, and its output is the confidence of the candidate text region and the position probability map of the suspected text region in the image to be detected. For example, in one embodiment, the feature vector of the candidate text region 1 and the feature vector of the candidate text region 2 are input, and the position probability map of the suspected text region 3 is output. It should be noted that the position probability map of the suspected text region 3 refers to the position probability distribution of the next possible text given that all current candidate text regions exist, and includes the probability that each pixel in the image to be detected belongs to the suspected text region and the deviation of each pixel from the suspected text region.
Specifically, after step S101, the feature vector of at least one candidate text region may be extracted. For example, in one embodiment, the feature vector of the candidate text region 1 and the feature vector of the candidate text region 2 are extracted. In this step, the feature vector of the candidate text region 1 and the feature vector of the candidate text region 2 are input into the first neural network; the concat layer of the first neural network then fuses the feature vectors to obtain fused specified information, and the fully connected layer of the first neural network performs weighting processing on the fused specified information to obtain the confidences of the candidate text regions and the position probability map of the suspected text region in the image to be detected.
It should be noted that, for the specific implementation principle and implementation procedure of the fusion process and the weighting process, reference may be made to the description in the related art, and details are not described here.
With reference to the foregoing example, in an embodiment, the size of the image to be detected is 9 × 9, the dimension of the feature vector of the candidate text region 1 is n, and the dimension of the feature vector of the candidate text region 2 is n. After the fusion processing, the dimension of the fused specified information is 2n, and the dimension of the full-connection coefficients (network parameters learned in advance) of the fully connected layer of the first neural network is 2n × (1 + 1 + 9 × 9 + 8 × 9 × 9). After the weighting processing, the dimension of the weighting processing result is 1 + 1 + 9 × 9 + 8 × 9 × 9, where the first two dimensions represent the confidences of the candidate text region 1 and the candidate text region 2, the next 9 × 9 dimensions represent the probability that each pixel point in the image to be detected belongs to the suspected text region, and the last 8 × 9 × 9 dimensions represent the deviation of each pixel point in the image to be detected from the suspected text region (the deviation of a pixel from the suspected text region is characterized by its deviations from the four corner points of the region, that is, 8 dimensions per pixel). It should be noted that the combination of the probability that each pixel in the image to be detected belongs to the suspected text region and the deviation of each pixel from the suspected text region is the position probability map of the suspected text region in the image to be detected.
(2) Determining the confidence of the candidate text region and the position probability map as the spatial information.
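A minimal sketch of such a first neural network is given below, following the concat + fully connected structure of the worked example (n-dimensional region vectors, a 9 × 9 image); the sigmoid activations on the confidence and probability outputs are assumptions.

```python
class TextTextRelationNet(nn.Module):
    """Sketch of the first neural network: concatenate the feature vectors of
    two candidate text regions and apply one fully connected layer, producing
    the two confidences plus the position probability map of the suspected
    text region (a per-pixel probability and 8 per-pixel corner deviations)."""

    def __init__(self, n=128, h=9, w=9):
        super().__init__()
        self.h, self.w = h, w
        out_dim = 1 + 1 + h * w + 8 * h * w     # confidences + prob map + deviations
        self.fc = nn.Linear(2 * n, out_dim)     # full-connection coefficients: 2n x out_dim

    def forward(self, region1, region2):        # each: (N, n)
        fused = torch.cat([region1, region2], dim=1)         # concat layer
        out = self.fc(fused)                                 # weighting processing
        conf = torch.sigmoid(out[:, :2])                     # confidences of regions 1, 2
        prob = torch.sigmoid(out[:, 2:2 + self.h * self.w])  # per-pixel probability
        dev = out[:, 2 + self.h * self.w:]                   # per-pixel corner deviations
        return conf, prob.view(-1, self.h, self.w), dev.view(-1, 8, self.h, self.w)
```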
Further, in this embodiment, after obtaining the spatial information in step S102, in step S103, the final text region may be determined according to the following method, where the method includes:
(1) determining a first candidate text region and the confidence of the first candidate text region according to the position probability map.
Specifically, the specific implementation of this step may include: in the position probability map, searching for first target pixel points whose probability (as introduced above, the probability that a pixel point in the image to be detected belongs to the text) is greater than a first preset threshold; searching, in a specified neighborhood of each first target pixel point, for second target pixel points whose probability is greater than a second preset threshold; determining the second target pixel points as the pixel points for constructing a first candidate text region; and constructing the first candidate text region from the second target pixel points according to their deviations from the text in the position probability map (for the specific implementation of this step, reference may be made to the related art; a rough sketch is also given after this list).
Further, in an embodiment, an average value of probabilities that each pixel point in the first candidate text region belongs to the text may be determined as the confidence of the first candidate text region.
It should be noted that the first preset threshold is greater than the second preset threshold, and specific values of the first preset threshold and the second preset threshold are set according to actual needs, and in this embodiment, the specific values are not limited. For example, in one embodiment, the first predetermined threshold is 0.7, and the second predetermined threshold is 0.5.
(2) judging whether the probability corresponding to the candidate text region in the position probability map is smaller than a preset threshold.
Specifically, according to the feature vector of the candidate text region, the position coordinates of the candidate text region can be known, and then the position of the candidate text region in the position probability map is known, so that the probability corresponding to the candidate text region in the position probability map is obtained. For example, in an embodiment, an average value of probabilities that all pixel points in the candidate text region in the position probability map belong to the text is determined as a probability corresponding to the candidate text region.
The preset threshold is set according to actual needs, and in the present embodiment, the preset threshold is not limited thereto. For example, in one embodiment, the predetermined threshold may be 0.3.
(3) if so, deleting the candidate text region, and performing non-maximum suppression processing on the first candidate text region according to the confidence of the first candidate text region to obtain the finally selected text region;
(4) if not, determining the candidate text region as a second candidate text region, and performing non-maximum suppression processing on the first candidate text region and the second candidate text region according to the confidence of the first candidate text region and the confidence of the second candidate text region to obtain the finally selected text region.
Specifically, the specific implementation principle and implementation step of the non-maximum suppression processing may be referred to in the description of the related art, and are not described herein again.
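A rough NumPy sketch of steps (1) through (4) follows. The neighborhood radius, the use of axis-aligned boxes for the overlap test, the corner sampling in the probability check, and the greedy NMS are all simplifying assumptions; the threshold values follow the examples above (0.7, 0.5, 0.3).

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def build_first_candidates(prob_map, deviations, t1=0.7, t2=0.5, k=1):
    """Step (1): seed at pixels with probability > t1, gather neighbours with
    probability > t2, and build a region from their corner deviations."""
    regions, confidences = [], []
    for y, x in np.argwhere(prob_map > t1):               # first target pixels
        y0, x0 = max(0, y - k), max(0, x - k)
        members = np.argwhere(prob_map[y0:y + k + 1, x0:x + k + 1] > t2)
        if members.size == 0:
            continue
        corners, probs = [], []
        for dy, dx in members:                            # second target pixels
            yy, xx = y0 + dy, x0 + dx
            # 8 deviations -> 4 corner points, relative to the pixel location
            corners.append(deviations[:, yy, xx].reshape(4, 2) + np.array([xx, yy]))
            probs.append(prob_map[yy, xx])
        regions.append(np.mean(corners, axis=0))          # averaged corner estimate
        confidences.append(float(np.mean(probs)))         # mean pixel probability
    return regions, confidences

def nms(regions, confidences, iou_thresh=0.5):
    """Greedy non-maximum suppression over the regions' bounding boxes."""
    boxes = [(r[:, 0].min(), r[:, 1].min(), r[:, 0].max(), r[:, 1].max()) for r in regions]
    keep = []
    for i in np.argsort(confidences)[::-1]:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return [regions[i] for i in keep]

def select_final_regions(cand_regions, cand_confs, first_regions, first_confs,
                         prob_map, t=0.3):
    """Steps (2)-(4): drop original candidates whose probability in the map is
    below t (sampled at their corner pixels, a simplification), then run NMS."""
    kept_r, kept_c = list(first_regions), list(first_confs)
    h, w = prob_map.shape
    for region, conf in zip(cand_regions, cand_confs):
        xs = region[:, 0].astype(int).clip(0, w - 1)
        ys = region[:, 1].astype(int).clip(0, h - 1)
        if prob_map[ys, xs].mean() >= t:                  # kept as second candidate region
            kept_r.append(region)
            kept_c.append(conf)
    return nms(kept_r, kept_c)
```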
In the method provided by this embodiment, specified information is acquired from an image to be detected containing a text, the specified information is input into a pre-established target neural network for constructing a spatial relationship between the text in the image and an attention target, the target neural network outputs spatial information, and a candidate text region is corrected according to the spatial information to obtain a finally selected text region in the image to be detected. The specified information comprises a feature vector of the candidate text region located from the image to be detected, and the attention target comprises at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image. Therefore, the spatial relationship between the text and the attention target is fully considered, and the spatial information is fully utilized to locate the finally selected text region, so that the accuracy can be improved.
In the following, some more specific examples are given to describe the technical solutions provided in the present application in detail.
Fig. 2 is a flowchart of a second embodiment of a text detection method provided in the present application. In the method provided by this embodiment, the specified information further includes a feature vector of a specified target that is located from the image to be detected and has a spatial relationship with the candidate text region, and step S102 may include:
s201, inputting the specified information into a second neural network in the target neural network, processing the specified information by the second neural network, and outputting the confidence of the candidate text region and the position probability map of the text suspected region in the image to be detected.
Specifically, for a specific implementation method and an implementation principle related to extracting a feature vector of a specified target, reference may be made to the description in the foregoing embodiments, and details are not described here.
In addition, the second neural network is used for constructing the spatial relationship between the text and the specified target. Its input may be the feature vector of the candidate text region and the feature vector of the specified target, and its output may be the confidence of the candidate text region and the position probability map of the suspected text region in the image to be detected. For example, in one embodiment, the feature vector of the candidate text region 1 and the feature vector of the specified target are input, and the confidence of the candidate text region 1 and the position probability map of the suspected text region 2 are output.
Further, the processing of the specified information by the second neural network may include: performing fusion processing on the specified information to obtain fused specified information, and performing weighting processing on the fused specified information. For example, in an embodiment, the second neural network may include a concat layer and a fully connected layer, where the concat layer is configured to fuse the specified information to obtain the fused specified information, and the fully connected layer is configured to perform the weighting processing on the fused specified information.
It should be noted that, through the second neural network, missed text regions can be recovered, text targets are prevented from being lost, and falsely detected texts are screened out, so that the accuracy of text detection can be improved. For example, a license plate is generally located on a vehicle body, and a shop name is generally within the extent of a shop front; therefore, by training and learning the positional relationship between the specified target and the text, falsely detected texts that do not satisfy a certain positional relationship with the specified target can be screened out.
S202, determining the confidence of the candidate text region and the position probability map as the spatial information.
Specifically, in this embodiment, after obtaining the spatial information, in step S103, the final text region may be determined according to the method described in the above embodiment, and details are not repeated here.
According to the method provided by this embodiment, the spatial relationship between the text and the specified target can be constructed through the second neural network to obtain the spatial information. Therefore, when the finally selected text region is determined based on the obtained spatial information, missed text regions can be recovered, text targets are prevented from being lost, and falsely detected regions are screened out, so that the accuracy of text detection can be improved.
Fig. 3 is a flowchart of a third embodiment of a text detection method provided in the present application. Referring to fig. 3, in the method provided in this embodiment, the specifying information further includes attribute information of the image to be detected. Step S102 may include:
s301, inputting the specification information into a third neural network in the target neural network, processing the specification information by the third neural network, and outputting the corrected position coordinates of the candidate text regions.
Specifically, the third neural network is used for constructing a spatial relationship between the text and the attribute information of the image to be detected, and the input of the third neural network may be a feature vector of the candidate text region and the attribute information of the image to be detected, and the output may be a corrected position coordinate of the candidate text region. For example, in an embodiment, the feature vector of the candidate text region 1 and the rotation angle θ of each pixel point in the image to be detected are input, and the corrected position coordinates of the candidate text region 1 are output.
It should be noted that the processing of the specified information by the third neural network may include: fusing the specified information to obtain fused specified information, and performing weighting processing on the fused specified information. For example, in an embodiment, the third neural network may include a concat layer and a fully connected layer, where the concat layer is configured to fuse the specified information to obtain the fused specified information, and the fully connected layer is configured to perform the weighting processing on the fused specified information.
For example, in an embodiment, the dimension of the feature vector of the candidate text region 1 is n and the dimension of the attribute information of the image to be detected is 1, so the dimension of the fused specified information is n + 1. Further, the dimension of the full-connection coefficients of the fully connected layer of the third neural network is (n + 1) × 8, so the dimension of the weighting processing result obtained after the weighting processing is 8, representing the corrected position coordinates of the candidate text region 1 (the position coordinates are represented by the coordinates of the four corner points of the candidate text region, hence 8 dimensions). It should be noted that, for the specific implementation procedures and principles of the fusion processing and the weighting processing, reference may be made to the related art, and details are not described here.
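A corresponding sketch of this third neural network, under the same assumptions as the earlier sketches:

```python
class TextAttributeRelationNet(nn.Module):
    """Sketch of the third neural network: concatenate the n-dimensional
    candidate-region vector with the 1-dimensional image attribute and apply
    a fully connected layer with (n + 1) x 8 coefficients, yielding the
    corrected coordinates of the region's four corner points."""

    def __init__(self, n=128):
        super().__init__()
        self.fc = nn.Linear(n + 1, 8)

    def forward(self, region_vec, attribute):               # (N, n), (N, 1)
        fused = torch.cat([region_vec, attribute], dim=1)   # concat layer
        return self.fc(fused)                               # corrected corner coordinates
```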
S302, determining the corrected position coordinates of the candidate text region as the spatial information.
Accordingly, in this embodiment, when the spatial information is the corrected position coordinates of the candidate text region, in step S103 the position of the candidate text region may be finely adjusted based on the corrected position coordinates to obtain the finally selected text region. For example, in one embodiment, the finally selected text region may be determined directly from the corrected position coordinates of the candidate text region. For another example, in another embodiment, the finally selected text region may be determined based on both the corrected position coordinates of the candidate text region and the initial position coordinates of the candidate text region determined in step S101 (the initial position coordinates may be determined from the probability that each pixel point in the image to be detected belongs to the text and the deviation of each pixel point from the text); for example, the finally selected text region is determined from the average of the corrected position coordinates and the initial position coordinates.
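For instance, the averaging variant mentioned above amounts to nothing more than the following (names are illustrative):

```python
# Average the corrected and the initial corner coordinates of the
# candidate text region to obtain the finally selected text region.
final_coords = (corrected_coords + initial_coords) / 2.0
```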
According to the method provided by this embodiment, the spatial information is obtained by constructing the spatial relationship between the text and the attribute information of the image to be detected, and the finally selected text region is determined according to the spatial information, so that fine adjustment of the position of the text region can be realized and the accuracy of text detection improved.
Fig. 4 is a flowchart of a fourth embodiment of a text detection method provided in the present application. Referring to fig. 4, in the method provided by this embodiment, the specified information further includes a feature vector of a specified target that is located from the image to be detected and has a spatial relationship with the candidate text region, and attribute information of the image to be detected. Step S102 may include:
and S401, inputting the specified information into a fourth neural network in the target neural network, processing the specified information by the fourth neural network, and outputting the confidence of the candidate text region, the corrected position coordinates of the candidate text region and the position probability map of the text suspected region in the image to be detected.
For the method for acquiring the feature vector of the specified target and the attribute information of the image to be detected, reference may be made to the description in the previous embodiments, and details are not repeated here.
Specifically, the fourth neural network is used for constructing the spatial relationship among the text, the specified target, and the attribute information of the image to be detected. Its input may be the feature vector of the candidate text region, the feature vector of the specified target, and the attribute information of the image to be detected, and its output is the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map of the suspected text region in the image to be detected. For example, in one embodiment, the feature vector of the candidate text region 1, the feature vector of the candidate text region 2, the feature vector of the specified target, and the rotation angle θ of each pixel point in the image to be detected are input, and the confidence and corrected position coordinates of the candidate text region 1, the confidence and corrected position coordinates of the candidate text region 2, and the position probability map of the suspected text region 3 are output.
It should be noted that, in an embodiment, the fourth neural network may include a concat layer and a fully connected layer, where the concat layer is configured to fuse the specified information to obtain fused specified information, and the fully connected layer is configured to perform weighting processing on the fused specified information and output the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map of the suspected text region in the image to be detected.
With reference to the above example, in an embodiment, the size of the image to be detected is 9 × 9, the dimension of the feature vector of the candidate text region 1 is n, the dimension of the feature vector of the candidate text region 2 is n, the dimension of the feature vector of the specified target is n, and the dimension of the attribute information of the image to be detected is 1, so the dimension of the fused specified information is 3n + 1. In this example, the dimension of the full-connection coefficients of the fully connected layer is (3n + 1) × (9 + 9 + 9 × 9 + 8 × 9 × 9); after the weighting processing, the dimension of the weighting processing result is 9 + 9 + 9 × 9 + 8 × 9 × 9, where the first 9 dimensions represent the confidence (1 dimension) and the corrected position coordinates (8 dimensions) of the candidate text region 1, the next 9 dimensions represent the confidence and the corrected position coordinates of the candidate text region 2, and the last 9 × 9 + 8 × 9 × 9 dimensions represent the position probability map of the suspected text region in the image to be detected, that is, the probability that each pixel in the image to be detected belongs to the text and the deviation of each pixel from the text.
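A sketch of the fourth neural network under the same assumptions (two candidate regions, one specified target, one scalar attribute, a 9 × 9 image):

```python
class FullRelationNet(nn.Module):
    """Sketch of the fourth neural network: fuse two candidate-region vectors,
    the specified-target vector, and the image attribute, then apply one fully
    connected layer producing, for each candidate region, a confidence plus 8
    corrected coordinates, followed by the position probability map."""

    def __init__(self, n=128, h=9, w=9):
        super().__init__()
        self.h, self.w = h, w
        out_dim = 9 + 9 + h * w + 8 * h * w
        self.fc = nn.Linear(3 * n + 1, out_dim)

    def forward(self, region1, region2, target_vec, attribute):
        fused = torch.cat([region1, region2, target_vec, attribute], dim=1)  # (N, 3n+1)
        out = self.fc(fused)
        head1 = out[:, :9]                    # conf (1) + corrected coords (8), region 1
        head2 = out[:, 9:18]                  # same for candidate region 2
        prob = out[:, 18:18 + self.h * self.w].view(-1, self.h, self.w)
        dev = out[:, 18 + self.h * self.w:].view(-1, 8, self.h, self.w)
        return head1, head2, prob, dev
```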
S402, determining the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map as the spatial information.
Specifically, in this implementation, the spatial information includes the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map of the suspected text region in the image to be detected. In step S103, the finally selected text region may be determined by the same method as in the first embodiment, the only difference being the following: when judging whether the probability corresponding to the candidate text region in the position probability map is smaller than the preset threshold, the position of the candidate text region in the position probability map may be obtained according to the corrected position coordinates of the candidate text region, and the corresponding probability is then read from the position probability map. Alternatively, the candidate text region may first be refined based on its corrected position coordinates and its initial position coordinates (see the description above) to obtain adjusted position coordinates (for example, in an embodiment, the adjusted position coordinates are equal to the average of the corrected position coordinates and the initial position coordinates), and the position of the candidate text region in the position probability map is then determined based on the adjusted position coordinates. In addition, the non-maximum suppression processing is performed according to the adjusted position coordinates of the candidate text region.
According to the method provided by this embodiment, the fourth neural network constructs the spatial relationship among the text, the specified target, and the attribute information of the image to be detected, and thereby yields the spatial information. Therefore, when the finally selected text region is determined based on the obtained spatial information, missed text regions can be recovered, text targets are prevented from being lost, and the position of the text region can be finely adjusted, so that the accuracy of text detection can be improved.
Specifically, the target neural network is pre-established by the following method:
acquiring a training sample set; the training sample set comprises a plurality of pictures;
establishing a standby neural network for constructing the spatial relationship between the text in the image and the attention target; the input of the standby neural network is the specified information, and the output is the spatial information;
and training the standby neural network by adopting the training sample set to obtain the target neural network.
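A minimal training loop along these lines might look as follows. The loss function, optimizer, and sample format are assumptions; the patent only specifies that the standby network is trained on the sample set to obtain the target network.

```python
def train_target_network(standby_net, sample_loader, epochs=10, lr=1e-3):
    """Train the standby neural network on the training sample set; the
    trained network is then used as the target neural network."""
    optimizer = torch.optim.Adam(standby_net.parameters(), lr=lr)
    for _ in range(epochs):
        for specified_info, target_spatial_info in sample_loader:
            predicted = standby_net(specified_info)
            loss = nn.functional.mse_loss(predicted, target_spatial_info)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return standby_net
```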
Corresponding to the embodiment of the text detection method, the application also provides an embodiment of a text detection device.
The embodiments of the text detection apparatus can be applied to computer equipment. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus in a logical sense is formed by the processor of the computer device where it is located reading corresponding computer program instructions from storage into memory and running them. From a hardware aspect, fig. 5 shows a hardware structure diagram of the computer device where the text detection apparatus according to an exemplary embodiment of the present application is located. In addition to the memory 510, the processor 520, the storage 530, and the network interface 540 shown in fig. 5, the computer device where the apparatus is located may further include other hardware according to the actual function of the text detection apparatus, which is not described again here.
Fig. 6 is a schematic structural diagram of a first embodiment of a text detection apparatus provided in the present application. Referring to fig. 6, the text detection apparatus provided in the present application may include an element generation module 610, a spatial relationship modeling module 620, and a text detection module 630, wherein,
the element generation module 610 is configured to acquire specified information from an image to be detected containing a text; the specified information comprises a feature vector of a candidate text region located from the image to be detected;
the spatial relationship modeling module 620 is configured to input the specified information into a pre-established target neural network for constructing a spatial relationship between a text in an image and an attention target, and output spatial information by the target neural network; wherein the attention target includes at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image;
the text detection module 630 is configured to correct the candidate text region according to the spatial information, so as to obtain a finally selected text region in the image to be detected.
The apparatus of this embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1, and the implementation principle and the technical effect are similar, which are not described herein again.
Further, the spatial relationship modeling module 620 is specifically configured to input the specified information into a first neural network in the target neural network, process the specified information by the first neural network, output a confidence of the candidate text region and a position probability map of a suspected text region in the image to be detected, and determine the confidence of the candidate text region and the position probability map as the spatial information.
Further, the specified information also comprises a feature vector of a specified target which is positioned from the image to be detected and has a spatial relationship with the candidate text region; the spatial relationship modeling module 620 is specifically configured to input the specified information into a second neural network in the target neural network, process the specified information by the second neural network, output the confidence of the candidate text region and the position probability map of the suspected text region in the image to be detected, and determine the confidence of the candidate text region and the position probability map as the spatial information.
Further, the specified information also comprises attribute information of the image to be detected; the spatial relationship modeling module 620 is specifically configured to input the specified information into a third neural network in the target neural network, process the specified information by the third neural network, output the corrected position coordinates of the candidate text region, and determine the corrected position coordinates of the candidate text region as the spatial information.
Further, the specified information also comprises a feature vector of a specified target which is positioned from the image to be detected and has a spatial relationship with the candidate text region and attribute information of the image to be detected; the spatial relationship modeling module 620 is specifically configured to input the specified information into a fourth neural network in the target neural network, process the specified information by the fourth neural network, output the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map of the suspected text region in the image to be detected, and determine the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map as the spatial information.
Further, the processing of the specified information includes:
performing fusion processing on the specified information to obtain fused specified information, and performing weighting processing on the fused specified information, as illustrated in the sketch below.
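As one possible reading of this fusion-then-weighting step, and of the fourth neural network described above, the sketch below concatenates the three kinds of specified information, fuses them, applies a learned weighting, and emits the three kinds of spatial information. The layer choices and all names are assumptions made for illustration, not the patent's prescribed design.

    import torch
    import torch.nn as nn

    class FourthNet(nn.Module):
        """Sketch of the fourth neural network: consumes text-region features,
        related-target features, and image attribute features."""

        def __init__(self, dim: int):
            super().__init__()
            self.fuse = nn.Linear(3 * dim, dim)    # fusion of the specified information
            self.attn = nn.Linear(dim, dim)        # weighting of the fused information
            self.conf_head = nn.Linear(dim, 1)     # confidence of the candidate region
            self.coord_head = nn.Linear(dim, 4)    # corrected position coordinates
            self.prob_head = nn.Conv2d(dim, 1, 1)  # position probability map

        def forward(self, text_feat, target_feat, attr_feat, feat_map):
            # Fusion processing: concatenate and project the specified information.
            fused = torch.relu(self.fuse(torch.cat([text_feat, target_feat, attr_feat], dim=-1)))
            # Weighting processing: gate the fused information with learned weights.
            weighted = fused * torch.sigmoid(self.attn(fused))
            confidence = torch.sigmoid(self.conf_head(weighted))
            coords = self.coord_head(weighted)
            prob_map = torch.sigmoid(self.prob_head(feat_map))
            return confidence, coords, prob_map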
Further, the text detection module 630 is specifically configured to:
determining a first candidate text region and the confidence of the first candidate text region according to the position probability map;
judging whether the probability corresponding to the candidate text region in the position probability map is smaller than a preset threshold;
if so, deleting the candidate text region, and performing non-maximum suppression on the first candidate text region according to the confidence of the first candidate text region to obtain the final selected text region;
if not, determining the candidate text region as a second candidate text region, and performing non-maximum suppression on the first candidate text region and the second candidate text region according to the confidence of the first candidate text region and the confidence of the second candidate text region to obtain the final selected text region; a code sketch of this correction procedure follows.
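A compact sketch of this correction logic is given below. Thresholding against the position probability map and IoU-based non-maximum suppression are standard techniques; how the first candidate text regions are derived from the probability map is not fixed by this embodiment, so they are simply passed in as inputs. All names are hypothetical, and boxes are assumed to use integer pixel coordinates (x1, y1, x2, y2).

    import numpy as np

    def nms(boxes, scores, iou_thresh=0.5):
        """Standard IoU-based non-maximum suppression; returns kept indices."""
        order = scores.argsort()[::-1]
        keep = []
        while order.size > 0:
            i = order[0]
            keep.append(int(i))
            rest = order[1:]
            xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
            yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
            xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
            yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
            inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
            area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
            area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
            iou = inter / (area_i + area_r - inter + 1e-9)
            order = rest[iou <= iou_thresh]
        return keep

    def correct_regions(cand_boxes, cand_scores, first_boxes, first_scores,
                        prob_map, thresh=0.5):
        """Delete candidate regions the probability map does not support,
        keep the rest as second candidates, then suppress overlaps."""
        boxes, scores = list(first_boxes), list(first_scores)
        for box, score in zip(cand_boxes, cand_scores):
            x1, y1, x2, y2 = box
            if prob_map[y1:y2, x1:x2].mean() >= thresh:  # second candidate region
                boxes.append(box)
                scores.append(score)
            # below the threshold: the candidate region is deleted
        boxes, scores = np.asarray(boxes), np.asarray(scores)
        keep = nms(boxes, scores)
        return boxes[keep], scores[keep]  # final selected text regions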
Further, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the methods provided in the first aspect of the present application.
In particular, computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, by way of example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
With continued reference to fig. 5, the present application further provides a computer device comprising a memory 510, a processor 520, and a computer program stored on the memory 510 and executable on the processor 520, wherein the processor 520, when executing the program, performs the steps of any of the methods provided in the first aspect of the present application.
The above description is only an exemplary embodiment of the present application and is not intended to limit the present application; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (10)

1. A text detection method, the method comprising:
acquiring specified information from an image to be detected containing text; the specified information comprises a feature vector of a candidate text region located in the image to be detected;
inputting the specified information into a pre-established target neural network for constructing a spatial relationship between a text in the image and an attention target, and outputting spatial information by the target neural network; wherein the attention target includes at least one of a text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image;
and correcting the candidate text region according to the spatial information to obtain a final selected text region in the image to be detected.
2. The method according to claim 1, wherein the inputting the specified information into a pre-established target neural network for constructing a spatial relationship between text in an image and an attention target, and outputting spatial information by the target neural network, comprises:
inputting the specified information into a first neural network in the target neural network, processing the specified information by the first neural network, and outputting the confidence of the candidate text region and a position probability map of the suspected text region in the image to be detected;
determining the confidence of the candidate text region and the position probability map as the spatial information.
3. The method according to claim 1, wherein the specified information further comprises a feature vector of a specified target that is located in the image to be detected and has a spatial relationship with the candidate text region; the inputting the specified information into a pre-established target neural network for constructing a spatial relationship between text in an image and an attention target, and outputting spatial information by the target neural network, comprises:
inputting the specified information into a second neural network in the target neural network, processing the specified information by the second neural network, and outputting the confidence of the candidate text region and a position probability map of the suspected text region in the image to be detected;
determining the confidence of the candidate text region and the position probability map as the spatial information.
4. The method according to claim 1, wherein the specified information further includes attribute information of the image to be detected; the inputting the specified information into a pre-established target neural network for constructing a spatial relationship between text in an image and an attention target, and outputting spatial information by the target neural network, comprises:
inputting the specified information into a third neural network in the target neural network, processing the specified information by the third neural network, and outputting the corrected position coordinates of the candidate text region;
and determining the corrected position coordinates of the candidate text region as the spatial information.
5. The method according to claim 1, wherein the specified information further comprises a feature vector of a specified target that is located in the image to be detected and has a spatial relationship with the candidate text region, and attribute information of the image to be detected; the inputting the specified information into a pre-established target neural network for constructing a spatial relationship between text in an image and an attention target, and outputting spatial information by the target neural network, comprises:
inputting the specified information into a fourth neural network in the target neural network, processing the specified information by the fourth neural network, and outputting the confidence of the candidate text region, the corrected position coordinates of the candidate text region and a position probability map of the suspected text region in the image to be detected;
and determining the confidence of the candidate text region, the corrected position coordinates of the candidate text region and the position probability map as the spatial information.
6. The method according to any one of claims 2, 3 and 5, wherein the correcting the candidate text region according to the spatial information to obtain a final selected text region in the image to be detected comprises:
determining a first candidate text region and the confidence of the first candidate text region according to the position probability map;
judging whether the probability corresponding to the candidate text region in the position probability map is smaller than a preset threshold;
if so, deleting the candidate text region, and performing non-maximum suppression on the first candidate text region according to the confidence of the first candidate text region to obtain the final selected text region;
if not, determining the candidate text region as a second candidate text region, and performing non-maximum suppression on the first candidate text region and the second candidate text region according to the confidence of the first candidate text region and the confidence of the second candidate text region to obtain the final selected text region.
7. The method according to claim 1, wherein the target neural network is pre-established by:
acquiring a training sample set; the training sample set comprises a plurality of pictures;
establishing a standby neural network for constructing a spatial relationship between the text in the image and the attention target; the input of the standby neural network is the specified information, and the output is the spatial information;
and training the standby neural network using the training sample set to obtain the target neural network.
8. A text detection apparatus, comprising an element generation module, a spatial relationship modeling module, and a text detection module, wherein,
the element generation module is configured to acquire specified information from an image to be detected containing text; the specified information comprises a feature vector of a candidate text region located in the image to be detected;
the spatial relationship modeling module is configured to input the specified information into a pre-established target neural network for constructing a spatial relationship between text in the image and an attention target, the target neural network outputting spatial information; the attention target includes at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image;
and the text detection module is configured to correct the candidate text region according to the spatial information to obtain a final selected text region in the image to be detected.
9. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
10. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.
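Claim 7 above pre-establishes the target neural network by training a standby neural network. Purely as an illustration of that procedure, the sketch below runs a generic training loop; the optimizer, the loss function, and all names are assumptions, since the claim specifies none of them.

    import torch

    def train_standby_network(standby_net, sample_loader, epochs=10, lr=1e-3):
        """Train the standby neural network (input: specified information,
        output: spatial information) to obtain the target neural network."""
        loss_fn = torch.nn.MSELoss()  # illustrative choice; the claim fixes no loss
        opt = torch.optim.Adam(standby_net.parameters(), lr=lr)
        for _ in range(epochs):
            for specified_info, target_spatial_info in sample_loader:
                predicted = standby_net(specified_info)
                loss = loss_fn(predicted, target_spatial_info)
                opt.zero_grad()
                loss.backward()
                opt.step()
        return standby_net  # the trained standby network serves as the target neural network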
CN201910188639.XA 2019-03-13 2019-03-13 Text detection method and device and computer equipment Active CN111695377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910188639.XA CN111695377B (en) 2019-03-13 2019-03-13 Text detection method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910188639.XA CN111695377B (en) 2019-03-13 2019-03-13 Text detection method and device and computer equipment

Publications (2)

Publication Number Publication Date
CN111695377A true CN111695377A (en) 2020-09-22
CN111695377B CN111695377B (en) 2023-09-29

Family

ID=72475629

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910188639.XA Active CN111695377B (en) 2019-03-13 2019-03-13 Text detection method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN111695377B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5884296A (en) * 1995-03-13 1999-03-16 Minolta Co., Ltd. Network and image area attribute discriminating device and method for use with said neural network
US20050281455A1 (en) * 2004-06-17 2005-12-22 Chun-Chia Huang System of using neural network to distinguish text and picture in images and method thereof
US20180314715A1 (en) * 2013-05-01 2018-11-01 Cloudsight, Inc. Content Based Image Management and Selection
KR101631694B1 (en) * 2015-08-24 2016-06-21 수원대학교산학협력단 Pedestrian detection method by using the feature of hog-pca and rbfnns pattern classifier
US20180342061A1 (en) * 2016-07-15 2018-11-29 Beijing Sensetime Technology Development Co., Ltd Methods and systems for structured text detection, and non-transitory computer-readable medium
CN106570497A (en) * 2016-10-08 2017-04-19 中国科学院深圳先进技术研究院 Text detection method and device for scene image
CN106980858A (en) * 2017-02-28 2017-07-25 中国科学院信息工程研究所 The language text detection of a kind of language text detection with alignment system and the application system and localization method
CN107886082A (en) * 2017-11-24 2018-04-06 腾讯科技(深圳)有限公司 Mathematical formulae detection method, device, computer equipment and storage medium in image
CN108805131A (en) * 2018-05-22 2018-11-13 北京旷视科技有限公司 Text line detection method, apparatus and system
CN108885699A (en) * 2018-07-11 2018-11-23 深圳前海达闼云端智能科技有限公司 Character identifying method, device, storage medium and electronic equipment
CN109389091A (en) * 2018-10-22 2019-02-26 重庆邮电大学 The character identification system and method combined based on neural network and attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄同城;丁友东;: "Research on image text information extraction technology based on wavelet neural networks" (基于小波神经网络的图像文本信息提取技术研究), no. 02 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330379A (en) * 2020-11-25 2021-02-05 税友软件集团股份有限公司 Invoice content generation method and system, electronic equipment and storage medium
CN112330379B (en) * 2020-11-25 2023-10-31 税友软件集团股份有限公司 Invoice content generation method, invoice content generation system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111695377B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
CN108229490B (en) Key point detection method, neural network training method, device and electronic equipment
CN108960211B (en) Multi-target human body posture detection method and system
JP5188334B2 (en) Image processing apparatus, image processing method, and program
JP5714599B2 (en) Fast subspace projection of descriptor patches for image recognition
KR101896357B1 (en) Method, device and program for detecting an object
US11017210B2 (en) Image processing apparatus and method
US20130089260A1 (en) Systems, Methods, and Software Implementing Affine-Invariant Feature Detection Implementing Iterative Searching of an Affine Space
CN110059728B (en) RGB-D image visual saliency detection method based on attention model
US20100290708A1 (en) Image retrieval apparatus, control method for the same, and storage medium
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN111612822B (en) Object tracking method, device, computer equipment and storage medium
CN113269257A (en) Image classification method and device, terminal equipment and storage medium
CN111898428A (en) Unmanned aerial vehicle feature point matching method based on ORB
EP3671635B1 (en) Curvilinear object segmentation with noise priors
CN110020593B (en) Information processing method and device, medium and computing equipment
CN114549861A (en) Target matching method based on feature point and convolution optimization calculation and storage medium
CN113159103B (en) Image matching method, device, electronic equipment and storage medium
CN111695377B (en) Text detection method and device and computer equipment
CN113269752A (en) Image detection method, device terminal equipment and storage medium
CN110969176A (en) License plate sample amplification method and device and computer equipment
CN116704206A (en) Image processing method, device, computer equipment and storage medium
CN116758419A (en) Multi-scale target detection method, device and equipment for remote sensing image
CN116246161A (en) Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge
CN114119970B (en) Target tracking method and device
Wu et al. An accurate feature point matching algorithm for automatic remote sensing image registration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant