CN111695377A - Text detection method and device and computer equipment - Google Patents
- Publication number
- CN111695377A (Application CN201910188639.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- neural network
- text region
- text
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The application provides a text detection method and apparatus, and a computer device. The text detection method provided by the application comprises the following steps: acquiring specified information from an image to be detected containing a text; inputting the specified information into a pre-established target neural network for constructing a spatial relationship between the text in the image and an attention target; outputting spatial information by the target neural network; and correcting a candidate text region according to the spatial information to obtain a final selected text region in the image to be detected. The specified information comprises a feature vector of the candidate text region located from the image to be detected, and the attention target comprises at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image. The text detection method, the text detection apparatus and the computer device can accurately locate the text region in the image to be detected.
Description
Technical Field
The present application relates to the field of image detection, and in particular, to a text detection method and apparatus, and a computer device.
Background
With the widespread use of image acquisition devices, image detection techniques based on image content have received increasing attention. Among the contents contained in an image, text information is comparatively easy to understand, and character recognition technology has therefore attracted extensive attention.
Text recognition technology mainly comprises text detection and character recognition. Text detection refers to locating a text region in the image to be detected; character recognition refers to recognizing the text region and outputting the text information. In the text detection method disclosed in the related art, a large number of anchor points are established, anchor points close to the text are screened out by a correlation algorithm, and the offsets between the anchor points and the text are regressed to obtain the text region. This method performs text detection only through a fixed receptive field and therefore has low accuracy.
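The related-art approach described above can be sketched as follows: a dense grid of fixed anchors is placed over the image, and a regressed offset is applied to each anchor to obtain a box. The anchor grid, offset format and scale factors below are illustrative assumptions, not the algorithm of any particular prior-art detector.

```python
import numpy as np

def make_anchors(h, w, size=4.0):
    """One square anchor (cx, cy, w, h) centered on every pixel."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.stack([xs.ravel() + 0.5, ys.ravel() + 0.5,
                     np.full(h * w, size), np.full(h * w, size)], axis=1)

def apply_offsets(anchors, offsets):
    """Common box-regression form: shift centers by (dx, dy) scaled by the
    anchor size, and rescale width/height by exp(dw), exp(dh)."""
    cx = anchors[:, 0] + offsets[:, 0] * anchors[:, 2]
    cy = anchors[:, 1] + offsets[:, 1] * anchors[:, 3]
    w = anchors[:, 2] * np.exp(offsets[:, 2])
    h = anchors[:, 3] * np.exp(offsets[:, 3])
    return np.stack([cx, cy, w, h], axis=1)

anchors = make_anchors(9, 9)      # 81 fixed anchors, one per pixel
offsets = np.zeros((81, 4))       # zero offsets leave anchors unchanged
boxes = apply_offsets(anchors, offsets)
```

Because every anchor has the same fixed shape, the effective receptive field is fixed as well, which is the limitation the present application addresses.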
Disclosure of Invention
In view of this, the present application provides a text detection method, a text detection apparatus, and a computer device, so as to achieve text detection with higher accuracy.
A first aspect of the present application provides a text detection method, including:
acquiring specified information from an image to be detected containing a text; the specified information comprises a feature vector of a candidate text region located from the image to be detected;
inputting the specified information into a pre-established target neural network for constructing a spatial relationship between a text in the image and an attention target, and outputting spatial information by the target neural network; wherein the attention target includes at least one of a text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image;
and correcting the candidate text region according to the spatial information to obtain a final selected text region in the image to be detected.
A second aspect of the present application provides a text detection apparatus comprising an element generation module, a spatial relationship modeling module, and a text detection module, wherein,
the element generation module is used for acquiring specified information from the image to be detected containing the text; the specified information comprises a feature vector of a candidate text region located from the image to be detected;
the spatial relationship modeling module is used for inputting the specified information into a pre-established target neural network for constructing a spatial relationship between a text in the image and an attention target, and the target neural network outputs spatial information; wherein the attention target includes at least one of a text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image;
and the text detection module is used for correcting the candidate text region according to the spatial information to obtain a final selected text region in the image to be detected.
A third aspect of the present application provides a computer storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of any of the methods provided by the first aspect of the present application.
A fourth aspect of the present application provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods provided in the first aspect of the present application when executing the program.
The application provides a text detection method and apparatus, and a computer device. Specified information is acquired from an image to be detected containing a text; the specified information is input into a pre-established target neural network for constructing a spatial relationship between the text in the image and an attention target; the target neural network outputs spatial information; and a candidate text region is corrected according to the spatial information to obtain a final selected text region in the image to be detected. The specified information includes a feature vector of the candidate text region located from the image to be detected, and the attention target includes at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image. In this way, the spatial relationship between the text and the target is fully considered, and the spatial information is fully utilized to locate the final selected text region, so that missed detections and false detections can be avoided and the accuracy of text detection can be improved.
Drawings
Fig. 1 is a flowchart of a first embodiment of a text detection method provided in the present application;
fig. 2 is a flowchart of a second embodiment of a text detection method provided in the present application;
fig. 3 is a flowchart of a third embodiment of a text detection method provided in the present application;
fig. 4 is a flowchart of a fourth embodiment of a text detection method provided in the present application;
fig. 5 is a hardware structure diagram of a computer device in which a text detection apparatus according to an exemplary embodiment of the present application is located;
fig. 6 is a schematic structural diagram of a first embodiment of a text detection apparatus provided in the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Depending on the context, the word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination".
The application provides a text detection method, a text detection apparatus and a computer device, and aims to provide text detection with high accuracy.
In the following, specific examples are given to describe the technical solutions of the present application in detail. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a flowchart of a first embodiment of a text detection method provided in the present application. Referring to fig. 1, the method provided in this embodiment may include:
s101, acquiring specified information from an image to be detected containing a text, wherein the specified information comprises a feature vector of a candidate text region positioned from the image to be detected.
Specifically, the specific implementation process of this step may include:
(1) extracting features of the image to be detected to obtain a feature map;
(2) acquiring the specified information from the feature map.
Specifically, the feature extraction may be performed on the image to be detected by using a conventional method. For example, a Scale-invariant Feature Transform (SIFT) algorithm is used to extract features of an image to be detected. Of course, a neural network may also be used to perform feature extraction on the image to be detected, for example, in an embodiment, a specific implementation process of this step may include:
inputting the image to be detected into a neural network for feature extraction, and extracting the features of the image to be detected by a specified layer in the neural network; the designated layer comprises a convolutional layer, or the designated layer comprises a convolutional layer and at least one of a pooling layer and a fully-connected layer; and determining the output result of the specified layer as the feature map.
Specifically, the neural network for feature extraction may include a convolutional layer, and the convolutional layer is configured to perform filtering processing on the input image to be detected. In this case, the filtering result output by the convolutional layer is the extracted feature map. In addition, the neural network for feature extraction may further include a pooling layer and/or a fully connected layer. For example, in an embodiment, the neural network for feature extraction includes a convolutional layer, a pooling layer and a fully connected layer, where the convolutional layer is configured to perform filtering processing on the input image to be detected, the pooling layer is configured to compress the filtering result, and the fully connected layer is configured to aggregate the compressed result. In this case, the aggregation result output by the fully connected layer is the extracted feature map.
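The filtering-then-compression pipeline described above can be sketched in a few lines of NumPy. The single hand-picked 3 × 3 kernel, the 9 × 9 input and the absence of learned weights are all illustrative assumptions; the patent does not disclose the actual network parameters.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution: slide the kernel over the image and take
    the sum of element-wise products at each position (the filtering step)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling: compress the filtering result by keeping
    the maximum in each size x size window (the pooling step)."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

# A 9x9 "image to be detected" and a Laplacian-like 3x3 filter (illustrative)
image = np.arange(81, dtype=float).reshape(9, 9)
kernel = np.array([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])
feature_map = max_pool2d(conv2d(image, kernel))   # 7x7 filtering result -> 3x3
```

A real backbone would stack many such learned filters and output a multi-channel feature map rather than a single channel.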
Further, in an embodiment, the specified information includes a feature vector of a candidate text region located from the image to be detected. In another embodiment, the specified information includes the feature vector of the candidate text region, together with at least one of a feature vector of a specified target that is located from the image to be detected and has a spatial relationship with the candidate text region, and attribute information of the image to be detected. The following description takes as an example the case where the specified information includes all three: the feature vector of the candidate text region, the feature vector of the specified target, and the attribute information of the image to be detected.
The specified target may be a hidden variable or a custom target related to the task. For example, if the vehicle carrying a license plate is not annotated, a hidden variable can be set; if the vehicle is annotated, it can be treated as a user-defined specified target and used to assist license plate detection.
In addition, the attribute information of the image to be detected may include a deformation attribute, a color attribute, a font attribute, a texture attribute, a perspective attribute, and the like of the image to be detected, which is not limited in this embodiment. The following description takes the case where the attribute information includes the deformation attribute as an example. It should be noted that when the attribute information is the deformation attribute of the image to be detected, the deformation attribute may be represented by the rotation angle θ of each pixel point in the image to be detected.
Specifically, in an embodiment, the process of obtaining the feature vector of the candidate text region from the feature map may include:
(1) inputting the feature map into a neural network for information extraction; a first convolutional layer in the neural network performs convolution processing on the feature map to obtain a convolution result, a softmax layer of the neural network processes the convolution result, and the probability that each pixel point in the image to be detected belongs to the text is output; a second convolutional layer in the neural network performs convolution processing on the feature map, and the deviation of each pixel point in the image to be detected from the text is output.
For example, in one embodiment, the size of the image to be detected is 9 × 9 and the dimension of the obtained feature map is 9 × 9 × 256. The first convolutional layer of the neural network for information extraction is a 1 × 1 convolution with 2 output channels, so the dimension of the convolution result output by the first convolutional layer is 9 × 9 × 2. After the convolution result is processed by the softmax layer, the dimension of the output is 9 × 9 × 1, representing the probability that each pixel point in the image to be detected belongs to the text. As another example, the second convolutional layer in the neural network for information extraction is a 1 × 1 convolution with 8 output channels, so after the second convolutional layer convolves the feature map, the dimension of the output is 9 × 9 × 8, representing the deviation of each pixel point in the image to be detected from the text. In this example, the deviation of a pixel point from the text is characterized by its deviations from the four corner points of the text (corresponding to the 8 channels of the convolution result).
(2) locating a candidate text region in the image to be detected according to the probability that each pixel point in the image to be detected belongs to the text and the deviation of each pixel point from the text, to obtain the feature vector of the candidate text region.
Specifically, based on the probability that each pixel point in the image to be detected belongs to the text and the deviation of each pixel point from the text, the candidate text region in the image to be detected can be located, and the feature vector of the candidate text region can then be obtained. The feature vector of the candidate text region includes the coordinates of the center point of the candidate text region, its width, height and angle, the feature vector corresponding to the candidate text region in the feature map, and the confidence of the candidate text region (the confidence may be equal to the average of the probabilities that the pixel points located in the candidate text region belong to the text). For the specific implementation principle and process of locating the candidate text region from these per-pixel probabilities and deviations, reference may be made to the description in the related art, and details are not repeated here.
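The per-pixel classification and the region confidence above can be sketched as a 1 × 1 convolution (a per-pixel linear map from the channel dimension to 2 classes) followed by a softmax, with the confidence of a candidate region computed as the mean text probability of its pixels. The random feature map, random weights and the rectangular region mask below are placeholders assuming the 9 × 9 × 256 dimensions of the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_probability_map(feature_map, w):
    """1x1 convolution (per-pixel matmul from C channels to 2 classes)
    followed by softmax; returns the per-pixel text probability (channel 1)."""
    logits = feature_map @ w              # (H, W, C) @ (C, 2) -> (H, W, 2)
    return softmax(logits, axis=-1)[..., 1]

def region_confidence(prob_map, region_mask):
    """Region confidence: average text probability of the pixels inside
    the candidate region, as described in the embodiment."""
    return prob_map[region_mask].mean()

rng = np.random.default_rng(0)
fmap = rng.standard_normal((9, 9, 256))   # stand-in 9x9x256 feature map
w = rng.standard_normal((256, 2)) * 0.05  # stand-in 1x1-conv weights
prob = text_probability_map(fmap, w)      # 9x9 probability map
mask = np.zeros((9, 9), dtype=bool)
mask[2:5, 3:7] = True                     # a hypothetical candidate region
conf = region_confidence(prob, mask)
```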
In addition, in another embodiment, the process of obtaining from the feature map a feature vector of a specified target having a spatial relationship with the candidate text region may include:
(1) inputting the feature map into a neural network for information extraction; a first convolutional layer in the neural network performs convolution processing on the feature map to obtain a convolution result, a softmax layer of the neural network processes the convolution result, and the probability that each pixel point in the image to be detected belongs to the specified target is output; a second convolutional layer in the neural network performs convolution processing on the feature map, and the deviation of each pixel point in the image to be detected from the specified target is output.
(2) locating the specified target in the image to be detected according to the probability that each pixel point belongs to the specified target and the deviation of each pixel point from the specified target, to obtain the feature vector of the specified target.
Specifically, the specific implementation process and implementation principle of obtaining the feature vector of the designated target are similar to the specific implementation process and implementation principle of obtaining the feature vector of the candidate text region, and are not described herein again.
Further, in another embodiment, when the attribute information of the image to be detected is deformation information of the image to be detected, and the deformation information is represented by a rotation angle θ of each pixel point in the image to be detected, the process of extracting the attribute information of the image to be detected from the feature map may include:
(1) inputting the feature map into a neural network for information extraction; a convolutional layer in the neural network performs convolution processing on the feature map to obtain a convolution result, a softmax layer in the neural network normalizes the convolution result to obtain a normalized result, and a bias layer of the neural network converts the normalized result into the rotation angle θ of each pixel point in the image to be detected.
Specifically, for example, the size of the image to be detected is 9 × 9 and the dimension of the obtained feature map is 9 × 9 × 256. The convolutional layer of the neural network for information extraction is a 1 × 1 convolution with 2 output channels, so after the convolution processing, the dimension of the convolution result is 9 × 9 × 2. The softmax layer normalizes the convolution result to obtain a normalized result with dimension 9 × 9 × 1. After the bias layer converts the normalized result, the dimension of the conversion result is 9 × 9 × 1, representing the rotation angle θ of each pixel point in the image to be detected.
It should be noted that the process of acquiring other attribute information is similar to the above process, and is not described herein again.
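The convolution-softmax-bias pipeline for the rotation angle can be sketched as follows. The patent does not disclose how the bias layer maps the normalized result in (0, 1) to an angle; the linear rescaling to [−π/2, π/2) used here is purely an assumption, as is the random stand-in for the convolution output.

```python
import numpy as np

def softmax2(a, b):
    """Per-pixel two-channel softmax; returns the normalized score of the
    first channel, a value strictly between 0 and 1."""
    m = np.maximum(a, b)
    ea, eb = np.exp(a - m), np.exp(b - m)
    return ea / (ea + eb)

def to_rotation_angle(norm, low=-np.pi / 2, high=np.pi / 2):
    """Bias-layer sketch: linearly map the normalized result in (0, 1) to a
    rotation angle theta in [low, high). The exact mapping is not disclosed
    in the patent; this linear rescaling is an assumption."""
    return low + norm * (high - low)

score = np.random.default_rng(1).standard_normal((9, 9, 2))  # 9x9x2 conv output
norm = softmax2(score[..., 0], score[..., 1])                # 9x9 normalized result
theta = to_rotation_angle(norm)                              # per-pixel angle map
```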
S102, inputting the specified information into a pre-established target neural network for constructing a spatial relationship between the text in the image and an attention target, and outputting spatial information by the target neural network; wherein the attention target includes at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image.
S103, correcting the candidate text region according to the spatial information to obtain a final selected text region in the image to be detected.
Specifically, the spatial information can be obtained by the target neural network from the above-mentioned specified information.
For example, in an embodiment, when the specified information includes the feature vector of the candidate text region, the specific implementation of step S102 may include:
(1) inputting the specified information into a first neural network in the target neural network; the first neural network processes the specified information to obtain the confidence of the candidate text region and a position probability map of suspected text regions in the image to be detected.
Specifically, the first neural network is used for constructing spatial relationships between texts. Its input is the feature vectors of the candidate text regions, and its output is the confidence of each candidate text region and a position probability map of suspected text regions in the image to be detected. For example, in one embodiment, the feature vector of candidate text region 1 and the feature vector of candidate text region 2 are input, and the position probability map of suspected text region 3 is output. The position probability map of suspected text region 3 refers to the position probability distribution of the next possible text given all current candidate text regions; it includes the probability that each pixel point in the image to be detected belongs to a suspected text region and the deviation of each pixel point from the suspected text region.
Specifically, after step S101, the feature vector of at least one candidate text region can be extracted. For example, in one embodiment, the feature vector of candidate text region 1 and the feature vector of candidate text region 2 are extracted. In this step, the two feature vectors are input into the first neural network; the concat layer of the first neural network fuses them to obtain the fused specified information, and the fully connected layer of the first neural network weights the fused specified information to obtain the confidence of each candidate text region and the position probability map of the suspected text region in the image to be detected.
It should be noted that, for the specific implementation principle and implementation procedure of the fusion process and the weighting process, reference may be made to the description in the related art, and details are not described here.
With reference to the foregoing example, in an embodiment the size of the image to be detected is 9 × 9, the dimension of the feature vector of candidate text region 1 is n, and the dimension of the feature vector of candidate text region 2 is n. After the fusion processing, the dimension of the fused specified information is 2n, and the dimension of the fully connected coefficients (network parameters learned in advance) of the fully connected layer of the first neural network is 2n × (1 + 1 + 9 × 9 + 8 × 9 × 9). After the weighting processing, the dimension of the weighting result is 1 + 1 + 9 × 9 + 8 × 9 × 9, where the first two dimensions represent the confidences of candidate text region 1 and candidate text region 2, the next 9 × 9 dimensions represent the probability that each pixel point in the image to be detected belongs to a suspected text region, and the last 8 × 9 × 9 dimensions represent the deviation of each pixel point in the image to be detected from the suspected text region (the deviation of each pixel point from the suspected text region is characterized by its deviations from the four corner points of the region, that is, the deviation of each point has 8 dimensions). It should be noted that the combination of the per-pixel probabilities and the per-pixel deviations constitutes the position probability map of the suspected text region in the image to be detected.
(2) Determining the confidence of the candidate text region and the position probability map as the spatial information.
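The fusion step described above can be sketched as concatenation of the two n-dimensional feature vectors followed by a single fully connected weighting whose output is split into the two confidences, a 9 × 9 probability map and 8-channel per-pixel corner deviations. The random weights, n = 32, and the sigmoid squashing of confidences and probabilities are assumptions for illustration; the patent only specifies fusion and weighting, not the exact output activation.

```python
import numpy as np

H = W = 9                        # size of the image to be detected
n = 32                           # assumed dimension of each region feature vector
out_dim = 2 + H * W + 8 * H * W  # confidences + probability map + deviations

rng = np.random.default_rng(2)
v1 = rng.standard_normal(n)      # feature vector of candidate text region 1
v2 = rng.standard_normal(n)      # feature vector of candidate text region 2
W_fc = rng.standard_normal((2 * n, out_dim)) * 0.05  # stand-in FC coefficients

fused = np.concatenate([v1, v2])   # concat layer: 2n-dimensional fused information
out = fused @ W_fc                 # fully connected weighting: out_dim values

conf = 1.0 / (1.0 + np.exp(-out[:2]))                         # two confidences
prob_map = (1.0 / (1.0 + np.exp(-out[2:2 + H * W]))).reshape(H, W)
deviations = out[2 + H * W:].reshape(H, W, 8)                 # corner deviations
```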
Further, in this embodiment, after the spatial information is obtained in step S102, the final selected text region may be determined in step S103 according to the following method:
(1) and determining a first candidate text region and the confidence of the first candidate text region according to the position probability map.
Specifically, the specific implementation of this step may include: in the position probability map, searching for first target pixel points whose probability (as introduced above, the probability that the pixel point belongs to the text) is greater than a first preset threshold; searching, within a specified neighborhood of each first target pixel point, for second target pixel points whose probability is greater than a second preset threshold; determining the second target pixel points as the pixel points used to construct a first candidate text region; and determining, according to the deviations of the second target pixel points from the text in the position probability map, the first candidate text region constructed from the second target pixel points (for the specific implementation of this step, reference may be made to the description in the related art, and details are not repeated here).
Further, in an embodiment, an average value of probabilities that each pixel point in the first candidate text region belongs to the text may be determined as the confidence of the first candidate text region.
It should be noted that the first preset threshold is greater than the second preset threshold, and specific values of the first preset threshold and the second preset threshold are set according to actual needs, and in this embodiment, the specific values are not limited. For example, in one embodiment, the first predetermined threshold is 0.7, and the second predetermined threshold is 0.5.
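The two-threshold search can be sketched as follows: pixels above the first threshold serve as seeds, and pixels above the second threshold inside a seed's neighborhood are kept as region-building pixels. The patent does not define the "specified neighborhood"; the 8-connected neighborhood used here, and the toy 3 × 3 probability map, are assumptions.

```python
import numpy as np

def seed_and_grow(prob_map, t_high=0.7, t_low=0.5):
    """Two-threshold search from the embodiment: pixels above t_high act as
    seeds (first target pixel points); pixels above t_low inside a seed's
    8-connected neighborhood are kept as region-building pixels (second
    target pixel points). Returns a boolean mask of kept pixels."""
    seeds = prob_map > t_high
    keep = np.zeros_like(seeds)
    h, w = prob_map.shape
    for i, j in zip(*np.nonzero(seeds)):
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                y, x = i + di, j + dj
                if 0 <= y < h and 0 <= x < w and prob_map[y, x] > t_low:
                    keep[y, x] = True
    return keep

prob = np.array([[0.1, 0.6, 0.1],
                 [0.6, 0.9, 0.55],
                 [0.1, 0.4, 0.1]])
mask = seed_and_grow(prob)   # seed at (1,1); three neighbors pass t_low
```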
(2) And judging whether the probability corresponding to the candidate text region in the position probability map is smaller than a preset threshold value.
Specifically, the position coordinates of the candidate text region can be obtained from the feature vector of the candidate text region, from which the position of the candidate text region in the position probability map can be determined, so that the probability corresponding to the candidate text region in the position probability map is obtained. For example, in an embodiment, the average value of the probabilities that all pixel points of the candidate text region in the position probability map belong to the text is determined as the probability corresponding to the candidate text region.
The preset threshold is set according to actual needs, and its specific value is not limited in this embodiment. For example, in one embodiment, the preset threshold may be 0.3.
(3) If so, deleting the candidate text region, and performing non-maximum suppression processing on the first candidate text region according to the confidence of the first candidate text region to obtain the final selected text region;
(4) if not, determining the candidate text region as a second candidate text region, and performing non-maximum suppression processing on the first candidate text region and the second candidate text region according to the confidence of the first candidate text region and the confidence of the second candidate text region to obtain the final selected text region.
Specifically, the specific implementation principle and implementation step of the non-maximum suppression processing may be referred to in the description of the related art, and are not described herein again.
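The decision flow of steps (2) to (4) can be sketched as follows (a minimal sketch with hypothetical helper names; this application represents regions by four corner points, but axis-aligned boxes `(x1, y1, x2, y2)` are assumed here to keep the IoU computation short, and the 0.3/0.5 thresholds are illustrative):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def select_final_regions(boxes, confs, prob_means,
                         prob_thresh=0.3, iou_thresh=0.5):
    """Drop candidates whose mean map probability falls below
    `prob_thresh`, then run standard non-maximum suppression on the
    survivors, keeping the highest-confidence box in each overlap group."""
    kept = [(b, c) for b, c, p in zip(boxes, confs, prob_means)
            if p >= prob_thresh]
    kept.sort(key=lambda bc: bc[1], reverse=True)  # highest confidence first
    final = []
    for box, conf in kept:
        if all(iou(box, f) < iou_thresh for f, _ in final):
            final.append((box, conf))
    return final
```

A candidate whose mean probability in the position probability map is below 0.3 is deleted outright; heavily overlapping survivors are then merged by suppression, as described in steps (3) and (4).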
In the method provided by this embodiment, the specified information is acquired from the image to be detected containing the text, the specified information is input into a pre-established target neural network for constructing the spatial relationship between the text in the image and the attention target, the target neural network outputs the spatial information, and the candidate text region is corrected according to the spatial information to obtain the final selected text region in the image to be detected. The specified information comprises a feature vector of a candidate text region located in the image to be detected, and the attention target comprises at least one of the text in the image, a specified target having a spatial relationship with the text in the image, and attribute information of the image. Therefore, the spatial relationship between the text and the target is fully considered, the spatial information is fully utilized to locate the final selected text region, and the accuracy can be improved.
In the following, some more specific examples are given to describe the technical solutions provided in the present application in detail.
Fig. 2 is a flowchart of a second embodiment of a text detection method provided in the present application. In the method provided by this embodiment, the specified information further includes a feature vector of a specified target that is located in the image to be detected and has a spatial relationship with the candidate text region, and step S102 may include:
S201, inputting the specified information into a second neural network in the target neural network, processing the specified information by the second neural network, and outputting the confidence of the candidate text region and the position probability map of the suspected text region in the image to be detected.
Specifically, for a specific implementation method and an implementation principle related to extracting a feature vector of a specified target, reference may be made to the description in the foregoing embodiments, and details are not described here.
In addition, the second neural network is used for constructing a spatial relationship between the text and the specified target. The input of the second neural network may be the feature vector of the candidate text region and the feature vector of the specified target, and the output may be the confidence of the candidate text region and the position probability map of the suspected text region in the image to be detected. For example, in one embodiment, the feature vector of the candidate text region 1 and the feature vector of the specified target are input, and the confidence of the candidate text region 1 and the position probability map of the suspected text region 2 are output.
Further, the processing of the specified information by the second neural network may include: performing fusion processing on the specified information to obtain fused specified information, and performing weighting processing on the fused specified information. For example, in an embodiment, the second neural network may include a concat layer and a full connection layer, where the concat layer is configured to perform the fusion processing on the specified information to obtain the fused specified information, and the full connection layer is configured to perform the weighting processing on the fused specified information.
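A minimal NumPy stand-in for this concat-then-weight structure is shown below (all dimensions, the random weights, and the sigmoid read-out are illustrative assumptions, not the actual layer sizes of this application):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical inputs: an n-dim region feature and an n-dim target feature
n = 4
region_feat = rng.standard_normal(n)   # feature vector of the candidate text region
target_feat = rng.standard_normal(n)   # feature vector of the specified target

# "concat layer": fuse the specified information into one vector
fused = np.concatenate([region_feat, target_feat])   # shape (2n,)

# "full connection layer": weight the fused vector; here the first output
# is read as the region confidence (via a sigmoid), and the remaining 9
# values stand in for a flattened 3x3 position probability map
W = rng.standard_normal((2 * n, 1 + 9))              # illustrative sizes
out = fused @ W
confidence = 1.0 / (1.0 + np.exp(-out[0]))
prob_map = 1.0 / (1.0 + np.exp(-out[1:])).reshape(3, 3)
```

In a trained network the weight matrix `W` would be learned; the point of the sketch is only the data flow — concatenation followed by a single weighting step.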
It should be noted that, through the second neural network, text regions missed during detection can be recovered so that text targets are not lost, and falsely detected text can be screened out, thereby improving the accuracy of text detection. For example, a license plate is generally located on a vehicle body, and a shop name generally appears within the storefront area; therefore, by training and learning the positional relationship between the specified target and the text, falsely detected text that does not satisfy a certain positional relationship with the specified target can be screened out.
And S202, determining the confidence of the candidate text region and the position probability map as the spatial information.
Specifically, in this embodiment, after obtaining the spatial information, in step S103, the final text region may be determined according to the method described in the above embodiment, and details are not repeated here.
According to the method provided by this embodiment, the spatial relationship between the text and the specified target can be constructed through the second neural network, so that the spatial information is obtained. Therefore, by determining the final selected text region based on the obtained spatial information, text regions missed during detection can be recovered, the loss of text targets is avoided, falsely detected regions are screened out, and the accuracy of text detection can be improved.
Fig. 3 is a flowchart of a third embodiment of a text detection method provided in the present application. Referring to fig. 3, in the method provided by this embodiment, the specified information further includes attribute information of the image to be detected. Step S102 may include:
S301, inputting the specified information into a third neural network in the target neural network, processing the specified information by the third neural network, and outputting the corrected position coordinates of the candidate text region.
Specifically, the third neural network is used for constructing a spatial relationship between the text and the attribute information of the image to be detected, and the input of the third neural network may be a feature vector of the candidate text region and the attribute information of the image to be detected, and the output may be a corrected position coordinate of the candidate text region. For example, in an embodiment, the feature vector of the candidate text region 1 and the rotation angle θ of each pixel point in the image to be detected are input, and the corrected position coordinates of the candidate text region 1 are output.
It should be noted that the processing of the specified information by the third neural network may include: performing fusion processing on the specified information to obtain fused specified information, and performing weighting processing on the fused specified information. For example, in an embodiment, the third neural network may include a concat layer and a full connection layer, where the concat layer is configured to perform the fusion processing on the specified information to obtain the fused specified information, and the full connection layer is configured to perform the weighting processing on the fused specified information.
For example, in an embodiment, the dimension of the feature vector of the candidate text region 1 is n, the dimension of the attribute information of the image to be detected is 1, and thus the dimension of the fused specified information is n + 1. Further, the dimension of the full connection coefficient of the full connection layer of the third neural network is (n + 1) × 8, so that the weighting processing result obtained after the weighting processing has 8 dimensions, representing the corrected position coordinates of the candidate text region 1 (the position coordinates are represented by the coordinates of the four corner points of the candidate text region, hence 8 dimensions). It should be noted that, for the specific implementation processes and principles of the fusion processing and the weighting processing, reference may be made to the descriptions in the related art, and details are not described here.
S302, determining the corrected position coordinates of the candidate text region as the spatial information.
Accordingly, in this embodiment, when the spatial information is the corrected position coordinates of the candidate text region, in step S103, the position of the candidate text region may be finely adjusted based on the corrected position coordinates to obtain the final selected text region. For example, in one embodiment, the final selected text region may be determined directly based on the corrected position coordinates of the candidate text region. For another example, in another embodiment, the final selected text region may also be determined based on both the corrected position coordinates of the candidate text region and the initial position coordinates of the candidate text region determined in step S101 (the initial position coordinates may be determined based on the probability that each pixel point in the image to be detected belongs to the text and the deviation of each pixel point in the image to be detected from the text). For example, in one embodiment, the final selected text region is determined based on the average value of the corrected position coordinates and the initial position coordinates.
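The averaging variant above can be illustrated as follows (the 8-dimensional four-corner-point layout follows the embodiments of this application; the concrete coordinate values are hypothetical):

```python
import numpy as np

# four corner points, 8 values each: (x1, y1, x2, y2, x3, y3, x4, y4)
initial = np.array([0, 0, 10, 0, 10, 5, 0, 5], dtype=float)     # from step S101
corrected = np.array([1, 1, 11, 1, 11, 6, 1, 6], dtype=float)   # network output

# fine adjustment: element-wise average of initial and corrected coordinates
final_coords = (initial + corrected) / 2.0
```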
According to the method provided by the embodiment, the spatial information is obtained by constructing the spatial relationship between the text and the attribute information of the image to be detected, and the final selected text region is determined according to the spatial information, so that the fine adjustment of the position of the text region can be realized, and the accuracy of text detection is improved.
Fig. 4 is a flowchart of a fourth embodiment of a text detection method provided in the present application. Referring to fig. 4, in the method provided by this embodiment, the specified information further includes a feature vector of a specified target that is located in the image to be detected and has a spatial relationship with the candidate text region, and attribute information of the image to be detected; step S102 includes:
S401, inputting the specified information into a fourth neural network in the target neural network, processing the specified information by the fourth neural network, and outputting the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map of the suspected text region in the image to be detected.
For the method of acquiring the feature vector of the specified target and the attribute information of the image to be detected, reference may be made to the descriptions in the foregoing embodiments, and details are not repeated here.
Specifically, the fourth neural network is used for constructing a spatial relationship among the text, the specified target, and the attribute information of the image to be detected. The input of the fourth neural network may be the feature vector of the candidate text region, the feature vector of the specified target, and the attribute information of the image to be detected, and the output is the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map of the suspected text region in the image to be detected. For example, in one embodiment, the feature vector of the candidate text region 1, the feature vector of the candidate text region 2, the feature vector of the specified target, and the rotation angle θ of each pixel point in the image to be detected are input, and the confidence and corrected position coordinates of the candidate text region 1, the confidence and corrected position coordinates of the candidate text region 2, and the position probability map of the suspected text region 3 are output.
It should be noted that, in an embodiment, the fourth neural network may include a concat layer and a full connection layer, where the concat layer is configured to perform fusion processing on the specified information to obtain fused specified information, and the full connection layer is configured to perform weighting processing on the fused specified information and output the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map of the suspected text region in the image to be detected.
With reference to the above example, for example, in an embodiment, the size of the image to be detected is 9 × 9, the dimension of the feature vector of the candidate text region 1 is n, the dimension of the feature vector of the candidate text region 2 is n, the dimension of the feature vector of the specified target is n, and the dimension of the attribute information of the image to be detected is 1, so the dimension of the fused specified information is 3n + 1. In this example, the dimension of the full connection coefficient of the full connection layer is (3n + 1) × (9 + 9 + 9 × 9), so the weighting processing result has 9 + 9 + 9 × 9 dimensions, where the first 9 dimensions represent the confidence (1 dimension) and the corrected position coordinates (8 dimensions) of the candidate text region 1, the middle 9 dimensions represent the confidence and the corrected position coordinates of the candidate text region 2, and the last 9 × 9 dimensions represent the position probability map of the suspected text region in the image to be detected, that is, the probability that each pixel point in the image to be detected belongs to the text.
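The 9 + 9 + 9 × 9 output layout described above can be split as follows (a sketch only; the placeholder vector stands in for the full connection layer's output on a 9 × 9 image):

```python
import numpy as np

# stand-in for the full-connection-layer output: 9 + 9 + 81 = 99 dimensions
out = np.arange(9 + 9 + 81, dtype=float)

conf1, coords1 = out[0], out[1:9]     # region 1: confidence + 8 corner coordinates
conf2, coords2 = out[9], out[10:18]   # region 2: confidence + 8 corner coordinates
prob_map = out[18:].reshape(9, 9)     # 9x9 position probability map
```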
And S402, determining the confidence of the candidate text region, the corrected position coordinates of the candidate text region and the position probability map as the spatial information.
Specifically, in this embodiment, the spatial information includes the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map of the suspected text region in the image to be detected. In step S103, the final selected text region may be determined by the same method as in the first embodiment, the only difference being that, when judging whether the probability corresponding to the candidate text region in the position probability map is smaller than the preset threshold, the position of the candidate text region in the position probability map may be obtained according to the corrected position coordinates of the candidate text region, so as to obtain the probability corresponding to the candidate text region in the position probability map. Alternatively, when making this judgment, the candidate text region may first be refined based on the corrected position coordinates and the initial position coordinates of the candidate text region (for the initial position coordinates, see the above description) to obtain adjusted position coordinates (for example, in an embodiment, the adjusted position coordinates are equal to the average value of the corrected position coordinates and the initial position coordinates), and then the position of the candidate text region in the position probability map is determined based on the adjusted position coordinates, so as to obtain the probability corresponding to the candidate text region in the position probability map. In addition, the non-maximum suppression processing is performed according to the adjusted position coordinates of the candidate text region.
According to the method provided by this embodiment, the spatial relationship among the text, the specified target, and the attribute information of the image to be detected can be constructed through the fourth neural network, so that the spatial information is obtained. Therefore, by determining the final selected text region based on the obtained spatial information, text regions missed during detection can be recovered, the loss of text targets is avoided, the position of the text region can be finely adjusted, and the accuracy of text detection can be improved.
Specifically, the target neural network is pre-established by the following method:
acquiring a training sample set; the training sample set comprises a plurality of pictures;
establishing a standby neural network for constructing a spatial relationship between the text in the image and the specified target; the input of the standby neural network is the specified information, and the output is the spatial information;
and training the standby neural network by adopting the training sample set to obtain the target neural network.
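The training procedure above can be sketched as follows (a deliberately minimal stand-in: the standby network is approximated by a single linear layer trained by gradient descent on a mean-squared-error loss, and the training sample set by synthetic vectors — neither is the actual setup of this application):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy training sample set: specified-information vectors -> spatial-information targets
X = rng.standard_normal((64, 5))        # 64 samples of 5-dim "specified information"
true_W = rng.standard_normal((5, 2))
Y = X @ true_W                          # 2-dim "spatial information" targets

W = np.zeros((5, 2))                    # the standby network's weights
lr = 0.05
for _ in range(500):                    # plain gradient descent on the MSE loss
    err = X @ W - Y                     # prediction error on the whole sample set
    W -= lr * (X.T @ err) / len(X)      # gradient step

final_loss = float(np.mean((X @ W - Y) ** 2))
```

After training converges, the standby network (here, `W`) becomes the target neural network used at detection time.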
Corresponding to the embodiment of the text detection method, the application also provides an embodiment of a text detection device.
The embodiment of the text detection apparatus can be applied to a computer device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking the software implementation as an example, the apparatus, as a logical means, is formed by the processor of the computer device in which it is located reading corresponding computer program instructions from the storage into the memory for execution. From the hardware aspect, fig. 5 shows a hardware structure diagram of a computer device in which a text detection apparatus is located according to an exemplary embodiment of the present application; in addition to the memory 510, the processor 520, the storage 530, and the network interface 540 shown in fig. 5, the computer device in which the apparatus of this embodiment is located may further include other hardware according to the actual function of the text detection apparatus, which is not described again.
Fig. 6 is a schematic structural diagram of a first embodiment of a text detection apparatus provided in the present application. Referring to fig. 6, the text detection apparatus provided in the present application may include an element generation module 610, a spatial relationship modeling module 620, and a text detection module 630, wherein,
the element generation module 610 is configured to obtain specified information from an image to be detected that contains text; the specified information comprises a feature vector of a candidate text region located in the image to be detected;
the spatial relationship modeling module 620 is configured to input the specified information into a pre-established target neural network for constructing a spatial relationship between a text in an image and an attention target, and the target neural network outputs spatial information; wherein the attention target includes at least one of the text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image;
the text detection module 630 is configured to correct the candidate text region according to the spatial information, so as to obtain a final selected text region in the image to be detected.
The apparatus of this embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1, and the implementation principle and the technical effect are similar, which are not described herein again.
Further, the spatial relationship modeling module 620 is specifically configured to input the specified information into a first neural network in the target neural network, process the specified information by the first neural network, output a confidence of the candidate text region and a position probability map of a suspected text region in the image to be detected, and determine the confidence of the candidate text region and the position probability map as the spatial information.
Further, the specified information also comprises a feature vector of a specified target which is positioned from the image to be detected and has a spatial relationship with the candidate text region; the spatial relationship modeling module 620 is specifically configured to input the specified information into a second neural network in the target neural network, process the specified information by the second neural network, output the confidence of the candidate text region and the position probability map of the suspected text region in the image to be detected, and determine the confidence of the candidate text region and the position probability map as the spatial information.
Further, the specified information also comprises attribute information of the image to be detected; the spatial relationship modeling module 620 is specifically configured to input the specified information into a third neural network in the target neural network, process the specified information by the third neural network, output the corrected position coordinates of the candidate text region, and determine the corrected position coordinates of the candidate text region as the spatial information.
Further, the specified information also comprises a feature vector of a specified target which is positioned from the image to be detected and has a spatial relationship with the candidate text region and attribute information of the image to be detected; the spatial relationship modeling module 620 is specifically configured to input the specified information into a fourth neural network in the target neural network, process the specified information by the fourth neural network, output the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map of the suspected text region in the image to be detected, and determine the confidence of the candidate text region, the corrected position coordinates of the candidate text region, and the position probability map as the spatial information.
Further, the processing of the specified information includes:
and performing fusion processing on the specified information to obtain fused specified information, and performing weighting processing on the fused specified information.
Further, the text detection module 630 is specifically configured to:
determining a first candidate text region and the confidence of the first candidate text region according to the position probability map;
judging whether the probability corresponding to the candidate text region in the position probability map is smaller than a preset threshold value or not;
if so, deleting the candidate text region, and performing non-maximum suppression processing on the first candidate text region according to the confidence of the first candidate text region to obtain the final selected text region;
if not, determining the candidate text region as a second candidate text region, and performing non-maximum suppression processing on the first candidate text region and the second candidate text region according to the confidence of the first candidate text region and the confidence of the second candidate text region to obtain the final selected text region.
Further, the present application also provides a computer readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of any of the methods provided in the first aspect of the present application.
In particular, computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including, by way of example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
With continued reference to fig. 5, the present application further provides a computer device comprising a memory 510, a processor 520 and a computer program stored on the memory 510 and executable on the processor 520, wherein the processor 520 executes the program to perform the steps of any of the methods provided in the first aspect of the present application.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.
Claims (10)
1. A text detection method, the method comprising:
acquiring specified information from an image to be detected containing a text; the specified information comprises a characteristic vector of a candidate text region positioned from the image to be detected;
inputting the specified information into a pre-established target neural network for constructing a spatial relationship between a text in the image and an attention target, and outputting spatial information by the target neural network; wherein the attention target includes at least one of a text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image;
and correcting the candidate text region according to the spatial information to obtain a final selection text region in the image to be detected.
2. The method of claim 1, wherein the inputting the specified information into a pre-established target neural network for constructing a spatial relationship between text in an image and an object of interest, spatial information being output by the target neural network, comprises:
inputting the specified information into a first neural network in the target neural network, processing the specified information by the first neural network, and outputting the confidence of the candidate text region and a position probability map of the suspected text region in the image to be detected;
determining the confidence of the candidate text region and the position probability map as the spatial information.
3. The method according to claim 1, wherein the specified information further comprises a feature vector of a specified target located in the image to be detected and having a spatial relationship with the candidate text region; inputting the specified information into a pre-established target neural network for constructing a spatial relationship between a text in an image and an attention target, and outputting spatial information by the target neural network, wherein the method comprises the following steps:
inputting the specified information into a second neural network in the target neural network, processing the specified information by the second neural network, and outputting the confidence of the candidate text region and a position probability map of the suspected text region in the image to be detected;
determining the confidence of the candidate text region and the position probability map as the spatial information.
4. The method according to claim 1, wherein the designation information further includes attribute information of the image to be detected; inputting the specified information into a pre-established target neural network for constructing a spatial relationship between a text in an image and an attention target, and outputting spatial information by the target neural network, wherein the method comprises the following steps:
inputting the specified information into a third neural network in the target neural network, processing the specified information by the third neural network, and outputting the corrected position coordinates of the candidate text region;
and determining the position coordinates of the candidate text regions after correction as the spatial information.
5. The method according to claim 1, wherein the specified information further comprises a feature vector of a specified target located from the image to be detected and having a spatial relationship with the candidate text region and attribute information of the image to be detected; inputting the specified information into a pre-established target neural network for constructing a spatial relationship between a text in an image and an attention target, and outputting spatial information by the target neural network, wherein the method comprises the following steps:
inputting the specified information into a fourth neural network in the target neural network, processing the specified information by the fourth neural network, and outputting the confidence of the candidate text region, the corrected position coordinates of the candidate text region and a position probability map of the suspected text region in the image to be detected;
and determining the confidence of the candidate text region, the corrected position coordinates of the candidate text region and the position probability map as the spatial information.
6. The method according to any one of claims 2, 3 and 5, wherein the correcting the candidate text region according to the spatial information to obtain a final text region in the image to be detected comprises:
determining a first candidate text region and the confidence of the first candidate text region according to the position probability map;
judging whether the probability corresponding to the candidate text region in the position probability map is smaller than a preset threshold value or not;
if so, deleting the candidate text region, and performing non-maximum suppression processing on the first candidate text region according to the confidence of the first candidate text region to obtain the final selected text region;
if not, determining the candidate text region as a second candidate text region, and performing non-maximum suppression processing on the first candidate text region and the second candidate text region according to the confidence of the first candidate text region and the confidence of the second candidate text region to obtain the final selected text region.
7. The method of claim 1, wherein the target neural network is pre-established by:
acquiring a training sample set; the training sample set comprises a plurality of pictures;
establishing a standby neural network for constructing a spatial relationship between the text in the image and the attention target, wherein the input of the standby neural network is the specified information and the output is the spatial information;
and training the standby neural network by adopting the training sample set to obtain the target neural network.
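The three steps of claim 7 form a standard supervised training loop: collect samples, instantiate a network with the claimed input/output interface, and fit it. A toy NumPy sketch in which synthetic feature vectors stand in for the "specified information", synthetic targets stand in for the "spatial information", and a single sigmoid layer stands in for the real (unspecified) architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: acquire a training sample set (synthetic stand-in for the pictures).
X = rng.normal(size=(64, 8))                  # "specified information" vectors
true_w = rng.normal(size=(8, 1))
y = 1.0 / (1.0 + np.exp(-X @ true_w))         # "spatial information" targets

def forward(w):
    return 1.0 / (1.0 + np.exp(-X @ w))

# Step 2: establish a standby network with the claimed interface;
# a single sigmoid layer stands in for the real architecture.
w = np.zeros((8, 1))
init_loss = float(np.mean((forward(w) - y) ** 2))

# Step 3: train the standby network on the sample set; the trained
# parameters constitute the "target" neural network.
for _ in range(500):
    grad = X.T @ (forward(w) - y) / len(X)    # cross-entropy gradient
    w -= 0.5 * grad
final_loss = float(np.mean((forward(w) - y) ** 2))
```

Any architecture and loss could be substituted; the claim only fixes the interface (specified information in, spatial information out) and the train-then-promote procedure.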
8. A text detection apparatus, comprising an element generation module, a spatial relationship establishing module, and a text detection module, wherein
the element generation module is used for acquiring specified information from the image to be detected containing the text; the specified information comprises a characteristic vector of a candidate text region positioned from the image to be detected;
the spatial relationship establishing module is used for inputting the specified information into a pre-established target neural network for establishing a spatial relationship between a text in the image and an attention target, and the target neural network outputs spatial information; wherein the attention target includes at least one of a text in the image, a specified target in the image having a spatial relationship with the text, and attribute information of the image;
and the text detection module is used for correcting the candidate text region according to the spatial information to obtain a final text region in the image to be detected.
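The apparatus of claim 8 decomposes into a three-stage pipeline. A hypothetical sketch wiring the three claimed modules together; all internals (mock features, a fixed scorer in place of the target network, a plain confidence threshold in place of the probability-map check and NMS) are stand-ins, not the patented implementations:

```python
import numpy as np

class ElementGenerationModule:
    """Extracts the 'specified information' from the image to be detected
    (here: mock feature vectors and boxes for three candidate regions)."""
    def run(self, image):
        return {"features": np.ones((3, 8)),
                "boxes": np.array([[0, 0, 10, 10],
                                   [0, 1, 10, 11],
                                   [20, 20, 30, 30]])}

class SpatialRelationshipModule:
    """Feeds the specified information to the pre-established target network
    and returns spatial information; a fixed scorer stands in for the net."""
    def run(self, info):
        scores = info["features"].mean(axis=1) / info["features"].shape[1]
        return {"confidence": scores, "boxes": info["boxes"]}

class TextDetectionModule:
    """Corrects the candidates using the spatial information (a simple
    confidence threshold stands in for the full correction of claim 6)."""
    def run(self, spatial, threshold=0.05):
        keep = spatial["confidence"] >= threshold
        return spatial["boxes"][keep]

def detect(image):
    gen = ElementGenerationModule()
    rel = SpatialRelationshipModule()
    det = TextDetectionModule()
    return det.run(rel.run(gen.run(image)))
```

The module split mirrors the claim: generation produces the specified information, the establishing module maps it to spatial information, and detection turns that into final text regions.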
9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
10. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910188639.XA CN111695377B (en) | 2019-03-13 | 2019-03-13 | Text detection method and device and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111695377A true CN111695377A (en) | 2020-09-22 |
CN111695377B CN111695377B (en) | 2023-09-29 |
Family
ID=72475629
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910188639.XA Active CN111695377B (en) | 2019-03-13 | 2019-03-13 | Text detection method and device and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111695377B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5884296A (en) * | 1995-03-13 | 1999-03-16 | Minolta Co., Ltd. | Network and image area attribute discriminating device and method for use with said neural network |
US20050281455A1 (en) * | 2004-06-17 | 2005-12-22 | Chun-Chia Huang | System of using neural network to distinguish text and picture in images and method thereof |
KR101631694B1 (en) * | 2015-08-24 | 2016-06-21 | 수원대학교산학협력단 | Pedestrian detection method by using the feature of hog-pca and rbfnns pattern classifier |
CN106570497A (en) * | 2016-10-08 | 2017-04-19 | 中国科学院深圳先进技术研究院 | Text detection method and device for scene image |
CN106980858A (en) * | 2017-02-28 | 2017-07-25 | 中国科学院信息工程研究所 | Language text detection and localization system, and language text detection and localization method using the system
CN107886082A (en) * | 2017-11-24 | 2018-04-06 | 腾讯科技(深圳)有限公司 | Mathematical formulae detection method, device, computer equipment and storage medium in image |
US20180314715A1 (en) * | 2013-05-01 | 2018-11-01 | Cloudsight, Inc. | Content Based Image Management and Selection |
CN108805131A (en) * | 2018-05-22 | 2018-11-13 | 北京旷视科技有限公司 | Text line detection method, apparatus and system |
CN108885699A (en) * | 2018-07-11 | 2018-11-23 | 深圳前海达闼云端智能科技有限公司 | Character identifying method, device, storage medium and electronic equipment |
US20180342061A1 (en) * | 2016-07-15 | 2018-11-29 | Beijing Sensetime Technology Development Co., Ltd | Methods and systems for structured text detection, and non-transitory computer-readable medium |
CN109389091A (en) * | 2018-10-22 | 2019-02-26 | 重庆邮电大学 | The character identification system and method combined based on neural network and attention mechanism |
Non-Patent Citations (1)
Title |
---|
HUANG Tongcheng; DING Youdong: "Research on Image Text Information Extraction Technology Based on Wavelet Neural Network", no. 02 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112330379A (en) * | 2020-11-25 | 2021-02-05 | 税友软件集团股份有限公司 | Invoice content generation method and system, electronic equipment and storage medium |
CN112330379B (en) * | 2020-11-25 | 2023-10-31 | 税友软件集团股份有限公司 | Invoice content generation method, invoice content generation system, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111695377B (en) | 2023-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108229490B (en) | Key point detection method, neural network training method, device and electronic equipment | |
CN108960211B (en) | Multi-target human body posture detection method and system | |
JP5188334B2 (en) | Image processing apparatus, image processing method, and program | |
JP5714599B2 (en) | Fast subspace projection of descriptor patches for image recognition | |
KR101896357B1 (en) | Method, device and program for detecting an object | |
US11017210B2 (en) | Image processing apparatus and method | |
US20130089260A1 (en) | Systems, Methods, and Software Implementing Affine-Invariant Feature Detection Implementing Iterative Searching of an Affine Space | |
CN110059728B (en) | RGB-D image visual saliency detection method based on attention model | |
US20100290708A1 (en) | Image retrieval apparatus, control method for the same, and storage medium | |
CN111709313B (en) | Pedestrian re-identification method based on local and channel combination characteristics | |
CN111612822B (en) | Object tracking method, device, computer equipment and storage medium | |
CN113269257A (en) | Image classification method and device, terminal equipment and storage medium | |
CN111898428A (en) | Unmanned aerial vehicle feature point matching method based on ORB | |
EP3671635B1 (en) | Curvilinear object segmentation with noise priors | |
CN110020593B (en) | Information processing method and device, medium and computing equipment | |
CN114549861A (en) | Target matching method based on feature point and convolution optimization calculation and storage medium | |
CN113159103B (en) | Image matching method, device, electronic equipment and storage medium | |
CN111695377B (en) | Text detection method and device and computer equipment | |
CN113269752A (en) | Image detection method, device, terminal equipment and storage medium | |
CN110969176A (en) | License plate sample amplification method and device and computer equipment | |
CN116704206A (en) | Image processing method, device, computer equipment and storage medium | |
CN116758419A (en) | Multi-scale target detection method, device and equipment for remote sensing image | |
CN116246161A (en) | Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge | |
CN114119970B (en) | Target tracking method and device | |
Wu et al. | An accurate feature point matching algorithm for automatic remote sensing image registration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||