WO2017167046A1 - A character recognition method and apparatus - Google Patents

A character recognition method and apparatus

Info

Publication number
WO2017167046A1
WO2017167046A1 · PCT/CN2017/077254
Authority
WO
WIPO (PCT)
Prior art keywords
classifier
layer
character
probability
picture
Prior art date
Application number
PCT/CN2017/077254
Other languages
English (en)
French (fr)
Inventor
毛旭东
施兴
褚崴
程孟力
周文猛
Original Assignee
阿里巴巴集团控股有限公司
毛旭东
施兴
褚崴
程孟力
周文猛
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 毛旭东, 施兴, 褚崴, 程孟力, 周文猛 filed Critical 阿里巴巴集团控股有限公司
Priority to EP17773076.9A priority Critical patent/EP3422256B1/en
Publication of WO2017167046A1 publication Critical patent/WO2017167046A1/zh
Priority to US16/144,219 priority patent/US10872274B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211Selection of the most significant subset of features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2148Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/2163Partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/146Aligning or centring of the image pick-up or image-field
    • G06V30/147Determination of region of interest
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • G06V30/191Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/416Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Definitions

  • the present application relates to the field of image recognition technologies, and in particular, to a character recognition method and a character recognition device.
  • currently implemented schemes mainly include the following two types: 1) first extracting features of the picture, such as HOG (Histogram of Oriented Gradient) features, and then training a character recognition model with an SVM (Support Vector Machine) classifier, a neural network classifier, or the like; 2) training a character recognition model using Convolutional Neural Networks (CNN).
  • HOG: Histogram of Oriented Gradient
  • SVM: Support Vector Machine
  • CNN: Convolutional Neural Networks
  • the detected character positions may be inaccurate, and some noise may exist; for example, a spot that is not a character may be detected as a character, which in turn leaves some noise-containing picture data among the segmented picture data. Therefore, after the classifier identifies each piece of picture data, the output results that are not noise still need to be selected.
  • FIG. 1A shows single-character pictures obtained by segmenting the ID-number area of an ID card picture. As shown in FIG. 1A, when the picture itself is blurry, the located number line will not be very accurate, and extra noise appears at the head or the tail. When cutting single characters, the noise at the front is also cut out, so that more than 18 single-character pictures are produced, from which the 18 digits must be selected.
  • the prior-art solution is to add a CNN classifier that recognizes "is it a character". With this classifier, picture data that are not digits are excluded first, and then the "which character" classifier identifies the remaining picture data and recognizes the specific characters from it.
  • this technical solution has certain disadvantages: first, the added classifier computes in sequence with the character classifier, which also increases computation time and affects running efficiency; second, in practice, if the added "is it a character" CNN classifier produces a wrong result, the whole recognition process inevitably goes wrong regardless of the subsequent character recognition model, reducing the accuracy of character recognition.
  • to solve the above problem, the present application discloses a character recognition method, which includes: acquiring picture data; and calculating the picture data using a computing layer shared by a first classifier and a second classifier to obtain a first result;
  • the first classifier is a classifier that identifies a specific character from the picture data;
  • the second classifier is a classifier that identifies whether the picture data is a character picture;
  • the first result is fed into the computing layers remaining in the first classifier beyond the shared computing layer to obtain a first probability corresponding to each character; the first result is fed into the computing layers remaining in the second classifier beyond the shared computing layer to obtain a second probability; the confidence that the picture data is recognized as each character is calculated from the first probability and the second probability; and
  • the recognition result of the character is output according to the confidence.
  • the method further comprises:
  • training the parameter values of each computing layer of the first classifier using character picture samples, where the first classifier includes first N computing layers and last M computing layers;
  • fixing the parameters of the first N computing layers of the second classifier to the parameters of the first N computing layers of the first classifier, and training the parameter values of the last L layers of the second classifier using non-character picture samples and character picture samples.
  • the computing layer shared by the first classifier and the second classifier includes:
  • the convolutional layer, or the convolutional layer and at least one fully connected layer.
  • the characters are numbers.
  • the step of acquiring picture data includes:
  • Each picture data is segmented from the number area of the picture of the identity certificate.
  • the step of calculating, according to the first probability and the second probability, that the picture data is recognized as a confidence of each character comprises:
  • the step of outputting the recognition result of the character according to the confidence level comprises:
  • from the respective picture data, the digits corresponding to the top-ranked pictures matching the number of digits specified for the ID card are selected and output in order.
  • the present application also discloses a character recognition device, which includes:
  • a picture acquisition module adapted to acquire picture data
  • a first result calculation module configured to calculate, by using a calculation layer shared by the first classifier and the second classifier, the picture data to obtain a first result;
  • the first classifier is a classifier that identifies a specific character from the picture data;
  • the second classifier is a classifier that identifies whether the picture data is a character picture;
  • a first probability calculation module configured to feed the first result into the computing layers remaining in the first classifier beyond the shared computing layer, to obtain a first probability corresponding to each character;
  • a second probability calculation module configured to feed the first result into the computing layers remaining in the second classifier beyond the shared computing layer, to obtain a second probability;
  • a confidence calculation module configured to calculate, according to the first probability and the second probability, a confidence that the picture data is recognized as each character
  • the output module is adapted to output a recognition result of the character according to the confidence level.
  • preferably, the apparatus further comprises:
  • a first classifier training module configured to train the parameter values of each computing layer of the first classifier using character picture samples; the first classifier includes first N computing layers and last M computing layers;
  • a second classifier training module configured to fix the parameters of the first N computing layers of the second classifier to the parameters of the first N computing layers of the first classifier, and to train the parameter values of the last L layers of the second classifier using non-character picture samples and character picture samples.
  • the computing layer shared by the first classifier and the second classifier includes:
  • the convolutional layer, or the convolutional layer and at least one fully connected layer.
  • the characters are numbers.
  • the picture obtaining module includes:
  • the picture segmentation sub-module is adapted to segment each piece of picture data from the number area of the picture of the identity certificate.
  • the confidence calculation module includes:
  • the confidence calculation sub-module is adapted to multiply the largest first probability by the second probability, to obtain the confidence that the picture data is the digit corresponding to the largest first probability.
  • the output module comprises:
  • the output sub-module is adapted to select, from the respective picture data, the digits corresponding to the top-ranked pictures matching the number of digits specified for the ID card, and to output them in order.
  • in the embodiments of the present application, after the picture data is acquired, it is calculated using the computing layer shared by the first classifier and the second classifier to obtain a first result; the first result is then fed into the remaining computing layers of each classifier.
  • the second classifier shares a part of the computing layer with the first classifier, so that part of the calculation process and calculation result is also shared. Compared with the background-art approach of adding a complete "is it a character" classifier before the character classifier and computing the two in sequence, the added amount of calculation is relatively small, which reduces calculation time and improves the efficiency of character recognition relative to the background art.
  • moreover, the second classifier and the first classifier are not used in sequence; instead, the probability values obtained by the two classifiers are multiplied to obtain a confidence, and the corresponding recognition result is output according to the confidence value. Relative to the background art, this improves the accuracy of character recognition, and a problem with the "is it a character" classifier alone does not greatly impact the entire recognition process.
  • FIG. 1 is a flow chart showing the steps of an embodiment of a character recognition method of the present application.
  • FIG. 1A is a schematic diagram of an identity card number of the present application.
  • FIG. 1B is a schematic diagram of a first classifier and a second classifier of the present application.
  • FIG. 2 is a flow chart showing the steps of an embodiment of a character recognition method of the present application.
  • FIG. 3 is a structural block diagram of an embodiment of a character recognition apparatus of the present application.
  • FIG. 4 is a block diagram showing the structure of an embodiment of a character recognition apparatus of the present application.
  • One of the core concepts of the embodiments of the present application is to calculate the picture data, after it is acquired, using a computing layer shared by the first classifier and the second classifier to obtain a first result; then to feed the first result into the remaining computing layers of the first classifier beyond the shared computing layer to obtain a first probability corresponding to each character, and into the remaining computing layers of the second classifier beyond the shared computing layer to obtain a second probability; to further calculate, from the first probability and the second probability, the confidence that the picture data is recognized as each character; and finally to output the recognition result of the character according to the confidence.
  • the present application enables a first classifier that identifies specific characters from picture data and a second classifier that identifies whether picture data is a character picture to share part of the computing layers and their data, so that the calculation processes for the picture data overlap. This reduces the amount of calculation, and having the two classifiers identify the picture jointly improves accuracy and reduces the impact a faulty second classifier would have on the whole recognition process.
  • referring to FIG. 1, a flow chart of the steps of an embodiment of a character recognition method of the present application is shown; the method may specifically include the following steps:
  • Step 110 Acquire image data.
  • This application introduces a character recognition method for pictures, which first needs to acquire picture data to be identified.
  • a picture is a flat medium composed of graphics, images, and the like.
  • the picture data described in this application is a digital picture.
  • digital pictures come in many common storage formats, such as BMP (Bitmap), TIFF (Tagged Image File Format), JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), PSD (Photoshop-specific format), PDF (Portable Document Format), and other formats.
  • BMP: Bitmap (standard image file format)
  • TIFF: Tagged Image File Format
  • JPEG: Joint Photographic Experts Group
  • GIF: Graphics Interchange Format
  • PSD: Photoshop-specific format
  • PDF: Portable Document Format
  • the storage format of the specific digital picture is not limited in this application.
  • the picture data acquired in step 110 may be pre-processed; for example, graying out the picture data turns color picture data into gray-scale picture data, reducing the amount of calculation.
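As an illustration of this preprocessing step, below is a minimal sketch (an editorial example, not from the patent) that converts an RGB picture to gray scale using the common ITU-R BT.601 luma weights; the array shape, RGB channel order, and weights are assumptions.

```python
import numpy as np

# Placeholder 32x32 RGB picture; real input would come from a decoded image file.
img = np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8)

# Weighted sum of the R, G, B channels (BT.601 luma weights, an assumed choice).
gray = (0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]).astype(np.uint8)

print(gray.shape)  # (32, 32): one channel instead of three, so less computation
```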
  • Step 120 Calculate the picture data by using a calculation layer shared by the first classifier and the second classifier to obtain a first result;
  • the first classifier is a classifier that identifies a specific character from the picture data;
  • the second classifier is a classifier that identifies whether the picture data is a character picture.
  • the present application uses two classifiers to identify the image data, wherein the first classifier is used to identify specific characters from the picture data, and the second classifier is used to identify whether the picture data is a character picture.
  • the first classifier and the second classifier share a part of the computing layer.
  • the specific characters may be, for example, the Arabic numerals 0-9 or Greek characters such as α, β, and γ; they may also be other characters, set according to actual needs.
  • the corresponding classification model can likewise be determined according to the characters to be recognized.
  • Both the first classifier and the second classifier can use a Convolutional Neural Networks (CNN) classification model.
  • the convolutional neural network classification model includes at least one convolutional layer, at least one fully connected layer, and one Softmax layer.
  • FIG. 1B is a schematic diagram of a first classifier and a second classifier. Convolution layer a, convolution layer b, fully connected layer c, fully connected layer d, Softmax layer 1, and the "which digit" output constitute the first classifier, while convolution layer a, convolution layer b, fully connected layer c, fully connected layer e, Softmax layer 2, and the "is it a digit" output constitute the second classifier.
  • the computing layers shared by the first classifier and the second classifier are convolution layer a, convolution layer b, and fully connected layer c; the first classifier and the second classifier do not share fully connected layer d, fully connected layer e, or the Softmax layers.
  • in FIG. 1B, both the first classifier and the second classifier comprise two convolutional layers, two fully connected layers, and one Softmax layer.
  • the numbers of convolutional layers and fully connected layers included in the first classifier and the second classifier can be flexibly set according to requirements, and are not limited in this embodiment of the present application.
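To make the shared-layer arrangement of FIG. 1B concrete, below is a minimal PyTorch sketch (an illustration, not the patent's actual network): convolution layers a/b and fully connected layer c form a shared trunk, while fully connected layers d and e, each followed by its own Softmax, form the two heads. All layer sizes and the 28x28 input are assumptions.

```python
import torch
import torch.nn as nn

class SharedTrunk(nn.Module):
    """Shared computing layers of FIG. 1B: conv a, conv b, fully connected layer c."""
    def __init__(self):
        super().__init__()
        self.conv_a = nn.Conv2d(3, 8, kernel_size=3, padding=1)   # convolution layer a
        self.conv_b = nn.Conv2d(8, 16, kernel_size=3, padding=1)  # convolution layer b
        self.fc_c = nn.Linear(16 * 28 * 28, 128)                  # fully connected layer c

    def forward(self, x):
        x = torch.sigmoid(self.conv_a(x))   # sigmoid activations, as in formula (2)
        x = torch.sigmoid(self.conv_b(x))
        x = x.flatten(1)
        return torch.sigmoid(self.fc_c(x))  # this output plays the role of the "first result"

trunk = SharedTrunk()
fc_d = nn.Linear(128, 10)  # fully connected layer d: head of the "which digit" classifier
fc_e = nn.Linear(128, 2)   # fully connected layer e: head of the "is it a digit" classifier

img = torch.randn(1, 3, 28, 28)             # placeholder input picture
first_result = trunk(img)                   # computed once, shared by both classifiers
first_probs = torch.softmax(fc_d(first_result), dim=1)        # Softmax layer 1: p0..p9
second_prob = torch.softmax(fc_e(first_result), dim=1)[:, 1]  # Softmax layer 2: s
```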
  • optionally, the first classifier may be trained using picture data samples of the corresponding characters; for example, digit picture samples of 0-9 are used to train the "which digit" first classifier. Training the first classifier actually means training the parameter values of each of its computing layers. After the first classifier is trained, the embodiment of the present application can fix the parameters of the front part of the computing layers, such as all convolutional layers, and then, with those convolutional-layer parameter values fixed, train the parameter values of the subsequent computing layers using non-digit picture data samples and digit picture data samples. Thus, the second classifier and the first classifier share computing layers with identical parameter values.
  • the optimal case is to share the convolutional layers and all fully connected layers except the last one; this reduces the amount of calculation while maintaining accuracy.
  • assume that the dimension of the input picture data is C × N × N, and the size of each convolution kernel of the convolution layer is m × m.
  • C is the number of channels of the picture data: R (Red), G (Green), and B (Blue).
  • the two Ns in N × N represent the pixel sizes of the picture data in the horizontal and vertical directions respectively; the two values may be the same or different.
  • likewise, the two ms in the convolution kernel size m × m may be the same or different, which is not limited in this embodiment of the present application. It should be noted that the larger of the two m values should be smaller than the smaller of the two N values.
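The text of formula (1) itself appears to have been lost in extraction. Based on the surrounding description (one m × m kernel per input channel, output coordinates (k, i, j)), a plausible reconstruction, an editorial assumption rather than the patent's verbatim formula, is:

$$ y_{k,i,j} \;=\; \sum_{s=1}^{m}\sum_{t=1}^{m} w_{k,s,t}\, x_{k,\,i+s-1,\,j+t-1} \qquad (1) $$

If instead each output channel summed over all input channels (the usual CNN convolution), the sum would also run over the channel index; the per-channel form above follows the text's description below.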
  • in formula (1), k, i, and j represent the coordinates of the output value: k corresponds to the R, G, or B channel of the picture data, i corresponds to the pixel position in the horizontal direction, and j corresponds to the pixel position in the vertical direction.
  • w is the parameter value of the convolutional layer, x is the input value, and y is the output value.
  • w is a known parameter, obtained by prior training of the convolutional layer.
  • in each convolution layer there may be multiple convolution kernels. For example, the number of convolution kernels may match the dimension of the input picture data other than the horizontal and vertical pixel sizes: the C in the three-dimensional matrix C × N × N of the picture data represents the three channels R, G, and B, so the convolution layer can have three m × m convolution kernels as described above. The convolution kernels of the layer then form a 3 × m × m three-dimensional matrix, i.e., the convolution matrix of the convolution layer.
  • each m × m convolution kernel is convolved with one channel of the three-dimensional matrix C × N × N of the input picture data to obtain a two-dimensional matrix: the first convolution kernel is convolved with the R-channel picture data, the second with the G-channel picture data, and the third with the B-channel picture data.
  • the three two-dimensional matrices obtained by convolving the three convolution kernels form a three-dimensional matrix, which is the output result of the convolution layer shown in formula (1).
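Formula (2) for the fully connected layer is likewise missing from the extraction. Given the parameter descriptions that follow (σ the sigmoid function, w the layer parameters, x the input), it is presumably the standard affine-plus-sigmoid form (an editorial reconstruction):

$$ y \;=\; \sigma(w\,x), \qquad \sigma(z) = \frac{1}{1+e^{-z}} \qquad (2) $$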
  • in formula (2), σ(·) is the sigmoid function, w is the parameter of the fully connected layer, and x is the input value.
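The Softmax formula is also missing from the extraction; the standard form consistent with the parameter descriptions below (again an editorial reconstruction), with K the number of categories, is:

$$ P(y=j \mid x) \;=\; \frac{e^{\theta_j^{\top}x}}{\sum_{l=1}^{K} e^{\theta_l^{\top}x}} $$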
  • in the Softmax formula, j represents each category, y represents the category label, θ is the parameter of the Softmax layer, and e is the natural constant.
  • taking digits as an example, the categories of y include 0, 1, 2, ..., 9. The formula can then calculate the probabilities that the digit picture data corresponds to each of the ten numbers 0 through 9.
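As a worked illustration (with invented values), the following computes the ten first probabilities from assumed pre-Softmax scores θ_j^T x:

```python
import numpy as np

# Assumed scores theta_j^T x for digits 0-9 produced by the layer before Softmax.
scores = np.array([0.1, 0.3, 2.5, 0.2, 0.1, 0.0, 0.4, 0.1, 0.2, 0.3])

probs = np.exp(scores) / np.exp(scores).sum()  # the Softmax formula above
print(probs.argmax(), round(float(probs.max()), 3))  # most likely digit and its probability
```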
  • the image data is first calculated by using a computing layer shared by the first classifier and the second classifier to obtain a first result.
  • preferably, before step 120, the method further includes:
  • Step S11: train the parameter values of each computing layer of the first classifier using character picture samples; the first classifier includes first N computing layers and last M computing layers.
  • initially, the parameters in the first classifier and the second classifier may be unknown, so in order to further improve the accuracy of both, in this embodiment of the present application the parameters in the first classifier and the second classifier are trained before the two classifiers perform their respective calculation processes.
  • specifically, character picture samples are used to train the parameter values of the respective computing layers of the first classifier.
  • the first classifier includes first N computing layers and last M computing layers; the first N computing layers are the computing layers shared with the second classifier, and the last M computing layers are the computing layers not shared with the second classifier.
  • specifically, the first classifier may be trained using at least one character picture sample, where a character picture sample is a picture sample whose character has already been clearly recognized, and the character picture samples cover more than a set number of character types.
  • specifically, each character picture sample may be used as the input of the first classifier, with the correct classification of the sample assigned probability 1 and all other classifications assigned probability 0 as the ideal output, to train the parameters of the first classifier.
  • the training process consists of four main steps, divided into two phases:
  • the first phase: the forward propagation phase;
  • the second phase: the back-propagation phase.
  • the work of these two phases is generally controlled by accuracy requirements, which can be flexibly set according to needs and are not limited in this application.
  • training the first classifier actually means training the parameter w in formula (1) for each convolutional layer, the parameter w in formula (2) for each fully connected layer, and the parameter θ in the Softmax layer.
  • the w of formula (1) differs between different convolutional layers, and the w of formula (2) differs between different fully connected layers.
  • Step S12: fix the parameters of the first N computing layers of the second classifier as the parameters of the first N computing layers of the first classifier, and train the parameter values of the last L layers of the second classifier using non-character picture samples and character picture samples.
  • since the first N computing layers are shared, once the first classifier is trained, the parameters of the first N computing layers of the second classifier are also determined.
  • then the parameter values of the last L layers of the second classifier may be trained; L and M may be the same or different, which is not limited in this application.
  • because the second classifier calculates the probability that the input picture data is a character picture, non-character pictures need to be considered during training. Therefore, in this embodiment of the application, at least one non-character picture sample and at least one character picture sample are used to train the parameter values of the last L computing layers of the second classifier.
  • specifically, each character picture sample can be used as the input of the second classifier with probability 1 as the ideal output, and each non-character picture sample as the input with probability 0 as the ideal output, to train the parameters of the second classifier.
  • the specific training process is similar to that of the first classifier in step S11: it mainly includes four steps, divided into two phases:
  • the first phase: the forward propagation phase;
  • the second phase: the back-propagation phase.
  • the work of these two phases is generally controlled by accuracy requirements; the accuracy requirements of the second classifier can also be flexibly set according to needs and are not limited in this application.
  • in practice, the order may also be reversed: the parameter values of the first N computing layers and the last L computing layers of the second classifier may first be trained using non-character picture samples and character picture samples; then the parameters of the first N computing layers of the first classifier are fixed as the parameters of the first N computing layers of the second classifier, and the parameter values of the last M computing layers of the first classifier are trained using character picture samples.
  • if only the convolutional layers are shared with the first classifier, the parameters of formula (1) for each convolutional layer are determined by step S11; then the aforementioned character picture data samples and non-character picture data samples are used to train the parameter w in formula (2) for each fully connected layer and the parameter θ in the Softmax layer.
  • if the convolutional layers plus part of the fully connected layers are shared with the first classifier (the fully connected layers being shared in their input order, i.e., the front fully connected layers are the shared ones), then the formula (1) parameters of each convolutional layer and the formula (2) parameter w of the shared fully connected layers are determined by step S11; the aforementioned character picture data samples and non-character picture data samples are then used to train the parameter w in formula (2) for the remaining unshared fully connected layers and the parameter θ in the Softmax layer.
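Continuing the PyTorch sketch after FIG. 1B above (reusing its `trunk`, `fc_d`, and `fc_e`), the two-stage procedure of steps S11/S12 might look as follows. The data loaders are synthetic placeholders, and the loss, optimizer, learning rate, and epoch count are assumptions, not values from the patent.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic placeholder data standing in for real labeled samples.
digit_loader = DataLoader(TensorDataset(torch.randn(64, 3, 28, 28),
                                        torch.randint(0, 10, (64,))), batch_size=16)
is_digit_loader = DataLoader(TensorDataset(torch.randn(64, 3, 28, 28),
                                           torch.randint(0, 2, (64,))), batch_size=16)

def train(params, loader, forward_fn, epochs=1):
    """Generic loop; settings here are assumptions, not the patent's."""
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(params, lr=0.01)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(forward_fn(x), y).backward()
            opt.step()

# Step S11: train the whole first classifier (shared trunk plus head fc_d).
train(list(trunk.parameters()) + list(fc_d.parameters()),
      digit_loader, lambda x: fc_d(trunk(x)))

# Step S12: fix the shared layers' parameters, then train only the second
# classifier's remaining layers (head fc_e) on digit and non-digit samples.
for p in trunk.parameters():
    p.requires_grad = False
train(list(fc_e.parameters()), is_digit_loader, lambda x: fc_e(trunk(x)))
```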
  • Step 130: feed the first result into the computing layers remaining in the first classifier beyond the shared computing layer, and calculate to obtain the first probability corresponding to each character.
  • the first classifier is a classifier for identifying a specific character from the picture data; feeding the first result obtained in step 120 into the computing layers remaining in the first classifier beyond the shared computing layer yields the first probability that the picture data is each character.
  • specifically, the first result is taken as an input into the whole of fully connected layer d and Softmax layer 1 of the first classifier shown in FIG. 1B, and the first probability corresponding to each character is obtained.
  • the character corresponding to the picture data may be an Arabic numeral between 0 and 9, one of the 52 English letters (uppercase A to Z and lowercase a to z), or a punctuation mark, special symbol, Chinese character, Roman character, etc. One or more of the character types that may appear in the picture data can be set according to requirements, which is not limited in this embodiment of the present application.
  • the classification algorithm of the Softmax layer can be used to calculate the probability that the input picture data may be individual characters, that is, the first probability corresponding to each character.
  • Step 140: feed the first result into the computing layers remaining in the second classifier beyond the shared computing layer, and calculate to obtain the second probability.
  • the second classifier is a classifier for identifying whether the picture data is a character picture; feeding the first result obtained in step 120 into the computing layers remaining in the second classifier beyond the shared computing layer yields the second probability that the picture data is a character picture.
  • specifically, the first result is taken as an input into the whole of fully connected layer e and Softmax layer 2 of the second classifier shown in FIG. 1B, and the second probability is obtained.
  • the second probability obtained by the second classifier is the probability that the picture data is a character picture.
  • the characters corresponding to a character picture may likewise be any of the character types described in step 130 and may be set according to requirements. It should be noted that the character types corresponding to the first classifier may be identical to those corresponding to the second classifier, or the second classifier's character types may include the first classifier's, which is not limited in this embodiment of the present application.
  • however, when the character types corresponding to the first classifier match those corresponding to the second classifier, the efficiency and accuracy of the final character recognition will be higher.
  • a character picture is a picture containing the set character types, and calculating the probability that the picture data is a character picture means calculating the probability that the picture data is a picture containing the set character types; the obtained result is the second probability.
  • the second classifier also calculates the second probability using its own Softmax layer: once the possible character types have been set, the classification algorithm of the Softmax layer can calculate the probability that the input picture data is a character picture, i.e., the second probability.
  • the first probability and the second probability calculated by the first classifier and the second classifier differ in essence because, beyond the shared computing layer, the two classifiers' remaining layers, especially the Softmax layers, do not necessarily have the same parameters and structure.
  • Step 150 Calculate, according to the first probability and the second probability, the confidence that the picture data is recognized as each character.
  • the first probability is the probability that the picture data is each particular character, while the second probability is the probability that the picture data is a character picture at all.
  • the number of first probabilities corresponds to the set character types: it equals the number of character types and is at least one, whereas for one input picture only one second probability is obtained.
  • combining the two, the confidence that the picture data is recognized as each character can be calculated; for example, the confidence for each character is obtained by multiplying the first probability of the picture data for that character by the second probability of the picture data.
  • for example, with the character types set to digits, ten first probabilities are obtained from the first classifier, corresponding to the probabilities that the picture data is each Arabic numeral between 0 and 9: the first probability p0 is the probability that the picture data is the character 0, p1 the probability that it is the character 1, and so on up to p9, the probability that it is the character 9.
  • only one second probability is obtained from the second classifier, namely the probability s that the picture data satisfies the set condition, for example that it is an Arabic numeral.
  • multiplying p0 by s gives the confidence that the picture data is recognized as the character 0, and so on; multiplying p9 by s gives the confidence that the picture data is recognized as the character 9.
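A small worked sketch of this step, with invented values for p0..p9 and s:

```python
import numpy as np

# Invented first probabilities p0..p9 from the first classifier for one picture...
p = np.array([0.01, 0.02, 0.01, 0.01, 0.02, 0.01, 0.01, 0.01, 0.05, 0.85])
# ...and an invented second probability s that the picture is a digit at all.
s = 0.92

confidence = p * s            # confidence that the picture is each character 0..9
best = confidence.argmax()
print(best, round(float(confidence[best]), 3))  # 9 0.782: character 9 wins
```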
  • Step 160 Output a recognition result of the character according to the confidence level.
  • specifically, according to the calculated confidence that the picture data is recognized as each character, the character with the highest confidence may be output as the recognition result.
  • for example, continuing the example in step 150 of determining whether a piece of picture data is an Arabic numeral between 0 and 9: if, after the confidence for each character is calculated, the confidence obtained by multiplying p9 and s is the largest, the character 9 is output as the recognition result.
  • in the embodiment of the present application, after the picture data is acquired, it is calculated using the computing layer shared by the first classifier and the second classifier to obtain the first result; the first result is fed into the remaining computing layers of the first classifier beyond the shared computing layer to obtain the first probability corresponding to each character, and into the remaining computing layers of the second classifier beyond the shared computing layer to obtain the second probability; the confidence that the picture data is recognized as each character is then calculated from the first probability and the second probability; and finally the recognition result of the character is output according to the confidence.
  • the second classifier shares a part of the computing layer with the first classifier, so that part of the calculation process and calculation result is also shared. Compared with the background-art process of adding a complete "is it a character" classifier before the character classifier and computing the two in sequence, the added amount of calculation is relatively small, which reduces calculation time and improves the efficiency of character recognition relative to the background art.
  • moreover, the second classifier and the first classifier are not used in sequence; instead, the probability values obtained by the two classifiers are multiplied to obtain a confidence, and the corresponding recognition result is output according to the confidence value. Relative to the background art, this improves the accuracy of character recognition, and a problem with the "is it a character" classifier alone does not greatly impact the entire recognition process.
  • referring to FIG. 2, a flow chart of the steps of another embodiment of a character recognition method of the present application is shown; the method may specifically include the following steps:
  • Step 210: segment each piece of picture data from the number area of the picture of the identity certificate.
  • this embodiment performs digit recognition on a picture of an identity certificate. Because such a picture may include multiple digits, for example an identity card number, for convenience of recognition the number area of the identity-certificate picture is first segmented into individual pieces of picture data, as shown in FIG. 1A, yielding a plurality of picture data; for example, the area where the ID card number is located is sequentially divided into pieces of picture data each containing only one digit. The specific segmentation method is well-known technology in the art and is not described in detail in the embodiments of the present application.
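Since the patent treats segmentation as well-known art, the following is only an assumed fixed-width slicing sketch for illustration, not the patent's method; the strip dimensions are placeholders.

```python
import numpy as np

def split_number_region(region, n):
    """Cut a number strip into n equal-width slices (a simplistic assumed method)."""
    width = region.shape[1] // n
    return [region[:, i * width:(i + 1) * width] for i in range(n)]

region = np.zeros((48, 22 * 24))          # placeholder grayscale number strip
pieces = split_number_region(region, 22)  # e.g. 22 candidate single-digit pictures
print(len(pieces), pieces[0].shape)       # 22 slices of 48 x 24 pixels each
```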
  • Step 220 Calculate the picture data by using a calculation layer shared by the first classifier and the second classifier to obtain a first result;
  • the first classifier is a classifier that identifies a specific character from the picture data;
  • the second classifier is a classifier that identifies whether the picture data is a character picture.
  • the computing layer shared by the first classifier and the second classifier includes: the convolutional layer, or the convolutional layer and at least one fully connected layer.
  • the characters are numbers.
  • the first classifier calculates the probability that the input picture data is any number between 0 and 9, respectively, and the second classifier calculates the probability that the input picture data can be recognized as a number.
  • Step 230 Bring the first result into a computing layer remaining in the first classifier except the shared computing layer, and obtain a first probability corresponding to each character.
  • Step 240 Bring the first result to a computing layer remaining in the second classifier except the shared computing layer to calculate, to obtain a second probability.
  • Step 250: multiply the largest first probability by the second probability to obtain the confidence that the picture data is the digit corresponding to the largest first probability.
  • the largest first probability is the one corresponding to the digit the input picture data is most likely to be; multiplying this largest first probability by the second probability yields the confidence that the picture data is that digit.
  • Step 260: from the respective picture data, select the digits corresponding to the top-ranked pictures matching the number of digits specified for the ID card, and output them in order.
  • after steps 220 to 250, each picture obtained by the segmentation has a confidence for the digit 0-9 it most probably contains.
  • the pictures segmented as in FIG. 1A are arranged from the left end to the right end according to the writing convention of the ID card. The application determines, for each picture, the maximum-probability digit 0-9, selects from the arranged pictures the 18 pictures with the highest confidences, and combines the digits corresponding to those pictures to obtain the ID number.
  • specifically, when segmenting, the pictures may be numbered so that their order is recorded.
  • for example, for the number area shown in FIG. 1A, the area is first divided into a plurality of pieces of character data; following the writing convention, 22 mutually disconnected pieces of picture data are cut from the left end to the right end, denoted a1 to a22. The first classifier and the second classifier are then used to calculate, for each piece of picture data, the confidence of its maximum-probability digit, and 18 pieces of picture data, together with their maximum-probability digits, are selected in order of confidence from high to low. Assume the pieces, listed in descending order of the confidence of the most likely digit in each picture as (confidence, digit), are: a5: (0.95, 2), a6: (0.94, 0), a12: (0.93, 8), a15: (0.92, 9), a11: (0.92, 9), a13: (0.90, 9), a16: (0.90, 2), a4: (0.89, 4), a10: (0.89, 1), a14: (0.88, 0), a7: (0.87, 9), a17: (0.86, 6), a8: (0.85, 2), a18: (0.84, 5), a9: (0.84, 1), a19: (0.83, 1), a20: (0.81, 3), a21: (0.80, 8), a2: (0.10, 8), a1: (0.10, 9), a22: (0.09, 0), a3: (0.09, 0). The top 18 pieces (a4 through a21) are selected, but in the output process the corresponding digits are still output in the original left-to-right order of the pictures.
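The selection just described can be sketched as follows, reusing the example's (confidence, digit) pairs; the dictionary keys are the slice indices of a1 to a22:

```python
# Slice index -> (confidence of its most likely digit, that digit), from the example.
results = {5: (0.95, 2), 6: (0.94, 0), 12: (0.93, 8), 15: (0.92, 9), 11: (0.92, 9),
           13: (0.90, 9), 16: (0.90, 2), 4: (0.89, 4), 10: (0.89, 1), 14: (0.88, 0),
           7: (0.87, 9), 17: (0.86, 6), 8: (0.85, 2), 18: (0.84, 5), 9: (0.84, 1),
           19: (0.83, 1), 20: (0.81, 3), 21: (0.80, 8), 2: (0.10, 8), 1: (0.10, 9),
           22: (0.09, 0), 3: (0.09, 0)}

kept = sorted(results, key=lambda i: results[i][0], reverse=True)[:18]  # top 18 by confidence
id_number = "".join(str(results[i][1]) for i in sorted(kept))  # original left-to-right order
print(id_number)  # 18 digits corresponding to slices a4..a21
```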
  • in this way, the number area of a picture of an identity certificate can be divided into a plurality of pieces of picture data; the first classifier and the second classifier are applied to each piece in turn according to steps 220-250 to calculate the confidence of each piece of picture data, and the digits corresponding to the largest first probabilities are output in sequence, thereby recognizing the number of the identity certificate, such as the ID number.
  • in this embodiment of the present application, after the picture data is acquired, it is likewise calculated using the computing layer shared by the first classifier and the second classifier to obtain the first result; the first result is fed into the computing layers remaining in the first classifier beyond the shared computing layer to obtain the first probability corresponding to each character, and into the computing layers remaining in the second classifier beyond the shared computing layer to obtain the second probability; the confidence that the picture data is recognized as each character is then calculated from the first probability and the second probability; and finally the recognition result of the character is output according to the confidence. The efficiency and accuracy of character recognition are thus improved relative to the background art.
  • in addition, the present application may first train one of the classifiers, then fix the computing layers shared by the first classifier and the second classifier and continue by training only the computing layers of the other classifier that have not yet been trained. This reduces the training workload relative to the background art and improves the efficiency of training the first classifier and the second classifier, further improving the efficiency and accuracy of character recognition.
  • referring to FIG. 3, a structural block diagram of an embodiment of a character recognition apparatus of the present application is shown; the apparatus may specifically include the following modules:
  • the picture obtaining module 310 is adapted to acquire picture data.
  • a first result calculation module 320, adapted to calculate the picture data using a computing layer shared by the first classifier and the second classifier, to obtain a first result;
  • the first classifier is a classifier for identifying a specific character from the picture data; and
  • the second classifier is a classifier for identifying whether the picture data is a character picture.
  • the first probability calculation module 330 is adapted to bring the first result into a calculation layer remaining in the first classifier except the shared computing layer to obtain a first probability corresponding to each character.
  • the second probability calculation module 340 is adapted to bring the first result into a calculation layer remaining in the second classifier except the shared computing layer to obtain a second probability.
  • the confidence calculation module 350 is adapted to calculate, according to the first probability and the second probability, the confidence that the picture data is recognized as each character.
  • the output module 360 is adapted to output a recognition result of the character according to the confidence level.
  • preferably, the apparatus further includes, before the first result calculation module 320:
  • the first classifier training module 370 is adapted to use the character picture samples to train parameter values of the respective calculation layers of the first classifier; the first classifier includes a first N layer calculation layer and a rear M layer calculation layer.
  • preferably, the apparatus further includes, before the second probability calculation module 340:
  • the second classifier training module 380 is configured to fix the parameters of the first N computing layers of the second classifier as the parameters of the first N computing layers of the first classifier, and to train the parameter values of the last L layers of the second classifier using non-character picture samples and character picture samples.
  • in the embodiment of the present application, after the picture data is acquired, it is calculated using the computing layer shared by the first classifier and the second classifier to obtain the first result; the first result is then fed into the computing layers remaining in the first classifier beyond the shared computing layer to obtain the first probability corresponding to each character, and into the computing layers remaining in the second classifier beyond the shared computing layer to obtain the second probability; the confidence that the picture data is recognized as each character is further calculated from the first probability and the second probability; and finally the recognition result of the character is output according to the confidence.
  • the second classifier shares a part of the computing layer with the first classifier, so that part of the calculation process and calculation result is also shared; compared with adding a complete separate classifier as in the background art, the relative increase in the amount of calculation is small, and the efficiency of character recognition is improved relative to the background art.
  • moreover, the second classifier and the first classifier are not used in sequence; instead, the probability values obtained by the two classifiers are multiplied to obtain a confidence, and the corresponding recognition result is output according to the confidence value, which improves the accuracy of character recognition relative to the background art.
  • referring to FIG. 4, a structural block diagram of another embodiment of a character recognition apparatus of the present application is shown; the apparatus may specifically include the following modules:
  • the picture obtaining module 410 is adapted to acquire picture data. Specifically include:
  • the picture segmentation sub-module 411 is adapted to segment each piece of picture data from the number area of the picture of the identity certificate.
  • the first result calculation module 420 is adapted to calculate the picture data using a computing layer shared by the first classifier and the second classifier, to obtain a first result; the first classifier is a classifier that identifies a specific character from the picture data; the second classifier is a classifier that identifies whether the picture data is a character picture.
  • the first probability calculation module 430 is adapted to bring the first result into a calculation layer remaining in the first classifier except the shared computing layer to obtain a first probability corresponding to each character.
  • the second probability calculation module 440 is adapted to bring the first result into a calculation layer remaining in the second classifier except the shared computing layer to obtain a second probability.
  • the confidence calculation module 450 is adapted to calculate, according to the first probability and the second probability, the confidence that the picture data is recognized as each character. Specifically include:
  • the confidence calculation sub-module 451 is adapted to multiply the largest first probability by the second probability, to obtain the confidence that the picture data is the digit corresponding to the largest first probability.
  • the output module 460 is adapted to output a recognition result of the character according to the confidence level. Specifically include:
  • the output sub-module 461 is adapted to select, from the respective picture data, the digits corresponding to the top-ranked pictures matching the number of digits specified for the ID card, and to output them in order.
  • in the embodiment of the present application, after the picture data is acquired, it is calculated using the computing layer shared by the first classifier and the second classifier to obtain the first result; the first result is then fed into the computing layers remaining in the first classifier beyond the shared computing layer to obtain the first probability corresponding to each character, and into the computing layers remaining in the second classifier beyond the shared computing layer to obtain the second probability; the confidence that the picture data is recognized as each character is further calculated from the first probability and the second probability; and finally the recognition result of the character is output according to the confidence. The efficiency and accuracy of character recognition are improved relative to the background art.
  • in addition, the present application may first train one of the classifiers, then fix the computing layers shared by the first classifier and the second classifier and continue by training only the computing layers of the other classifier that have not yet been trained, which reduces the training workload relative to the background art and also improves the efficiency of training the first classifier and the second classifier, further improving the efficiency and accuracy of character recognition.
  • since the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant parts, refer to the description of the method embodiments.
  • those skilled in the art will appreciate that the embodiments of the present application can be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • RAM: random access memory
  • ROM: read-only memory
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • As defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to the embodiments of the present application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing terminal device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device, the instruction device implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

A character recognition method and device, relating to the technical field of image recognition, which reduce computation time and improve the efficiency of character recognition. The method comprises: acquiring picture data (110); computing the picture data with computation layers shared by a first classifier and a second classifier to obtain a first result, the first classifier being a classifier that recognizes specific characters from picture data and the second classifier being a classifier that recognizes whether picture data is a character picture (120); feeding the first result into the remaining computation layers of the first classifier other than the shared layers to obtain first probabilities corresponding to the respective characters (130); feeding the first result into the remaining computation layers of the second classifier other than the shared layers to obtain a second probability (140); computing, from the first probabilities and the second probability, the confidence that the picture data is recognized as each character (150); and outputting the character recognition result according to the confidence (160).

Description

Character Recognition Method and Device
Technical Field
The present application relates to the technical field of image recognition, and in particular to a character recognition method and a character recognition device.
Background Art
In recent years, with the rapid development of computer technology and digital image processing technology, picture recognition technology, especially technology for recognizing characters such as digits, letters, and special symbols in pictures, has seen increasingly broad application demand in many fields.
For recognizing characters in a picture, the currently existing recognition process is:
1. detect the positions of the characters in the picture;
2. segment the picture into picture data each containing a single character;
3. recognize each piece of picture data with a character classifier.
As for the character classifier, the currently implemented solutions mainly include the following two: 1) first extract features of the picture, such as HOG (Histogram of Oriented Gradient) features, and then train a character recognition model with an SVM (Support Vector Machine) classifier, a neural network classifier, or the like; 2) train a character recognition model with a Convolutional Neural Network (CNN). The trained character recognition model is then used to perform character recognition on the input picture data.
In practice, however, for example when the picture is rather blurry or contains many characters, the detected character positions may be inaccurate and contain noise; for instance, blobs that are not characters may be detected as characters, so some of the segmented picture data contains noise. After the classifier recognizes each piece of picture data, the non-noise outputs therefore have to be selected from the results. Taking identity card recognition as an example, Fig. 1A shows single-character pictures segmented from the identity card number region of an identity card picture. As shown in Fig. 1A, when the picture itself is rather blurry, locating the number line is not very accurate and extra noise appears at the head or tail, so the leading noise is also cut out during single-character segmentation; more than 18 single-character pictures result, from which the 18 digits have to be selected.
To address the above problem, an existing technical solution adds a CNN classifier that recognizes whether something "is a character". This classifier is used first to exclude picture data that is not a digit, and the "which character" classifier then recognizes the remaining picture data, identifying the specific characters in it. This technical solution has certain drawbacks, as follows:
first, the added classifier computes in sequence with the original one, which also adds computation time and affects running efficiency;
second, in actual operation, if the computation result of the added "is it a digit" CNN classifier is wrong, the whole recognition process inevitably goes wrong regardless of whether the subsequent character recognition model would have had a problem, which reduces the accuracy of character recognition.
Summary of the Invention
In view of the above problems, embodiments of the present application are proposed to provide a character recognition method and a corresponding character recognition device that overcome, or at least partially solve, the above problems.
To solve the above problems, the present application discloses a character recognition method, comprising:
acquiring picture data;
computing the picture data with computation layers shared by a first classifier and a second classifier to obtain a first result, the first classifier being a classifier that recognizes specific characters from picture data, and the second classifier being a classifier that recognizes whether picture data is a character picture;
feeding the first result into the remaining computation layers of the first classifier other than the shared layers to obtain first probabilities corresponding to the respective characters;
feeding the first result into the remaining computation layers of the second classifier other than the shared layers to obtain a second probability;
computing, from the first probabilities and the second probability, the confidence that the picture data is recognized as each character;
outputting the character recognition result according to the confidence.
Preferably, the method further comprises:
training the parameter values of each computation layer of the first classifier with character picture samples, the first classifier comprising the first N computation layers and the last M computation layers;
fixing the parameters of the first N computation layers of the second classifier to the parameters of the first N computation layers of the first classifier, and training the parameter values of the last L layers of the second classifier with non-character picture samples and character picture samples.
Preferably, the computation layers shared by the first classifier and the second classifier comprise:
convolutional layers, or convolutional layers and at least one fully connected layer.
Preferably, the characters are digits.
Preferably, the step of acquiring picture data comprises:
segmenting individual picture data from the number region of a picture of an identity document.
Preferably, the step of computing, from the first probabilities and the second probability, the confidence that the picture data is recognized as each character comprises:
multiplying the largest first probability by the second probability to obtain the confidence that the picture data is the digit corresponding to the largest first probability.
Preferably, the step of outputting the character recognition result according to the confidence comprises:
from all the picture data, selecting the digits corresponding to the top-ranked pictures whose count matches the number prescribed for the identity card, and outputting them in order.
The present application also discloses a character recognition device, comprising:
a picture acquisition module, adapted to acquire picture data;
a first result computation module, adapted to compute the picture data with computation layers shared by a first classifier and a second classifier to obtain a first result, the first classifier being a classifier that recognizes specific characters from picture data, and the second classifier being a classifier that recognizes whether picture data is a character picture;
a first probability computation module, adapted to feed the first result into the remaining computation layers of the first classifier other than the shared layers to obtain first probabilities corresponding to the respective characters;
a second probability computation module, adapted to feed the first result into the remaining computation layers of the second classifier other than the shared layers to obtain a second probability;
a confidence computation module, adapted to compute, from the first probabilities and the second probability, the confidence that the picture data is recognized as each character;
an output module, adapted to output the character recognition result according to the confidence.
Preferably, the device further comprises:
a first classifier training module, adapted to train the parameter values of each computation layer of the first classifier with character picture samples, the first classifier comprising the first N computation layers and the last M computation layers;
a second classifier training module, adapted to fix the parameters of the first N computation layers of the second classifier to the parameters of the first N computation layers of the first classifier, and to train the parameter values of the last L layers of the second classifier with non-character picture samples and character picture samples.
Preferably, the computation layers shared by the first classifier and the second classifier comprise:
convolutional layers, or convolutional layers and at least one fully connected layer.
Preferably, the characters are digits.
Preferably, the picture acquisition module comprises:
a picture segmentation submodule, adapted to segment individual picture data from the number region of a picture of an identity document.
Preferably, the confidence computation module comprises:
a confidence computation submodule, adapted to multiply the largest first probability by the second probability to obtain the confidence that the picture data is the digit corresponding to the largest first probability.
Preferably, the output module comprises:
an output submodule, adapted to select, from all the picture data, the digits corresponding to the top-ranked pictures whose count matches the number prescribed for the identity card, and to output them in order.
Embodiments of the present application have the following advantages.
In embodiments of the present application, after picture data is acquired, it is computed with the computation layers shared by the first classifier and the second classifier to obtain a first result; the first result is then fed into the remaining computation layers of the first classifier other than the shared layers to obtain first probabilities corresponding to the respective characters, and into the remaining computation layers of the second classifier other than the shared layers to obtain a second probability; the confidence that the picture data is recognized as each character is computed from the first probabilities and the second probability; finally, the character recognition result is output according to the confidence.
Because the second classifier shares part of its computation layers with the first classifier, the computation process and computation results of the shared layers are also shared. Compared with the background art, which places a complete "is it a character" classifier in front of the character classifier and runs the two on the picture in sequence, the present application adds relatively little computation, which reduces computation time and improves the efficiency of character recognition relative to the background art.
Moreover, the second classifier and the first classifier are not used one after the other; instead, the probability values obtained by the two classifiers are multiplied into one confidence, and the recognition result is output according to the confidence value. This improves the accuracy of character recognition relative to the background art, and a fault in the "is it a character" classifier no longer has an outsized impact on the whole recognition process.
Brief Description of the Drawings
Fig. 1 is a flow chart of the steps of an embodiment of a character recognition method of the present application;
Fig. 1A is a schematic diagram of an identity card number of the present application;
Fig. 1B is a schematic diagram of a first classifier and a second classifier of the present application;
Fig. 2 is a flow chart of the steps of an embodiment of a character recognition method of the present application;
Fig. 3 is a structural block diagram of an embodiment of a character recognition device of the present application; and
Fig. 4 is a structural block diagram of an embodiment of a character recognition device of the present application.
Detailed Description
To make the above objects, features, and advantages of the present application clearer and easier to understand, the present application is described in further detail below with reference to the drawings and specific embodiments.
One of the core ideas of the embodiments of the present application is that, after picture data is acquired, it is computed with the computation layers shared by the first and second classifiers to obtain a first result; the first result is then fed into the remaining computation layers of the first classifier other than the shared layers to obtain first probabilities corresponding to the respective characters, and into the remaining computation layers of the second classifier other than the shared layers to obtain a second probability; the confidence that the picture data is recognized as each character is then computed from the first probabilities and the second probability; finally, the character recognition result is output according to the confidence. The present application lets the first classifier, which recognizes specific characters from picture data, and the second classifier, which recognizes whether picture data is a character picture, share part of their computation-layer data, so that both can process the picture data simultaneously with overlapping computation, reducing the amount of computation and improving accuracy; and recognizing the picture with the results of both classifiers together improves the accuracy rate and reduces the impact a fault in the second classifier has on the whole recognition process.
Embodiment 1
Referring to Fig. 1, a flow chart of the steps of an embodiment of a character recognition method of the present application is shown, which may specifically include the following steps.
Step 110: acquire picture data.
The present application describes a character recognition method for pictures, so the picture data to be recognized must first be acquired.
A picture is a planar medium composed of graphics, images, and the like. The picture data described in the present application is a digital picture. Digital pictures have many common storage formats, such as BMP (Bitmap), TIFF (Tagged Image File Format), JPEG (Joint Photographic Experts Group), GIF (Graphics Interchange Format), PSD (Photoshop Document), and PDF (Portable Document Format); the present application does not limit the specific storage format of the digital picture.
In embodiments of the present application, the picture data acquired in step 110 may also be preprocessed. For example, converting the picture data to grayscale turns color picture data into grayscale picture data, which reduces the amount of computation.
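As a small illustration of this preprocessing step (the application does not prescribe a conversion method; the channel weights below are the common ITU-R 601 luma coefficients, an assumption of this sketch):

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an H x W x 3 RGB picture into an H x W grayscale picture,
    reducing the data, and the later computation, to a single channel."""
    return rgb @ np.array([0.299, 0.587, 0.114])
```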
Step 120: compute the picture data with the computation layers shared by the first classifier and the second classifier to obtain a first result; the first classifier is a classifier that recognizes specific characters from picture data; the second classifier is a classifier that recognizes whether picture data is a character picture.
To reduce the chance of recognition errors, the present application uses two classifiers to recognize the picture data: the first classifier recognizes specific characters from the picture data, and the second classifier recognizes whether the picture data is a character picture. At the same time, to reduce the time cost of recognition and improve its efficiency and accuracy, the first classifier and the second classifier share part of their computation layers.
In embodiments of the present application, the specific characters may be, for example, the Arabic numerals 0 to 9, Greek characters such as α, β, and γ, or other characters, and can be set according to actual needs; the corresponding model can likewise be determined according to the corresponding characters.
Both the first classifier and the second classifier may use a Convolutional Neural Network (CNN) classification model. A CNN classification model includes at least one convolutional layer, at least one fully connected layer, and one Softmax layer. Fig. 1B is a schematic diagram of a first classifier and a second classifier. Convolutional layer a, convolutional layer b, fully connected layer c, fully connected layer d, a Softmax layer, and the digit classifier form the first classifier, while convolutional layer a, convolutional layer b, fully connected layer c, fully connected layer e, a Softmax layer, and the "is it a digit" classifier form the second classifier. The computation layers shared by the first and second classifiers are thus convolutional layer a, convolutional layer b, and fully connected layer c; what the two classifiers do not share is one fully connected layer and one Softmax layer each. In Fig. 1B, both classifiers contain two convolutional layers, two fully connected layers, and one Softmax layer; in practical applications, however, the specific numbers of convolutional layers and fully connected layers contained in the first and second classifiers can be set flexibly as needed, which the embodiments of the present application do not limit.
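To make the shared-layer arrangement of Fig. 1B concrete, the following is a minimal sketch in PyTorch. The framework, the layer widths, and the 1×28×28 input size are assumptions of this sketch, not part of the application; the names conv_a, conv_b, fc_c, fc_d, and fc_e mirror the figure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadedClassifier(nn.Module):
    """Sketch of Fig. 1B: a shared trunk (conv a, conv b, fc c) feeding two
    heads, fc d + Softmax for "which character" (the first classifier) and
    fc e + Softmax for "is it a character" (the second classifier)."""

    def __init__(self, num_chars=10):
        super().__init__()
        # Computation layers shared by both classifiers (illustrative sizes).
        self.conv_a = nn.Conv2d(1, 16, kernel_size=5)
        self.conv_b = nn.Conv2d(16, 32, kernel_size=5)
        self.fc_c = nn.Linear(32 * 20 * 20, 128)
        # Unshared layer of the first classifier ("which character").
        self.fc_d = nn.Linear(128, num_chars)
        # Unshared layer of the second classifier ("is it a character").
        self.fc_e = nn.Linear(128, 2)

    def forward(self, x):
        # Shared layers: computed once, and the first result is reused.
        h = torch.sigmoid(self.conv_a(x))
        h = torch.sigmoid(self.conv_b(h))
        first_result = torch.sigmoid(self.fc_c(h.flatten(1)))
        # First probabilities: one per character class.
        p_chars = F.softmax(self.fc_d(first_result), dim=1)
        # Second probability: column 1 holds "is a character picture".
        p_is_char = F.softmax(self.fc_e(first_result), dim=1)
        return p_chars, p_is_char
```

A single forward pass therefore evaluates the shared trunk once and yields both the per-character first probabilities and the second probability, which is where the computational saving described above comes from.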
In embodiments of the present application, the first classifier can first be trained with picture data samples of the corresponding characters, for example training the "which digit" first classifier with picture samples of the digits 0 to 9. Training the first classifier actually means training the parameter values of each of its computation layers. After the first classifier has been trained, the embodiments of the present application can fix the parameters of the front computation layers, for example the parameters of all convolutional layers, and then, with the parameter values of the convolutional layers fixed, train the parameter values of the subsequent computation layers with picture data samples that are not digits and picture data samples that are digits. The second classifier then shares with the first classifier computation layers whose parameter values are identical.
In embodiments of the present application, the optimal case is to share the convolutional layers and the fully connected layers before the last fully connected layer; this reduces the amount of computation and also improves accuracy.
The computation process of a convolutional layer is as follows.
Assume the dimensions of the input picture data are C×N×N and the size of the convolutional layer's kernel is m×m, where C denotes the three channels R (Red), G (Green), and B (Blue) of the picture data, and the two N's in N×N denote the pixel size of the picture data in the horizontal direction and in the vertical direction respectively; depending on the input image data, the two values of N may be different or the same, which the embodiments of the present invention do not limit. For the kernel size m×m, the two values of m may likewise be the same or different, which the embodiments of the present invention also do not limit. It should be noted that the larger of the two m values should be smaller than the smaller of the two N values. After the convolutional layer, the output value is:
$$ y_{k,i,j} = \sum_{u=0}^{m-1} \sum_{v=0}^{m-1} w_{u,v} \, x_{k,\,i+u,\,j+v} \tag{1} $$
where k, i, j are the coordinates of the output value: k corresponds to the R, G, B channels of the picture data, i to the pixel position of the picture data in the horizontal direction, and j to the pixel position in the vertical direction; w is the parameter value of the convolutional layer, x the input value, and y the output value. In embodiments of the present application, w is a known parameter, obtained by training the convolutional layer in advance.
In embodiments of the present application, each convolutional layer may have several kernels. For example, the number of kernels may match the dimension of the input picture data other than the horizontal and vertical pixel sizes, i.e. the C in the aforementioned three-dimensional matrix C×N×N of the picture data. Because C represents the picture data's R, G, B channels, the convolutional layer may then have 3 kernels of size m×m as described above, and the 3×m×m three-dimensional matrix formed by the convolutional layer's kernels is the layer's convolution matrix. In the specific computation, each m×m kernel is convolved with the three-dimensional matrix C×N×N of the input picture data to obtain a two-dimensional matrix. For example:
the first m×m kernel is convolved with the picture data of the R channel of C, giving a two-dimensional matrix;
the second m×m kernel is convolved with the picture data of the G channel of C, giving a two-dimensional matrix;
the third m×m kernel is convolved with the picture data of the B channel of C, giving a two-dimensional matrix;
the three two-dimensional matrices obtained by convolving the three kernels are assembled into one three-dimensional matrix, and this three-dimensional matrix is the output result of the convolutional layer shown in formula (1).
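As a hedged illustration of formula (1), the per-channel convolution described above can be written directly in NumPy; the 3×8×8 input and the 3×3 kernels are arbitrary sizes chosen for the example.

```python
import numpy as np

C, N, m = 3, 8, 3            # channels, input size, kernel size (illustrative)
x = np.random.rand(C, N, N)  # input picture data, C x N x N
w = np.random.rand(C, m, m)  # one m x m kernel per channel, as in the text

# Formula (1): each kernel is convolved with its channel, and the three
# two-dimensional results are stacked into the layer's 3-D output.
out = N - m + 1
y = np.zeros((C, out, out))
for k in range(C):            # k: R, G, B channel
    for i in range(out):      # i: horizontal pixel position
        for j in range(out):  # j: vertical pixel position
            y[k, i, j] = np.sum(w[k] * x[k, i:i+m, j:j+m])
```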
The computation formula of a fully connected layer is as follows.
Assume the dimension of the input data is N; after the fully connected layer, the output value is:
$$ y_i = \sigma\left( \sum_{j=1}^{N} w_{i,j} \, x_j \right) \tag{2} $$
where σ(·) is the sigmoid function,
$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$
and w is the parameter of the fully connected layer.
The computation formula of the Softmax layer is as follows:
$$ P(y = j \mid x) = \frac{e^{\theta_j^{\top} x}}{\sum_{l} e^{\theta_l^{\top} x}} $$
where x is the input value, j denotes each class, y denotes the class label, θ is the parameter of the Softmax layer, and e is the constant. Taking digits as an example, the classes of y include 0, 1, 2, ..., 9, so the formula computes the probabilities that the digit picture data corresponds to each of the ten digits 0, 1, 2, ..., 9.
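A small NumPy sketch of the Softmax computation over the ten digit classes; the score vector standing in for the inputs θ_j^T x is made up for the example.

```python
import numpy as np

# Illustrative scores for the classes 0-9, not outputs of a trained model.
scores = np.array([1.2, 0.3, 4.1, 0.8, 0.5, 0.2, 0.1, 2.6, 0.4, 0.9])
probs = np.exp(scores) / np.sum(np.exp(scores))
print(probs.argmax(), probs.max())  # most probable digit and its probability
```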
From the above analysis and Fig. 1B, after the acquired picture data is input, the computation processes of the layers shared by the first classifier and the second classifier are identical, so the first results obtained after the shared computation layers are identical as well. Afterwards, for the computation layers the two classifiers do not share, such as fully connected layer d and fully connected layer e in Fig. 1B, the first result is used separately as the input of each classifier's unshared computation layers; at this point, the computation processes of the unshared layers of the two classifiers differ.
Therefore, in embodiments of the present application, the picture data is first computed with the computation layers shared by the first classifier and the second classifier to obtain the first result.
In another preferred embodiment of the present application, before step 120 the method further includes the following steps.
Step S11: train the parameter values of each computation layer of the first classifier with character picture samples; the first classifier comprises the first N computation layers and the last M computation layers.
Because the parameters in the first and second classifiers may be unknown, or in order to further improve the accuracy of both, in embodiments of the present application the parameters in the first and second classifiers need to be trained before the two classifiers each perform their computation.
Because the probabilities ultimately computed by the first and second classifiers are essentially different, the training of the first classifier need not consider the case where the picture data is not a character. Therefore, in embodiments of the present application, the parameter values of each computation layer of the first classifier can first be trained with character picture samples, the first classifier comprising the first N computation layers, which are shared with the second classifier, and the last M computation layers, which are not shared with the second classifier.
In embodiments of the present application, the first classifier can be trained with at least one character picture sample, where a character picture sample is a character picture whose character has already been definitively recognized; the character classes included in the character picture samples should exceed the character classes the first and second classifiers are set to recognize. Accordingly, for a character picture sample, the probability of being recognized as its own corresponding character is 1 and the probability of being any other character is 0. The character picture samples can then be used as the input of the first classifier, with the probability-0 classes and the probability-1 class as the ideal output, to train the parameters of the first classifier.
The training process mainly comprises four steps, divided into two phases.
Phase one, forward propagation:
(1) select a character picture sample and input it into the first classifier;
(2) compute the corresponding actual output; in this phase, the first classifier randomly generates its initial parameters, and the character picture data is transformed stage by stage from the input layer to the output layer. This is also the process the first classifier performs in normal operation once training is complete.
Phase two, backward propagation:
(1) compute the difference between the actual output and the corresponding ideal output;
(2) adjust the parameters by the method of minimizing the error.
The work of these two phases should generally be governed by an accuracy requirement, which can be set flexibly as needed; the present application does not limit it.
Training the first classifier actually means training the parameter w in formula (1) for each convolutional layer, the parameter w in formula (2) for each fully connected layer, and the parameter θ of the Softmax layer, where different convolutional layers have different w in formula (1) and different fully connected layers have different w in formula (2).
Step S12: fix the parameters of the first N computation layers of the second classifier to the parameters of the first N computation layers of the first classifier, and train the parameter values of the last L layers of the second classifier with non-character picture samples and character picture samples.
Because the first N computation layers of the second classifier are shared with the first N computation layers of the first classifier, once the parameters of the first classifier's first N layers have been determined, the parameters of the second classifier's first N layers are determined as well, and only the parameter values of the second classifier's last L layers need to be trained; L and M may be equal or different, which the present application does not limit.
Because the second classifier computes the probability that the input picture data is a character picture, its training must also take non-character pictures into account. Therefore, in embodiments of the application, the parameter values of the second classifier's last L computation layers can be trained with at least one non-character picture sample and at least one character picture sample.
For a non-character picture sample, the probability of being a character picture is 0 and the probability of not being one is 1; for a character picture sample, the probability of being a character picture is 1 and the probability of not being one is 0. The character picture samples can then be used as the second classifier's input with probability 1 as the ideal output, and the non-character picture samples as its input with probability 0 as the ideal output, to train the parameters of the second classifier.
The specific training process is similar to the training of the first classifier in step S11 and likewise mainly comprises four steps in two phases.
Phase one, forward propagation:
(1) select a character picture sample or a non-character picture sample and input it into the second classifier;
(2) compute the corresponding actual output; in this phase, the second classifier randomly generates the initial parameters of the layers to be trained, and the character or non-character picture data is transformed stage by stage from the input layer to the output layer. This is also the process the second classifier performs in normal operation once training is complete.
Phase two, backward propagation:
(1) compute the difference between the actual output and the corresponding ideal output;
(2) adjust the parameters by the method of minimizing the error.
The work of these two phases should generally also be governed by an accuracy requirement; the second classifier's accuracy requirement can likewise be set flexibly as needed, which the present application does not limit.
It should be noted that, in another preferred embodiment of the present application, the parameter values of the first N and last L computation layers of the second classifier may instead be trained first with non-character picture samples and character picture samples; the parameters of the first classifier's first N layers are then fixed to the parameters of the second classifier's first N layers, and the parameter values of the first classifier's last M computation layers are trained with character picture samples.
For the training of the second classifier: if it shares the convolutional layers with the first classifier, the formula (1) parameters of each of its convolutional layers are determined by step S11; the character picture data samples and non-character picture data samples described above are then actually used to train the parameter w in formula (2) for each fully connected layer and the parameter θ of the Softmax layer.
If it shares the convolutional layers plus some of the fully connected layers with the first classifier (the shared fully connected layers being shared in input order, that is, the front-most fully connected layers), then the formula (1) parameters of its convolutional layers are determined by step S11, and the formula (2) parameter w of the shared fully connected layers is likewise determined by step S11; the character and non-character picture data samples are then used to train the formula (2) parameter w of the remaining, unshared fully connected layers and the parameter θ of the Softmax layer.
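Continuing the earlier PyTorch sketch (still an assumption of this edit; the application does not prescribe a framework), the two-stage procedure of steps S11 and S12 can be expressed by training the first classifier's layers, then turning off gradients for the shared trunk and optimizing only the second classifier's unshared layer:

```python
import itertools
import torch

model = TwoHeadedClassifier()  # class from the earlier sketch

# Step S11: train the first classifier (shared trunk plus fc_d) on
# character picture samples, e.g. with a cross-entropy loss.
opt_s11 = torch.optim.SGD(
    itertools.chain(model.conv_a.parameters(), model.conv_b.parameters(),
                    model.fc_c.parameters(), model.fc_d.parameters()),
    lr=0.01)

# Step S12: fix the parameters of the shared layers to those learned in
# S11 by excluding them from further gradient updates ...
for layer in (model.conv_a, model.conv_b, model.fc_c):
    for p in layer.parameters():
        p.requires_grad = False

# ... and train only the second classifier's unshared layer (fc_e) on
# character picture samples and non-character picture samples.
opt_s12 = torch.optim.SGD(model.fc_e.parameters(), lr=0.01)
```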
Step 130: feed the first result into the remaining computation layers of the first classifier other than the shared layers to obtain the first probabilities corresponding to the respective characters.
The first classifier is the classifier that recognizes specific characters from picture data. Feeding the first result obtained in step 120 into the first classifier's remaining computation layers other than the shared layers yields the first probabilities that the picture data may be each character. For example, feeding the first result as the input of the block formed by fully connected layer d and Softmax layer 1 of the first classifier in Fig. 1B yields the first probabilities corresponding to the respective characters.
The characters corresponding to the picture data may be the Arabic numerals 0 to 9, the 52 English letters from uppercase A to Z and lowercase a to z, or punctuation marks, special symbols, Chinese characters, Roman characters, and other character types that may appear in picture data, in any combination. In embodiments of the present application, the specific character classes can be set as needed, which the embodiments of the present invention do not limit.
In the Softmax layer, once the possible character classes have been set, the Softmax layer's classification algorithm computes the probability that the input picture data is each character, that is, the first probability corresponding to each character.
Step 140: feed the first result into the remaining computation layers of the second classifier other than the shared layers to obtain the second probability.
The second classifier is the classifier that recognizes whether picture data is a character picture. Feeding the first result obtained in step 120 into the second classifier's remaining computation layers other than the shared layers yields the second probability that the picture data is a character picture. For example, feeding the first result as the input of the block formed by fully connected layer e and Softmax layer 2 of the second classifier in Fig. 1B yields the second probability.
The second probability obtained by the second classifier is the probability that the picture data is a character picture, where the characters of a character picture may likewise be any of the character types described in step 130 and can be set as needed. It should be noted that the character classes of the first classifier and of the second classifier may be identical, or the second classifier's character classes may include those of the first classifier, which the embodiments of the present invention do not limit; relatively speaking, however, when the character classes of the two classifiers are identical, the final recognition is more efficient and more accurate. For the second classifier, a character picture is a picture containing the set character types, and computing the probability that the picture data is a character picture means computing the probability that it is a picture containing the set character types; the result is the second probability.
The second classifier also computes the second probability with its own Softmax layer: once the possible character classes have been set, the Softmax layer's classification algorithm computes the probability that the input picture data is a character picture, that is, the second probability.
It should be noted that, because the first probability and the second probability computed by the first and second classifiers are essentially different, the computation layers the two classifiers do not share, in particular the parameters and structure of their Softmax layers, are not necessarily the same.
Step 150: compute, from the first probabilities and the second probability, the confidence that the picture data is recognized as each character.
As stated above, the first probabilities are the probabilities that the picture data may be each character, and the second probability is the probability that the picture data may be a character picture. For one piece of picture data, the number of first probabilities corresponds to the set character classes: it equals the number of character classes and is at least one, while one piece of input picture data yields exactly one second probability. The confidence that the picture data is recognized as each character can then be computed from its first probabilities and its second probability, for example by multiplying the first probability of each character by the picture data's second probability.
For example, to recognize whether a piece of picture data is an Arabic numeral from 0 to 9, the first classifier yields ten first probabilities, corresponding to the picture data being each of the numerals 0 to 9: the first probability p0 is the probability that the picture data is the character 0, p1 the probability that it is the character 1, p9 the probability that it is the character 9, and so on. The second classifier yields only one second probability, namely the probability s that the picture data meets the set condition, e.g. is an Arabic numeral. Multiplying p0 by s gives the confidence that the picture data is recognized as the character 0, and multiplying p9 by s gives the confidence that it is recognized as the character 9.
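The confidence computation of step 150 reduces to an elementwise product. A sketch with made-up values, in the spirit of the example above:

```python
import numpy as np

# First probabilities for the digits 0-9 (illustrative values) and the
# second probability s that the picture is a digit picture at all.
p = np.array([0.01, 0.02, 0.03, 0.01, 0.02, 0.01, 0.02, 0.01, 0.02, 0.85])
s = 0.9

confidence = p * s             # confidence for each digit
best = confidence.argmax()     # digit with the highest confidence
print(best, confidence[best])  # here: digit 9 with confidence 0.765
```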
Step 160: output the character recognition result according to the confidence.
In embodiments of the present application, from the computed confidences that the picture data is recognized as each character, the character with the highest confidence can be output as the recognition result.
For example, in the recognition of a piece of picture data as an Arabic numeral from 0 to 9 described in step 150, if among the confidences for the characters the confidence obtained by multiplying p9 by s is the largest, the character 9 can be output as the recognition result.
In embodiments of the present application, after picture data is acquired, it is computed with the computation layers shared by the first and second classifiers to obtain a first result; the first result is then fed into the remaining computation layers of the first classifier other than the shared layers to obtain the first probabilities corresponding to the respective characters, and into the remaining computation layers of the second classifier other than the shared layers to obtain the second probability; the confidence that the picture data is recognized as each character is computed from the first probabilities and the second probability; finally, the character recognition result is output according to the confidence.
Because the second classifier shares part of its computation layers with the first classifier, the computation process and computation results of the shared layers are also shared. Compared with the background art, which places a complete "is it a character" classifier in front of the character classifier and runs the two on the picture in sequence, the present application adds relatively little computation, which reduces computation time and improves the efficiency of character recognition relative to the background art.
Moreover, the second classifier and the first classifier are not used one after the other; instead, the probability values obtained by the two classifiers are multiplied into one confidence, and the recognition result is output according to the confidence value. This improves the accuracy of character recognition relative to the background art, and a fault in the "is it a character" classifier no longer has an outsized impact on the whole recognition process.
Embodiment 2
Referring to Fig. 2, a flow chart of the steps of an embodiment of a character recognition method of the present application is shown, which may specifically include the following steps.
Step 210: segment individual picture data from the number region of a picture of an identity document.
In this embodiment of the present application, number recognition is performed on a picture of an identity document. Because the picture of an identity document may contain several numbers, e.g. the identity card number, for ease of recognition the individual picture data must first be segmented from the number region of the picture, as in Fig. 1A, yielding several pieces of picture data, for example by segmenting the region containing the identity card number, in order, into picture data each containing a single digit. The specific segmentation method is well known in the art and is not elaborated in the embodiments of the present application.
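Purely as an illustration (the application leaves the segmentation method open), one common approach splits the binarized number line at ink-free pixel columns:

```python
import numpy as np

def split_columns(binary_row):
    """Split a binarized number-line image (H x W, True = ink) into
    single-character slices at ink-free columns; one conventional
    technique, not the method of the application."""
    ink = binary_row.any(axis=0)                 # columns containing ink
    padded = np.concatenate(([False], ink, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))      # run starts and ends
    return [binary_row[:, s:e] for s, e in zip(edges[::2], edges[1::2])]
```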
Step 220: compute the picture data with the computation layers shared by the first classifier and the second classifier to obtain a first result; the first classifier is a classifier that recognizes specific characters from picture data; the second classifier is a classifier that recognizes whether picture data is a character picture.
In another preferred embodiment of the present application, the computation layers shared by the first classifier and the second classifier comprise: convolutional layers, or convolutional layers and at least one fully connected layer.
In another preferred embodiment of the present application, the characters are digits.
In this case, the first classifier computes the probabilities that the input picture data is each of the digits 0 to 9, and the second classifier computes the probability that the input picture data is recognizable as a digit.
Step 230: feed the first result into the remaining computation layers of the first classifier other than the shared layers to obtain the first probabilities corresponding to the respective characters.
Step 240: feed the first result into the remaining computation layers of the second classifier other than the shared layers to obtain the second probability.
Step 250: multiply the largest first probability by the second probability to obtain the confidence that the picture data is the digit corresponding to the largest first probability.
The largest first probability is the first probability corresponding to the digit the input picture data is most likely to be; multiplying the largest first probability by the second probability gives the confidence that the input picture data is the digit corresponding to the largest first probability.
Step 260: from all the picture data, select the digits corresponding to the top-ranked pictures whose count matches the number prescribed for the identity card, and output them in order.
Each segmented picture has a probability for each of the digits 0 to 9, and the pictures segmented as in Fig. 1A are arranged from left to right following the writing convention of an identity card. The present application therefore determines, for each picture, the largest of its probabilities for 0 to 9, selects from the arranged pictures the 18 pictures whose largest probabilities rank highest, and combines the digits of those pictures in order to obtain the identity card number.
Of course, in embodiments of the present application, the ordering of the pictures can be recorded by labeling them when they are segmented in step 210.
For example, to perform character recognition on the identity card number shown in Fig. 1A, the number is first segmented into several pieces of character data; following the writing convention, it can be segmented from left to right into 22 disjoint pieces of picture data, a1 through a22 in order. The first and second classifiers are then used to compute, for each piece of picture data, the confidence of its most probable digit, and the 18 pieces of picture data with the highest confidences, together with the digit of largest probability for each, are selected. Suppose the picture data, listed in descending order of the confidence of each picture's most probable digit, is: a5: (0.95, 2), a6: (0.94, 0), a12: (0.93, 8), a15: (0.92, 9), a11: (0.92, 9), a13: (0.90, 9), a16: (0.90, 2), a4: (0.89, 4), a10: (0.89, 1), a14: (0.88, 0), a7: (0.87, 9), a17: (0.86, 6), a8: (0.85, 2), a18: (0.84, 5), a9: (0.84, 1), a19: (0.83, 1), a20: (0.81, 3), a21: (0.80, 8), a2: (0.1, 8), a1: (0.1, 9), a22: (0.09, 0), a3: (0.09, 0), so that a4 through a21 are selected. In the output, however, the pictures keep the order of the original segmentation, i.e. a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, a21, so the output digit sequence is 420921198909265138.
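The selection of step 260 can be sketched as follows; the segments list reuses the made-up (confidence, digit) pairs of the example, stored for a1 through a22 in segmentation order.

```python
# (confidence, digit) per segmented picture a1..a22, in segmentation order,
# reusing the illustrative values of the example above.
segments = [(0.1, 9), (0.1, 8), (0.09, 0), (0.89, 4), (0.95, 2), (0.94, 0),
            (0.87, 9), (0.85, 2), (0.84, 1), (0.89, 1), (0.92, 9), (0.93, 8),
            (0.90, 9), (0.88, 0), (0.92, 9), (0.90, 2), (0.86, 6), (0.84, 5),
            (0.83, 1), (0.81, 3), (0.80, 8), (0.09, 0)]

# Keep the 18 segments with the highest confidence ...
top = sorted(range(len(segments)), key=lambda i: segments[i][0])[-18:]
# ... but output their digits in the original segmentation order.
number = "".join(str(segments[i][1]) for i in sorted(top))
print(number)  # 420921198909265138
```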
The number region of a picture of an identity document can thus be segmented into several pieces of picture data, and steps 220 to 250 above are performed on each piece in turn with the first and second classifiers to compute each piece's confidence, after which the digits corresponding to the first probabilities are output in order, thereby achieving digit recognition of an identity document number, e.g. an identity card number.
In this embodiment of the present application, likewise, after picture data is acquired it is computed with the computation layers shared by the first and second classifiers to obtain a first result; the first result is then fed into the remaining computation layers of the first classifier other than the shared layers to obtain the first probabilities corresponding to the respective characters, and into the remaining computation layers of the second classifier other than the shared layers to obtain the second probability; the confidence that the picture data is recognized as each character is computed from the first probabilities and the second probability; finally, the character recognition result is output according to the confidence. This improves the efficiency and accuracy of character recognition relative to the background art.
In addition, when training the first and second classifiers, the present application can train one of the classifiers first, then fix the computation layers shared by the two classifiers and continue by training the untrained computation layers of the other classifier. Compared with the background art, this reduces the training workload and improves the efficiency of training the two classifiers, further improving the efficiency and accuracy of character recognition.
It should be noted that the method embodiments are expressed as series of action combinations for simplicity of description, but those skilled in the art should know that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present application.
Embodiment 3
Referring to Fig. 3, a structural block diagram of an embodiment of a character recognition device of the present application is shown, which may specifically include the following modules:
a picture acquisition module 310, adapted to acquire picture data;
a first result computation module 320, adapted to compute the picture data with the computation layers shared by the first classifier and the second classifier to obtain a first result; the first classifier being a classifier that recognizes specific characters from picture data; the second classifier being a classifier that recognizes whether picture data is a character picture;
a first probability computation module 330, adapted to feed the first result into the remaining computation layers of the first classifier other than the shared layers to obtain the first probabilities corresponding to the respective characters;
a second probability computation module 340, adapted to feed the first result into the remaining computation layers of the second classifier other than the shared layers to obtain the second probability;
a confidence computation module 350, adapted to compute, from the first probabilities and the second probability, the confidence that the picture data is recognized as each character;
an output module 360, adapted to output the character recognition result according to the confidence.
In yet another preferred embodiment of the present application, before the first result computation module 320, the device further includes:
a first classifier training module 370, adapted to train the parameter values of each computation layer of the first classifier with character picture samples; the first classifier comprising the first N computation layers and the last M computation layers.
In yet another preferred embodiment of the present application, before the second probability computation module 340, the device further includes:
a second classifier training module 380, adapted to fix the parameters of the first N computation layers of the second classifier to the parameters of the first N computation layers of the first classifier, and to train the parameter values of the last L layers of the second classifier with non-character picture samples and character picture samples.
In embodiments of the present application, after picture data is acquired, it is computed with the computation layers shared by the first and second classifiers to obtain a first result; the first result is then fed into the remaining computation layers of the first classifier other than the shared layers to obtain the first probabilities corresponding to the respective characters, and into the remaining computation layers of the second classifier other than the shared layers to obtain the second probability; the confidence that the picture data is recognized as each character is computed from the first probabilities and the second probability; finally, the character recognition result is output according to the confidence.
Because the second classifier shares part of its computation layers with the first classifier, and the computation process and computation results of the shared layers are also shared, the present application adds relatively little computation compared with the background art's addition of a complete classifier, improving the efficiency of character recognition relative to the background art.
Moreover, the second classifier and the first classifier are not used one after the other; instead, the probability values obtained by the two classifiers are multiplied into one confidence, and the recognition result is output according to the confidence value, improving the accuracy of character recognition relative to the background art.
Embodiment 4
Referring to Fig. 4, a structural block diagram of an embodiment of a character recognition device of the present application is shown, which may specifically include the following modules:
a picture acquisition module 410, adapted to acquire picture data, specifically including:
a picture segmentation submodule 411, adapted to segment individual picture data from the number region of a picture of an identity document;
a first result computation module 420, adapted to compute the picture data with the computation layers shared by the first classifier and the second classifier to obtain a first result; the first classifier being a classifier that recognizes specific characters from picture data; the second classifier being a classifier that recognizes whether picture data is a character picture;
a first probability computation module 430, adapted to feed the first result into the remaining computation layers of the first classifier other than the shared layers to obtain the first probabilities corresponding to the respective characters;
a second probability computation module 440, adapted to feed the first result into the remaining computation layers of the second classifier other than the shared layers to obtain the second probability;
a confidence computation module 450, adapted to compute, from the first probabilities and the second probability, the confidence that the picture data is recognized as each character, specifically including:
a confidence computation submodule 451, adapted to multiply the largest first probability by the second probability to obtain the confidence that the picture data is the digit corresponding to the largest first probability;
an output module 460, adapted to output the character recognition result according to the confidence, specifically including:
an output submodule 461, adapted to select, from all the picture data, the digits corresponding to the top-ranked pictures whose count matches the number prescribed for the identity card, and to output them in order.
In this embodiment of the present application, likewise, after picture data is acquired it is computed with the computation layers shared by the first and second classifiers to obtain a first result; the first result is then fed into the remaining computation layers of the first classifier other than the shared layers to obtain the first probabilities corresponding to the respective characters, and into the remaining computation layers of the second classifier other than the shared layers to obtain the second probability; the confidence that the picture data is recognized as each character is computed from the first probabilities and the second probability; finally, the character recognition result is output according to the confidence. This improves the efficiency and accuracy of character recognition relative to the background art.
In addition, when training the first and second classifiers, the present application can train one of the classifiers first, then fix the computation layers shared by the two classifiers and continue by training the untrained computation layers of the other classifier. Compared with the background art, this reduces the training workload and improves the efficiency of training the two classifiers, further improving the efficiency and accuracy of character recognition.
As the device embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant points, refer to the corresponding parts of the description of the method embodiments.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar between embodiments, the embodiments may be referred to one another.
Those skilled in the art should understand that embodiments of the present application can be provided as a method, a device, or a computer program product. Therefore, the embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include non-persistent storage in computer readable media, in the form of random access memory (RAM) and/or non-volatile memory such as read only memory (ROM) or flash RAM. Memory is an example of a computer readable medium. Computer readable media include persistent and non-persistent, removable and non-removable media; information storage can be implemented by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory media, such as modulated data signals and carrier waves.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, terminal device (system), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device produce a device for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer readable memory capable of directing a computer or other programmable data processing terminal device to work in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including an instruction device, the instruction device implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal device, so that a series of operation steps are executed on the computer or other programmable terminal device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present application.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another entity or operation and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device comprising that element.
The character recognition method and character recognition device provided by the present application have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the descriptions of the above embodiments are only intended to help understand the method of the present application and its core ideas. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (14)

  1. A character recognition method, characterized in that it comprises:
    acquiring picture data;
    computing the picture data with computation layers shared by a first classifier and a second classifier to obtain a first result; the first classifier being a classifier that recognizes specific characters from picture data; the second classifier being a classifier that recognizes whether picture data is a character picture;
    feeding the first result into the remaining computation layers of the first classifier other than the shared layers to obtain first probabilities corresponding to the respective characters;
    feeding the first result into the remaining computation layers of the second classifier other than the shared layers to obtain a second probability;
    computing, from the first probabilities and the second probability, the confidence that the picture data is recognized as each character;
    outputting the character recognition result according to the confidence.
  2. The method according to claim 1, characterized in that it further comprises:
    training the parameter values of each computation layer of the first classifier with character picture samples; the first classifier comprising the first N computation layers and the last M computation layers;
    fixing the parameters of the first N computation layers of the second classifier to the parameters of the first N computation layers of the first classifier, and training the parameter values of the last L layers of the second classifier with non-character picture samples and character picture samples.
  3. The method according to claim 1 or 2, characterized in that the computation layers shared by the first classifier and the second classifier comprise:
    convolutional layers, or convolutional layers and at least one fully connected layer.
  4. The method according to claim 1, characterized in that the characters are digits.
  5. The method according to claim 4, characterized in that the step of acquiring picture data comprises:
    segmenting individual picture data from the number region of a picture of an identity document.
  6. The method according to claim 5, characterized in that the step of computing, from the first probabilities and the second probability, the confidence that the picture data is recognized as each character comprises:
    multiplying the largest first probability by the second probability to obtain the confidence that the picture data is the digit corresponding to the largest first probability.
  7. The method according to claim 6, characterized in that the step of outputting the character recognition result according to the confidence comprises:
    from all the picture data, selecting the digits corresponding to the top-ranked pictures whose count matches the number prescribed for the identity card, and outputting them in order.
  8. A character recognition device, characterized in that it comprises:
    a picture acquisition module, adapted to acquire picture data;
    a first result computation module, adapted to compute the picture data with computation layers shared by a first classifier and a second classifier to obtain a first result; the first classifier being a classifier that recognizes specific characters from picture data; the second classifier being a classifier that recognizes whether picture data is a character picture;
    a first probability computation module, adapted to feed the first result into the remaining computation layers of the first classifier other than the shared layers to obtain first probabilities corresponding to the respective characters;
    a second probability computation module, adapted to feed the first result into the remaining computation layers of the second classifier other than the shared layers to obtain a second probability;
    a confidence computation module, adapted to compute, from the first probabilities and the second probability, the confidence that the picture data is recognized as each character;
    an output module, adapted to output the character recognition result according to the confidence.
  9. The device according to claim 8, characterized in that it further comprises:
    a first classifier training module, adapted to train the parameter values of each computation layer of the first classifier with character picture samples; the first classifier comprising the first N computation layers and the last M computation layers;
    a second classifier training module, adapted to fix the parameters of the first N computation layers of the second classifier to the parameters of the first N computation layers of the first classifier, and to train the parameter values of the last L layers of the second classifier with non-character picture samples and character picture samples.
  10. The device according to claim 8 or 9, characterized in that the computation layers shared by the first classifier and the second classifier comprise:
    convolutional layers, or convolutional layers and at least one fully connected layer.
  11. The device according to claim 8, characterized in that the characters are digits.
  12. The device according to claim 11, characterized in that the picture acquisition module comprises:
    a picture segmentation submodule, adapted to segment individual picture data from the number region of a picture of an identity document.
  13. The device according to claim 12, characterized in that the confidence computation module comprises:
    a confidence computation submodule, adapted to multiply the largest first probability by the second probability to obtain the confidence that the picture data is the digit corresponding to the largest first probability.
  14. The device according to claim 13, characterized in that the output module comprises:
    an output submodule, adapted to select, from all the picture data, the digits corresponding to the top-ranked pictures whose count matches the number prescribed for the identity card, and to output them in order.
PCT/CN2017/077254 2016-03-29 2017-03-20 Character recognition method and device WO2017167046A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17773076.9A EP3422256B1 (en) 2016-03-29 2017-03-20 Character recognition method and device
US16/144,219 US10872274B2 (en) 2016-03-29 2018-09-27 Character recognition method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610188113.8A 2016-03-29 2017-10-10 Character recognition method and device
CN201610188113.8 2016-03-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/144,219 Continuation US10872274B2 (en) 2016-03-29 2018-09-27 Character recognition method and device

Publications (1)

Publication Number Publication Date
WO2017167046A1 true WO2017167046A1 (zh) 2017-10-05

Family

ID=59963457

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/077254 WO2017167046A1 (zh) 2016-03-29 2017-03-20 一种字符识别方法和装置

Country Status (5)

Country Link
US (1) US10872274B2 (zh)
EP (1) EP3422256B1 (zh)
CN (1) CN107239786B (zh)
TW (1) TWI766855B (zh)
WO (1) WO2017167046A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145891A (zh) * 2018-06-27 2019-01-04 Shanghai Ctrip Commerce Co., Ltd. Client, its method for recognizing identity cards, and system for recognizing identity cards
CN109376731A (zh) * 2018-08-24 2019-02-22 Beijing Sankuai Online Technology Co., Ltd. Character recognition method and device
CN110765870A (zh) * 2019-09-18 2020-02-07 Beijing Sankuai Online Technology Co., Ltd. Method and device for determining the confidence of an OCR recognition result, and electronic device
CN111527528A (zh) * 2017-11-15 2020-08-11 Angel Playing Cards Co., Ltd. Recognition system

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239786B (zh) 2016-03-29 2022-01-11 Alibaba Group Holding Ltd. Character recognition method and device
CN109902724B (zh) * 2019-01-31 2023-09-01 Ping An Technology (Shenzhen) Co., Ltd. Character recognition method and device based on support vector machine, and computer equipment
US11003937B2 (en) * 2019-06-26 2021-05-11 Infrrd Inc System for extracting text from images
CN110555462A (zh) * 2019-08-02 2019-12-10 Shenzhen Suoxinda Data Technology Co., Ltd. Convolutional-neural-network-based method for recognizing multi-character CAPTCHAs without fixed length
US20210097392A1 (en) * 2019-10-01 2021-04-01 Sensormatic Electronics, LLC Classification and re-identification
CN110909734A (zh) 2019-10-29 2020-03-24 Fujian Liang'an Information Technology Co., Ltd. Method for detecting and recognizing text in documents
CN110827333B (zh) * 2019-10-31 2022-05-03 Electric Power Research Institute of State Grid Hebei Electric Power Co., Ltd. Pressing-plate splicing recognition method, system, and medium for relay protection
CN111027529A (zh) * 2019-12-04 2020-04-17 Shenzhen Xinguodu Fintech Technology Co., Ltd. Method for reducing the parameter count and computation of deep-learning OCR, computer device, and storage medium
CN111428553B (zh) * 2019-12-31 2022-07-15 Shenzhen Shulian Tianxia Intelligent Technology Co., Ltd. Facial pigmented-spot recognition method, device, computer equipment, and storage medium
CN111428552B (zh) * 2019-12-31 2022-07-15 Shenzhen Shulian Tianxia Intelligent Technology Co., Ltd. Dark-circle recognition method, device, computer equipment, and storage medium
CN111914825B (zh) * 2020-08-03 2023-10-27 Tencent Technology (Shenzhen) Co., Ltd. Character recognition method, device, and electronic equipment
CN111738269B (zh) * 2020-08-25 2020-11-20 Beijing Yizhen Xuesi Education Technology Co., Ltd. Model training method, image processing method and device, equipment, and storage medium
CN112530086A (zh) * 2020-12-16 2021-03-19 Hefei Midea Intelligent Technology Co., Ltd. Vending cabinet, commodity SKU calculation method and system thereof, and remote server
CN112861648B (zh) * 2021-01-19 2023-09-26 Ping An Technology (Shenzhen) Co., Ltd. Character recognition method, device, electronic equipment, and storage medium
US11748923B2 (en) 2021-11-12 2023-09-05 Rockwell Collins, Inc. System and method for providing more readable font characters in size adjusting avionics charts
US11887222B2 (en) 2021-11-12 2024-01-30 Rockwell Collins, Inc. Conversion of filled areas to run length encoded vectors
US11915389B2 (en) 2021-11-12 2024-02-27 Rockwell Collins, Inc. System and method for recreating image with repeating patterns of graphical image file to reduce storage space
US11954770B2 (en) 2021-11-12 2024-04-09 Rockwell Collins, Inc. System and method for recreating graphical image using character recognition to reduce storage space
US11842429B2 (en) 2021-11-12 2023-12-12 Rockwell Collins, Inc. System and method for machine code subroutine creation and execution with indeterminate addresses
CN116343232A (zh) * 2023-04-03 2023-06-27 Inner Mongolia Normal University Pre-classification-based method for recognizing mathematical symbols in ancient books

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101140625A (zh) * 2006-09-06 2008-03-12 Institute of Automation, Chinese Academy of Sciences Multi-resolution degraded character adaptive recognition system and method
JP2009048641A (ja) * 2007-08-20 2009-03-05 Fujitsu Ltd Character recognition method and character recognition device
CN101630367A (zh) * 2009-07-31 2010-01-20 University of Science and Technology Beijing Multi-classifier-based rejection method for handwritten character recognition
CN105095889A (zh) * 2014-04-22 2015-11-25 Alibaba Group Holding Ltd. Methods and devices for feature extraction, character recognition, engine generation, and information determination

Family Cites Families (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0574937B1 (en) 1992-06-19 2000-08-16 United Parcel Service Of America, Inc. Method and apparatus for input classification using a neural network
DE69431393T2 (de) 1994-01-13 2003-01-23 St Microelectronics Srl System for image recognition of alphanumeric characters
US5745599A (en) 1994-01-19 1998-04-28 Nippon Telegraph And Telephone Corporation Character recognition method
US5577135A (en) 1994-03-01 1996-11-19 Apple Computer, Inc. Handwriting signal processing front-end for handwriting recognizers
US5542006A (en) 1994-06-21 1996-07-30 Eastman Kodak Company Neural network based character position detector for use in optical character recognition
US5912986A (en) * 1994-06-21 1999-06-15 Eastman Kodak Company Evidential confidence measure and rejection technique for use in a neural network based optical character recognition system
US6026177A (en) 1995-08-29 2000-02-15 The Hong Kong University Of Science & Technology Method for identifying a sequence of alphanumeric characters
US5835633A (en) * 1995-11-20 1998-11-10 International Business Machines Corporation Concurrent two-stage multi-network optical character recognition system
JPH09223195A (ja) 1996-02-06 1997-08-26 Hewlett Packard Co <Hp> Character recognition method
US7336827B2 (en) 2000-11-08 2008-02-26 New York University System, process and software arrangement for recognizing handwritten characters
AUPR824401A0 (en) 2001-10-15 2001-11-08 Silverbrook Research Pty. Ltd. Methods and systems (npw002)
US7016529B2 (en) * 2002-03-15 2006-03-21 Microsoft Corporation System and method facilitating pattern recognition
SE0202446D0 (sv) 2002-08-16 2002-08-16 Decuma Ab Ideon Res Park Presenting recognised handwritten symbols
EP1661062A4 (en) 2003-09-05 2009-04-08 Gannon Technologies Group SYSTEMS AND METHODS FOR BIOMETRIC IDENTIFICATION THROUGH THE USE OF HANDWIRE IDENTIFICATION
US20070065003A1 (en) 2005-09-21 2007-03-22 Lockheed Martin Corporation Real-time recognition of mixed source text
US7646913B2 (en) 2005-12-19 2010-01-12 Microsoft Corporation Allograph based writer adaptation for handwritten character recognition
US7724957B2 (en) 2006-07-31 2010-05-25 Microsoft Corporation Two tiered text recognition
CN102375991B (zh) * 2010-08-24 2016-04-13 Beijing Vimicro Co., Ltd. Classifier training method and device, and character recognition method and device
US8503801B2 (en) * 2010-09-21 2013-08-06 Adobe Systems Incorporated System and method for classifying the blur state of digital image pixels
US8867828B2 (en) * 2011-03-04 2014-10-21 Qualcomm Incorporated Text region detection system and method
CN103530600B (zh) * 2013-06-06 2016-08-24 Neusoft Corporation License plate recognition method and system under complex illumination
CN104346622A (zh) * 2013-07-31 2015-02-11 Fujitsu Limited Convolutional neural network classifier, and classification method and training method thereof
CN103971091B (zh) * 2014-04-03 2017-04-26 Beijing Capital International Airport Co., Ltd. Automatic aircraft registration number recognition method
CN105224939B (zh) * 2014-05-29 2021-01-01 Xiaomi Inc. Method and device for recognizing numeric regions, and mobile terminal
US20150347860A1 (en) * 2014-05-30 2015-12-03 Apple Inc. Systems And Methods For Character Sequence Recognition With No Explicit Segmentation
CN103996057B (zh) * 2014-06-12 2017-09-12 Wuhan University of Science and Technology Real-time handwritten digit recognition method based on multi-feature fusion
US9858496B2 (en) * 2016-01-20 2018-01-02 Microsoft Technology Licensing, Llc Object detection and classification in images
CN107239786B (zh) 2016-03-29 2022-01-11 Alibaba Group Holding Ltd. Character recognition method and device
US10818398B2 (en) * 2018-07-27 2020-10-27 University Of Miami System and method for AI-based eye condition determinations

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101140625A (zh) * 2006-09-06 2008-03-12 Institute of Automation, Chinese Academy of Sciences Multi-resolution degraded character adaptive recognition system and method
JP2009048641A (ja) * 2007-08-20 2009-03-05 Fujitsu Ltd Character recognition method and character recognition device
CN101630367A (zh) * 2009-07-31 2010-01-20 University of Science and Technology Beijing Multi-classifier-based rejection method for handwritten character recognition
CN105095889A (zh) * 2014-04-22 2015-11-25 Alibaba Group Holding Ltd. Methods and devices for feature extraction, character recognition, engine generation, and information determination

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3422256A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111527528A (zh) * 2017-11-15 2020-08-11 Angel Playing Cards Co., Ltd. Recognition system
CN109145891A (zh) * 2018-06-27 2019-01-04 Shanghai Ctrip Commerce Co., Ltd. Client, its method for recognizing identity cards, and system for recognizing identity cards
CN109376731A (zh) * 2018-08-24 2019-02-22 Beijing Sankuai Online Technology Co., Ltd. Character recognition method and device
CN110765870A (zh) * 2019-09-18 2020-02-07 Beijing Sankuai Online Technology Co., Ltd. Method and device for determining the confidence of an OCR recognition result, and electronic device
CN110765870B (zh) * 2019-09-18 2021-01-12 Beijing Sankuai Online Technology Co., Ltd. Method and device for determining the confidence of an OCR recognition result, and electronic device

Also Published As

Publication number Publication date
US10872274B2 (en) 2020-12-22
EP3422256B1 (en) 2023-06-07
EP3422256A1 (en) 2019-01-02
CN107239786A (zh) 2017-10-10
TW201734890A (zh) 2017-10-01
EP3422256A4 (en) 2019-10-09
TWI766855B (zh) 2022-06-11
CN107239786B (zh) 2022-01-11
US20190026607A1 (en) 2019-01-24

Similar Documents

Publication Publication Date Title
WO2017167046A1 (zh) Character recognition method and device
Sameen et al. Classification of very high resolution aerial photos using spectral-spatial convolutional neural networks
US11657602B2 (en) Font identification from imagery
WO2017067456A1 (zh) Method and device for recognizing a character string in an image
WO2019174130A1 (zh) Bill recognition method, server, and computer readable storage medium
WO2021212736A1 (zh) Feature fusion block, convolutional neural network, pedestrian re-identification method, and related devices
US10013628B2 (en) Information processing apparatus and information processing method
Mathew et al. Multilingual OCR for Indic scripts
WO2020223859A1 (zh) Method, device and equipment for detecting slanted text
US10043057B2 (en) Accelerating object detection
US20140226904A1 (en) Information processing apparatus, information processing method, and non-transitory computer readable medium
WO2021027218A1 (zh) Text classification method and device, and computer readable medium
US20130268476A1 (en) Method and system for classification of moving objects and user authoring of new object classes
Rehman et al. Efficient coarser‐to‐fine holistic traffic sign detection for occlusion handling
CN110796145B (zh) Multi-document segmentation and association method based on intelligent decision-making, and related equipment
US10217020B1 (en) Method and system for identifying multiple strings in an image based upon positions of model strings relative to one another
US10685253B2 (en) Advanced cloud detection using neural networks and optimization techniques
CN114998592A (zh) Method, device, equipment and storage medium for instance segmentation
CN110991303A (zh) Method and device for locating text in an image, and electronic device
CN111062385A (zh) Network model construction method and system for image text information detection
WO2022262239A1 (zh) Text recognition method, device, equipment, and storage medium
CN114581682A (zh) Image feature extraction method, device and equipment based on self-attention mechanism
US11972626B2 (en) Extracting multiple documents from single image
CN116092105B (zh) Method and device for parsing a table structure
WO2023220859A1 (en) Multi-dimensional attention for dynamic convolutional kernel

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2017773076

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017773076

Country of ref document: EP

Effective date: 20180927

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17773076

Country of ref document: EP

Kind code of ref document: A1