CN110288052A - Character identifying method, device, equipment and computer-readable medium - Google Patents

Character identifying method, device, equipment and computer-readable medium

Info

Publication number
CN110288052A
Authority
CN
China
Prior art keywords
character
identification
text
identification text
pca
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910697687.1A
Other languages
Chinese (zh)
Inventor
张晴晴
徐冉
段由
杨金富
罗磊
马光谦
汪洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING WISDOM TECHNOLOGY Co Ltd
Original Assignee
BEIJING WISDOM TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING WISDOM TECHNOLOGY Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

This application relates to a character recognition method, apparatus, device and computer-readable medium. The method includes: obtaining a scanned file of a target document and performing image processing on the scanned file; and performing character recognition on the target image obtained by the image processing using optical character recognition (OCR) technology to obtain a first recognized text, wherein, when character recognition is performed using the OCR technology, R1_PCA is used to reduce the dimensionality of the character features in the target image. By using R1_PCA for feature dimensionality reduction in OCR text recognition, that is, by combining R1_PCA with OCR technology, the application can reduce the interference of noise when the character features contain noise, thereby improving the accuracy of OCR.

Description

Character identifying method, device, equipment and computer-readable medium
Technical field
This application relates to the field of computer technology, and in particular to a character recognition method, apparatus, device and computer-readable medium.
Background
As interest in artificial intelligence grows, the field of image recognition is also attracting increasing attention. Optical character recognition (OCR) refers to the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using a character recognition method.
However, the dimensionality reduction methods traditionally used when recognizing characters with OCR, such as PCA and LDA, all use the square of the L2 norm as the distance metric of the loss function. When the features contain noise, PCA and LDA are not robust: because the objective function is a sum of squared errors (L2 norm), these algorithms amplify outliers, and even a small amount of abnormal data can make the estimated subspace deviate considerably and fail to reflect the true situation. In other words, they are sensitive to outliers (noise) in the samples.
Summary of the invention
To solve the above technical problem, or at least partially solve it, this application provides a character recognition method, apparatus, device and computer-readable medium.
In a first aspect, this application provides a character recognition method, comprising:
obtaining a scanned file of a target document, and performing image processing on the scanned file;
performing character recognition on the target image obtained by the image processing using optical character recognition (OCR) technology to obtain a first recognized text; wherein, when character recognition is performed using the OCR technology, R1_PCA is used to reduce the dimensionality of the character features in the target image.
Optionally, R1_PCA uses the first power of the R1 norm as the distance metric of the loss function:
$$\min_{U}\ \lVert X - UV \rVert_{R_1}, \quad \text{s.t. } U^{T}U = I$$
where $X \in \mathbb{R}^{m\times n}$ denotes the extracted character feature matrix, $U \in \mathbb{R}^{m\times d}$ denotes the projection axes, and $V = U^{T}X$ denotes the character feature matrix after dimensionality reduction.
Optionally, the method further comprises:
obtaining a PDF document of the target document;
recognizing a second recognized text in the PDF document;
comparing the second recognized text with the first recognized text to determine the differing characters between the first recognized text and the second recognized text.
Optionally, the method further comprises:
marking the differing characters in the first recognized text and the second recognized text;
and/or replacing the differing characters in the second recognized text with the differing characters in the first recognized text;
and/or replacing the differing characters in the first recognized text with the differing characters in the second recognized text.
In a second aspect, this application also provides a character recognition apparatus, comprising:
a first obtaining module, configured to obtain a scanned file of a target document and perform image processing on the scanned file;
a first recognition module, configured to perform character recognition on the target image obtained by the image processing using optical character recognition (OCR) technology to obtain a first recognized text; wherein, when character recognition is performed using the OCR technology, R1_PCA is used to reduce the dimensionality of the character features in the target image.
Optionally, R1_PCA uses the first power of the R1 norm as the distance metric of the loss function:
$$\min_{U}\ \lVert X - UV \rVert_{R_1}, \quad \text{s.t. } U^{T}U = I$$
where $X \in \mathbb{R}^{m\times n}$ denotes the extracted character feature matrix, $U \in \mathbb{R}^{m\times d}$ denotes the projection axes, and $V = U^{T}X$ denotes the character feature matrix after dimensionality reduction.
Optionally, the apparatus further comprises:
a second obtaining module, configured to obtain a PDF document of the target document;
a second recognition module, configured to recognize a second recognized text in the PDF document;
a comparison module, configured to compare the second recognized text with the first recognized text and determine the differing characters between the first recognized text and the second recognized text.
Optionally, the apparatus further comprises:
a marking module, configured to mark the differing characters in the first recognized text and the second recognized text;
and/or a first replacement module, configured to replace the differing characters in the second recognized text with the differing characters in the first recognized text;
and/or a second replacement module, configured to replace the differing characters in the first recognized text with the differing characters in the second recognized text.
In a third aspect, this application also provides a character recognition device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the steps of the method of the first aspect when executing the computer program.
In a fourth aspect, this application also provides a computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to execute the method of the first aspect.
Compared with the prior art, the technical solution provided by the embodiments of this application has the following advantages:
By using R1_PCA for feature dimensionality reduction in OCR text recognition, that is, by combining R1_PCA with OCR technology, the application can reduce the interference of noise when the character features contain noise, thereby improving the accuracy of OCR.
Brief description of the drawings
The drawings herein are incorporated into and form part of this specification. They show embodiments consistent with the invention and, together with the specification, serve to explain the principles of the invention.
To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. A person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of a character recognition method provided by an embodiment of this application;
Fig. 2 is a schematic diagram comparing the effects of dimensionality reduction, provided by an embodiment of this application;
Fig. 3 is a schematic diagram of marking differing characters, provided by an embodiment of this application;
Fig. 4 is a structural diagram of a character recognition apparatus provided by an embodiment of this application.
Detailed description
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
The dimensionality reduction methods used when recognizing characters with OCR, such as PCA and LDA, use the square of the L2 norm as the distance metric of the loss function. When the features contain noise, PCA and LDA are not robust: because the objective function is a sum of squared errors (L2 norm), these algorithms amplify outliers, and even a small amount of abnormal data can make the estimated subspace deviate considerably and fail to reflect the true situation. They are sensitive to outliers (noise) in the samples. For this reason, an embodiment of this application provides a character recognition method which, as shown in Fig. 1, may include the following steps:
Step S101: obtain a scanned file of a target document, and perform image processing on the scanned file.
In the embodiments of this application, the target document may be, for example, a paper document such as a contract, and the scanned file is the file obtained by scanning the target document.
In practice, image processing means preprocessing the original image before recognizing text, to prepare for subsequent feature extraction and learning. This process generally includes sub-steps such as graying, binarization, noise reduction, skew correction and character segmentation.
Graying: in the RGB model, when R = G = B the color is a shade of gray, and the common value of R, G and B is the gray value. Each pixel of a grayscale image therefore needs only one byte to store its gray value (also called the intensity or brightness value), with a range of 0-255. Put plainly, graying turns a color image into a black-and-white picture. Four methods are generally used to gray a color image: the component method, the maximum method, the average method and the weighted-average method.
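The four graying methods named above can be sketched as follows. This is a minimal illustration with NumPy, not the implementation of this application; the function name `to_gray`, the `channel` parameter and the weights 0.299/0.587/0.114 (the common BT.601 luma weights) are assumptions, since the text does not specify them.

```python
import numpy as np

def to_gray(rgb, method="weighted", channel=0):
    """Convert an H x W x 3 uint8 RGB image to an H x W uint8 grayscale image."""
    values = rgb.astype(np.float64)
    if method == "component":
        gray = values[:, :, channel]        # keep a single colour channel
    elif method == "max":
        gray = values.max(axis=2)           # brightest of R, G, B
    elif method == "mean":
        gray = values.mean(axis=2)          # plain average of R, G, B
    elif method == "weighted":
        # Assumed weights (BT.601 luma); the source text gives none.
        gray = values @ np.array([0.299, 0.587, 0.114])
    else:
        raise ValueError(f"unknown method: {method}")
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

pixel = np.array([[[200, 100, 50]]], dtype=np.uint8)   # a single RGB pixel
for name in ("component", "max", "mean", "weighted"):
    print(name, int(to_gray(pixel, name)[0, 0]))
```

All four variants return a one-byte value per pixel, which matches the 0-255 gray range described above.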
Binarization: an image contains the target object, the background and noise. To extract the target object directly from a multi-valued digital image, the most common method is to set a threshold T that divides the image data into two parts: the group of pixels greater than T and the group of pixels less than T. This is the most special method of gray-level transformation and is called image binarization. A binarized black-and-white picture contains no gray, only pure white and pure black. The most important part of binarization is choosing the threshold, which is generally either fixed or adaptive. Commonly used binarization methods include the bimodal method, the P-parameter method, the iterative method and the OTSU method.
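As an illustration of threshold selection, the OTSU method mentioned above can be sketched with NumPy as below. This is a simplified sketch under the assumption of an 8-bit grayscale input; the function names `otsu_threshold` and `binarize` are hypothetical and are not taken from this application.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold T that maximises the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # share of pixels with value <= T
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean up to T
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu[-1] * omega - mu) ** 2 / denom
    between[denom <= 0] = 0.0                  # thresholds that leave one class empty
    return int(np.argmax(between))

def binarize(gray, threshold):
    """Pixels greater than the threshold become white (255), the rest black (0)."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

# Toy image: dark left half (50), bright right half (200).
gray = np.full((4, 8), 200, dtype=np.uint8)
gray[:, :4] = 50
t = otsu_threshold(gray)
bw = binarize(gray, t)
```

A fixed threshold would replace `otsu_threshold` with a constant; the adaptive case computes the threshold from the image histogram as shown.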
Image noise reduction: in reality, digital images are often affected during digitization and transmission by noise from the imaging device and the external environment; such images are called noisy images. The process of reducing noise in a digital image is called image noise reduction. Noise in an image comes from many sources, arising in image acquisition, transmission, compression and other stages. The types of noise also differ, for example salt-and-pepper noise and Gaussian noise, and different noise calls for different processing algorithms. The image obtained in the previous step contains many scattered small dots; these are noise in the image, which can greatly interfere with the program's segmentation and recognition of the picture, so noise reduction is needed. Noise reduction is very important at this stage, and the quality of the noise reduction algorithm strongly affects feature extraction.
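The text does not say which noise reduction algorithm is used. As one common choice for salt-and-pepper noise, a 3 x 3 median filter can be sketched as follows; the function name `median_filter3` and the edge-padding strategy are assumptions made for this example.

```python
import numpy as np

def median_filter3(img):
    """Replace every pixel by the median of its 3 x 3 neighbourhood (edges are replicated)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted views of the image, one per neighbourhood position.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    return np.median(windows, axis=0).astype(img.dtype)

# White page with two isolated black specks ("pepper" noise).
page = np.full((10, 10), 255, dtype=np.uint8)
page[2, 3] = 0
page[7, 7] = 0
cleaned = median_filter3(page)
```

Isolated specks disappear because a single outlying value never becomes the median of nine; Gaussian noise would usually call for a different filter.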
Skew correction: a user cannot hold the camera perfectly level when taking a picture, so the program must rotate the image to find the position most likely to be horizontal; only then is the cropped image likely to give the best result. The most common method of skew correction is the Hough transform. Its principle is to dilate the picture so that the broken text is connected into straight lines, which makes line detection easier. After the angle of the line is calculated, a rotation algorithm can correct the tilted picture to the horizontal position.
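The angle-estimation part of this step can be sketched as a simplified Hough-style vote: for each candidate angle, foreground pixels vote for a line offset, and the angle with the strongest single line wins. This sketch omits dilation and the final rotation, and the function name `estimate_skew_degrees`, the search range of -15 to 15 degrees and the 0.5 degree step are assumptions.

```python
import numpy as np

def estimate_skew_degrees(binary, angles=np.arange(-15.0, 15.5, 0.5)):
    """Estimate the dominant line angle (degrees) of the foreground pixels in `binary`."""
    ys, xs = np.nonzero(binary)
    if ys.size == 0:
        return 0.0
    best_angle, best_votes = 0.0, -1
    for angle in angles:
        theta = np.deg2rad(angle)
        # Pixels on one straight line with this angle share the same normal offset rho.
        rho = ys * np.cos(theta) - xs * np.sin(theta)
        votes = np.bincount(np.rint(rho - rho.min()).astype(np.int64)).max()
        if votes > best_votes:
            best_angle, best_votes = float(angle), int(votes)
    return best_angle

# Synthetic "text line" tilted by 5 degrees (row index grows downwards).
canvas = np.zeros((60, 200), dtype=bool)
cols = np.arange(200)
rows = np.rint(20 + np.tan(np.deg2rad(5.0)) * cols).astype(int)
canvas[rows, cols] = True
skew = estimate_skew_degrees(canvas)
```

Rotating the image by the negative of the estimated angle would then bring the text line back to horizontal.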
Step S102: perform character recognition on the target image obtained by the image processing using optical character recognition (OCR) technology to obtain a first recognized text.
In the embodiments of this application, feature extraction and dimensionality reduction are performed first during character recognition. Features are the key information used to recognize characters; each character can be distinguished from the others by its features. For digits and English letters, feature extraction is relatively easy, since there are only 10 + 26 × 2 = 62 characters in total, a small character set. For Chinese characters, feature extraction is much harder: first, Chinese characters form a large character set; second, the most commonly used level-1 characters in the national standard alone number 3,755; finally, Chinese characters are structurally complex and many look alike, so the feature dimension is larger. After deciding which features to use, feature dimensionality reduction may also be needed. If the feature dimension is too high, the efficiency of the classifier suffers greatly, so dimensionality reduction is often performed to improve recognition speed. This step is also important: it must reduce the feature dimension while ensuring that the reduced feature vectors retain enough information to distinguish different characters.
Then, character feature extraction and dimensionality reduction are performed in combination with the R1_PCA dimensionality reduction technique.
In the embodiments of this application, when character recognition is performed using the OCR technology, R1_PCA is used to reduce the dimensionality of the character features in the target image.
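R1_PCA uses the first power of the R1 norm as the distance metric of the loss function:
$$\min_{U}\ \lVert X - UV \rVert_{R_1}, \quad \text{s.t. } U^{T}U = I$$
where $X \in \mathbb{R}^{m\times n}$ denotes the extracted character feature matrix, $U \in \mathbb{R}^{m\times d}$ denotes the projection axes, and $V = U^{T}X$ denotes the character feature matrix after dimensionality reduction.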
By using the first power of the R1 norm as the distance metric of the loss function, R1_PCA reduces the influence of noise. The R1_PCA algorithm first defines the rotationally invariant L1 norm, i.e. the R1 norm, which combines the insensitivity to outliers (robustness) of the L1 norm with the rotational invariance of the L2 norm.
Therefore, when the extracted features contain noise, reducing dimensionality with the R1_PCA algorithm reduces the influence of noise, better retains the information in the samples, and gives higher accuracy in terms of reconstruction error.
As shown in Fig. 2, assume the samples are reduced from two dimensions to one. The black points in the figure are sample points and the red points are outliers, i.e. noise points. W1 is the projection axis obtained by both PCA and R1_PCA when there are no noise points; W2 and W3 are the projection axes obtained by R1_PCA and PCA, respectively, when noise is present. Because PCA uses the square of the L2 norm as its distance metric, it is strongly affected by the noise points. R1_PCA uses the R1 norm as its distance metric, so it is affected less, and it is rotationally invariant.
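When the R1_PCA algorithm is used to reduce the dimensionality of the extracted character features, R1_PCA is robust and the solution it obtains remains rotationally invariant. Let the character feature matrix be $X = (x_{ij}) \in \mathbb{R}^{m\times n}$. The R1 norm of the matrix X is then expressed as:
$$\lVert X \rVert_{R_1} = \sum_{j=1}^{n}\Big(\sum_{i=1}^{m} x_{ij}^{2}\Big)^{1/2}$$
Let $X = (x_1, x_2, \dots, x_n)$; then $\lVert X \rVert_{R_1} = \sum_{j=1}^{n} \lVert x_j \rVert_2$. Let $\xi = (\lVert x_1 \rVert_2, \lVert x_2 \rVert_2, \dots, \lVert x_n \rVert_2)$; then $\lVert \xi \rVert_1 = \sum_{j=1}^{n} \lVert x_j \rVert_2$. Therefore $\lVert X \rVert_{R_1} = \lVert \xi \rVert_1$,
where $X = (x_{ij}) \in \mathbb{R}^{m\times n}$ denotes the extracted character feature matrix.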
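As a small numerical illustration of the R1 norm described here (the sum of the Euclidean norms of the columns of X), the sketch below computes it with NumPy and checks that it is unchanged by a rotation, whereas the entrywise L1 norm is not. The function name `r1_norm`, the example matrix and the rotation angle are chosen only for this example.

```python
import numpy as np

def r1_norm(X):
    """R1 norm: sum over columns (samples) of the Euclidean norm of each column."""
    return float(np.linalg.norm(X, axis=0).sum())

X = np.array([[3.0, 0.0],
              [4.0, 1.0]])                # columns (3, 4) and (0, 1)
theta = 0.7                               # arbitrary rotation angle in radians
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(r1_norm(X), r1_norm(Q @ X))             # R1 norm before and after rotation
print(np.abs(X).sum(), np.abs(Q @ X).sum())   # entrywise L1 norm before and after
```

For this X the column norms are 5 and 1, so the R1 norm is 6 both before and after rotation.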
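The R1 norm can be verified to satisfy the three properties of a norm as follows.
Assume A and B are two arbitrary matrices with $A, B \in \mathbb{R}^{m\times n}$, and let $\alpha \in \mathbb{R}$. By the definition of the R1 norm:
1. $\lVert A \rVert_{R_1} \ge 0$, and $\lVert A \rVert_{R_1} = 0$ if and only if $A = 0$;
2. $\lVert \alpha A \rVert_{R_1} = \lvert \alpha \rvert\, \lVert A \rVert_{R_1}$;
3. $\lVert A + B \rVert_{R_1} \le \lVert A \rVert_{R_1} + \lVert B \rVert_{R_1}$.
It can be seen from the above that the R1 norm satisfies the fundamental properties of a norm, so the R1 norm is indeed a norm.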
R1_PCA is defined as follows:
$$\min_{U}\ \lVert X - UU^{T}X \rVert_{R_1} = \sum_{i=1}^{n} \lVert x_i - UU^{T}x_i \rVert_2, \quad \text{s.t. } U^{T}U = I \qquad (1)$$
where X denotes the extracted character feature matrix, $x_i$ denotes its i-th column, and U denotes the projection axes.
It can be seen from this that the objective function is rotationally invariant. At the same time, the robustness advantage of L1-PCA is retained, because the summation in formula (1) accumulates the projection errors in the manner of an L1 norm, without squaring them; this is the R1 norm.
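Construct the Lagrangian to solve (1). Under the constraint $U^{T}U = I$, $\lVert x_i - UU^{T}x_i \rVert_2 = \sqrt{x_i^{T}x_i - x_i^{T}UU^{T}x_i}$, so:
$$L(U, \Lambda) = \sum_{i=1}^{n} \sqrt{x_i^{T}x_i - x_i^{T}UU^{T}x_i} + \operatorname{Tr}\big[\Lambda\,(U^{T}U - I)\big]$$
where U denotes the projection axes and Λ denotes the Lagrange multiplier matrix.
Since the matrix $U^{T}U$ is symmetric, $\Lambda \in \mathbb{R}^{d\times d}$ is a symmetric matrix.
Taking the partial derivative of the Lagrangian L(U, Λ) with respect to U and setting it to 0, the optimal solution of (1) satisfies the KKT equation (with a constant factor absorbed into Λ):
$$C_w U = U \Lambda$$
where $C_w$ is the weighted covariance matrix of R1_PCA:
$$C_w = \sum_{i=1}^{n} w_i\, x_i x_i^{T}, \qquad w_i = \frac{1}{\lVert x_i - UU^{T}x_i \rVert_2}$$
It follows that (1) reaches its minimum if and only if U is any orthonormal basis of $\operatorname{Span}(\xi_1, \dots, \xi_d)$, where $\xi_i$ is the orthonormal eigenvector corresponding to the i-th largest eigenvalue of $C_w$.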
The covariance matrix depends on U; unlike L1-PCA, (1) has no closed-form solution, so directly performing an eigenvalue decomposition of $C_w$ is infeasible. Chris Ding proposed an orthogonal iteration algorithm. The detailed process is:
1) Initialize $U \in \mathbb{R}^{m\times d}$ and compute the residuals $s_i = \lVert x_i - UU^{T}x_i \rVert_2$;
2) Update U by the following formulas:
$M = C_w U$, $U = \operatorname{orth}(M)$, where $\operatorname{orth}(M)$ orthonormalizes the columns of M;
3) Repeat step 2) until convergence, at which point the iteration ends.
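The iteration above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation of this application: the weights are taken as the reciprocal of each sample's residual (the original weighting formula is not recoverable from the text and may differ, for example by applying a cut-off), the starting point is ordinary PCA, QR factorisation is used as the orthonormalisation step, and the names `r1_pca` and `r1_objective` as well as the synthetic data are hypothetical.

```python
import numpy as np

def r1_objective(X, U):
    """Sum over samples of the Euclidean norm of the projection residual."""
    return float(np.linalg.norm(X - U @ (U.T @ X), axis=0).sum())

def r1_pca(X, d, n_iter=200, tol=1e-10, eps=1e-12):
    """X is m x n with centred samples as columns; returns U (m x d) with U.T @ U = I."""
    U = np.linalg.svd(X, full_matrices=False)[0][:, :d]   # ordinary PCA as the start
    for _ in range(n_iter):
        residual = np.linalg.norm(X - U @ (U.T @ X), axis=0)
        w = 1.0 / np.maximum(residual, eps)               # assumed weights: 1 / residual
        M = (X * w) @ (X.T @ U)                           # M = C_w U without forming C_w
        U_next, _ = np.linalg.qr(M)                       # orthonormalise the columns of M
        change = np.linalg.norm(U_next @ U_next.T - U @ U.T)
        U = U_next
        if change < tol:
            break
    return U

# Synthetic 2-D features: 100 samples along the x axis plus 5 identical outliers.
rng = np.random.default_rng(0)
inliers = np.vstack([rng.normal(0, 5, 100), rng.normal(0, 0.3, 100)])
outliers = np.tile([[10.0], [20.0]], (1, 5))
X = np.hstack([inliers, outliers])
X = X - X.mean(axis=1, keepdims=True)                     # centre the samples

U_pca = np.linalg.svd(X, full_matrices=False)[0][:, :1]
U_r1 = r1_pca(X, d=1)
V = U_r1.T @ X                                            # features after reduction
print("PCA    R1 objective:", r1_objective(X, U_pca))
print("R1_PCA R1 objective:", r1_objective(X, U_r1))
```

The reduced features are `V = U.T @ X`, matching the definition of V given earlier. Because the iteration is local, the axis it converges to depends on the starting point.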
The above summarizes the rationale for using R1_PCA, which is robust and reduces interference from outliers, as the dimensionality reduction method for character features, together with the method of solving R1_PCA.
After character feature extraction and dimensionality reduction combined with the R1_PCA technique, the classifier is designed and trained. For a character image, the features are extracted and fed to the classifier, which classifies them and reports which character the features are recognized as. Designing the classifier is the task at this stage. Classifier design methods generally include template matching, discriminant functions, neural network classification and rule-based reasoning, which are not described in detail here. Before actual recognition, the classifier usually also has to be trained, which is a supervised learning process. There are many mature classifiers, such as SVM and CNN.
Finally, post-processing is performed, that is, the classification results of the classifier are optimized, which generally involves natural language understanding. The first task is handling characters of similar shape: two characters may look alike, but when one of them appears in a common word such as "score", the word should not be recognized with the look-alike character in its place, because only "score" is a normal word. This has to be corrected by a language model. The second task is handling text layout: for example, some books are set in left and right columns, and the left and right columns on the same line do not belong to the same sentence and have no grammatical connection. If the text is split by line, the end of the left line is joined to the beginning of the right line, which is undesirable; such cases require special handling.
By using R1_PCA for feature dimensionality reduction in OCR text recognition, that is, by combining R1_PCA with OCR technology, the application can reduce the interference of noise when the character features contain noise, thereby improving the accuracy of OCR.
In another embodiment of this application, the method further comprises:
obtaining a PDF document of the target document;
recognizing a second recognized text in the PDF document;
comparing the second recognized text with the first recognized text to determine the differing characters between the first recognized text and the second recognized text.
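The text does not specify how the two recognized texts are compared. One possible sketch uses the `difflib.SequenceMatcher` class from the Python standard library to list the spans where the texts differ; the function name `differing_spans` and the sample strings are hypothetical.

```python
import difflib

def differing_spans(first_text, second_text):
    """Return (tag, i1, i2, j1, j2) for every span where the two texts differ.

    first_text[i1:i2] and second_text[j1:j2] are the differing characters;
    tag is 'replace', 'delete' or 'insert' as reported by difflib.
    """
    matcher = difflib.SequenceMatcher(None, first_text, second_text, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

first = "total score: 100"     # e.g. text recognized from the scanned file
second = "total sc0re: 100"    # e.g. text recognized from the PDF document
for tag, i1, i2, j1, j2 in differing_spans(first, second):
    print(tag, repr(first[i1:i2]), repr(second[j1:j2]))
```

Each returned span gives the positions of the differing characters in both texts, which is the information the marking and replacement steps need.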
In another embodiment of this application, after the differing characters between the first recognized text and the second recognized text are determined, the method further comprises:
marking the differing characters in the first recognized text and the second recognized text (see Fig. 3);
and/or replacing the differing characters in the second recognized text with the differing characters in the first recognized text;
and/or replacing the differing characters in the first recognized text with the differing characters in the second recognized text.
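Marking and replacement can be sketched on top of the same standard-library comparison. The bracket markers and the function names `mark_differences` and `replace_differences` are assumptions for this example; the application itself marks the characters graphically, as in Fig. 3.

```python
import difflib

def mark_differences(first, second, left="[", right="]"):
    """Return both texts with their differing characters wrapped in markers."""
    matcher = difflib.SequenceMatcher(None, first, second, autojunk=False)
    marked_first, marked_second = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        a, b = first[i1:i2], second[j1:j2]
        if tag == "equal":
            marked_first.append(a)
            marked_second.append(b)
        else:
            if a:
                marked_first.append(left + a + right)
            if b:
                marked_second.append(left + b + right)
    return "".join(marked_first), "".join(marked_second)

def replace_differences(source, target):
    """Return `target` with every differing span overwritten by the text from `source`."""
    matcher = difflib.SequenceMatcher(None, source, target, autojunk=False)
    pieces = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        pieces.append(target[j1:j2] if tag == "equal" else source[i1:i2])
    return "".join(pieces)

print(mark_differences("score", "sc0re"))
print(replace_differences("score", "sc0re"))
```

Swapping the two arguments of `replace_differences` gives the other replacement direction. Replacing every differing span makes the target identical to the source, so a practical system would presumably choose which spans to replace.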
In another embodiment of this application, as shown in Fig. 4, a character recognition apparatus is also provided, comprising:
a first obtaining module 11, configured to obtain a scanned file of a target document and perform image processing on the scanned file;
a first recognition module 12, configured to perform character recognition on the target image obtained by the image processing using optical character recognition (OCR) technology to obtain a first recognized text; wherein, when character recognition is performed using the OCR technology, R1_PCA is used to reduce the dimensionality of the character features in the target image.
In another embodiment of this application, R1_PCA uses the first power of the R1 norm as the distance metric of the loss function:
$$\min_{U}\ \lVert X - UV \rVert_{R_1}, \quad \text{s.t. } U^{T}U = I$$
where $X \in \mathbb{R}^{m\times n}$ denotes the extracted character feature matrix, $U \in \mathbb{R}^{m\times d}$ denotes the projection axes, and $V = U^{T}X$ denotes the character feature matrix after dimensionality reduction.
In another embodiment of this application, the apparatus further comprises:
a second obtaining module, configured to obtain a PDF document of the target document;
a second recognition module, configured to recognize a second recognized text in the PDF document;
a comparison module, configured to compare the second recognized text with the first recognized text and determine the differing characters between the first recognized text and the second recognized text.
In another embodiment of this application, the apparatus further comprises:
a marking module, configured to mark the differing characters in the first recognized text and the second recognized text;
and/or a first replacement module, configured to replace the differing characters in the second recognized text with the differing characters in the first recognized text;
and/or a second replacement module, configured to replace the differing characters in the first recognized text with the differing characters in the second recognized text.
In another embodiment of this application, a character recognition device is also provided, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the steps of the method described in the above method embodiments when executing the computer program.
In another embodiment of this application, a computer-readable medium having non-volatile program code executable by a processor is also provided, the program code causing the processor to execute the method described in the foregoing method embodiments.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The above is only a specific embodiment of the invention, provided to enable those skilled in the art to understand or implement the invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the invention. Therefore, the invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A character recognition method, characterized by comprising:
obtaining a scanned file of a target document, and performing image processing on the scanned file;
performing character recognition on the target image obtained by the image processing using optical character recognition (OCR) technology to obtain a first recognized text; wherein, when character recognition is performed using the OCR technology, R1_PCA is used to reduce the dimensionality of the character features in the target image.
2. The character recognition method according to claim 1, characterized in that R1_PCA uses the first power of the R1 norm as the distance metric of the loss function:
$$\min_{U}\ \lVert X - UV \rVert_{R_1}, \quad \text{s.t. } U^{T}U = I$$
where $X \in \mathbb{R}^{m\times n}$ denotes the extracted character feature matrix, $U \in \mathbb{R}^{m\times d}$ denotes the projection axes, and $V = U^{T}X$ denotes the character feature matrix after dimensionality reduction.
3. The character recognition method according to claim 1, characterized in that the method further comprises:
obtaining a PDF document of the target document;
recognizing a second recognized text in the PDF document;
comparing the second recognized text with the first recognized text to determine the differing characters between the first recognized text and the second recognized text.
4. The character recognition method according to claim 3, characterized in that the method further comprises:
marking the differing characters in the first recognized text and the second recognized text;
and/or replacing the differing characters in the second recognized text with the differing characters in the first recognized text;
and/or replacing the differing characters in the first recognized text with the differing characters in the second recognized text.
5. A character recognition apparatus, characterized by comprising:
a first obtaining module, configured to obtain a scanned file of a target document and perform image processing on the scanned file;
a first recognition module, configured to perform character recognition on the target image obtained by the image processing using optical character recognition (OCR) technology to obtain a first recognized text; wherein, when character recognition is performed using the OCR technology, R1_PCA is used to reduce the dimensionality of the character features in the target image.
6. The character recognition apparatus according to claim 5, characterized in that R1_PCA uses the first power of the R1 norm as the distance metric of the loss function:
$$\min_{U}\ \lVert X - UV \rVert_{R_1}, \quad \text{s.t. } U^{T}U = I$$
where $X \in \mathbb{R}^{m\times n}$ denotes the extracted character feature matrix, $U \in \mathbb{R}^{m\times d}$ denotes the projection axes, and $V = U^{T}X$ denotes the character feature matrix after dimensionality reduction.
7. character recognition device according to claim 5, which is characterized in that described device further include:
Second obtains module, for obtaining the pdf document of the file destination;
Second identification module, second in the pdf document identifies text for identification;
Comparison module, for the second identification text compared with the first identification text, to be determined the first identification text This identifies the difference character between text described second.
8. The character recognition device according to claim 7, characterized in that the device further comprises:
A labeling module, configured to mark the difference characters in the first identification text and the second identification text;
And/or a first replacement module, configured to replace the difference characters in the second identification text with the difference characters in the first identification text;
And/or a second replacement module, configured to replace the difference characters in the first identification text with the difference characters in the second identification text.
9. A character recognition device, comprising a memory and a processor, the memory storing a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 4.
10. A computer-readable medium having non-volatile program code executable by a processor, characterized in that the program code causes the processor to execute the method according to any one of claims 1 to 4.
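The comparison and marking steps recited in claims 3-4 and 7-8 can be illustrated with a minimal Python sketch. It assumes both identification texts are already available as plain strings and uses the standard-library `difflib` module as a stand-in comparison technique; the claims do not name a comparison algorithm. The function names, the bracket markers, and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def _opcodes(first_text, second_text):
    # autojunk=False keeps difflib from treating frequent characters in long texts as junk.
    return SequenceMatcher(None, first_text, second_text, autojunk=False).get_opcodes()


def find_difference_characters(first_text, second_text):
    """List (operation, span_in_first, span_in_second) for each differing span."""
    return [
        (tag, first_text[i1:i2], second_text[j1:j2])
        for tag, i1, i2, j1, j2 in _opcodes(first_text, second_text)
        if tag != "equal"
    ]


def mark_differences(first_text, second_text, left="[", right="]"):
    """Return both texts with their differing spans wrapped in markers."""
    marked_first, marked_second = [], []
    for tag, i1, i2, j1, j2 in _opcodes(first_text, second_text):
        a, b = first_text[i1:i2], second_text[j1:j2]
        if tag == "equal":
            marked_first.append(a)
            marked_second.append(b)
        else:
            # An insertion or deletion leaves one side empty; mark only non-empty spans.
            if a:
                marked_first.append(left + a + right)
            if b:
                marked_second.append(left + b + right)
    return "".join(marked_first), "".join(marked_second)


if __name__ == "__main__":
    ocr_text = "Invoice total: 1O0 yuan"  # hypothetical first identification text (OCR)
    pdf_text = "Invoice total: 100 yuan"  # hypothetical second identification text (PDF)
    print(find_difference_characters(ocr_text, pdf_text))
    print(mark_differences(ocr_text, pdf_text))
```

Replacing every difference character in one text with its counterpart from the other would simply reproduce the other text, so any selective replacement needs a rule for deciding which text to trust at each span; the claims above do not specify one.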
CN201910697687.1A 2019-03-27 2019-07-30 Character identifying method, device, equipment and computer-readable medium Pending CN110288052A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019102361346 2019-03-27
CN201910236134.6A CN109919253A (en) 2019-03-27 2019-03-27 Character identifying method, device, equipment and computer-readable medium

Publications (1)

Publication Number Publication Date
CN110288052A (en) 2019-09-27

Family

ID=66967035

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910236134.6A Withdrawn CN109919253A (en) 2019-03-27 2019-03-27 Character identifying method, device, equipment and computer-readable medium
CN201910697687.1A Pending CN110288052A (en) 2019-03-27 2019-07-30 Character identifying method, device, equipment and computer-readable medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910236134.6A Withdrawn CN109919253A (en) 2019-03-27 2019-03-27 Character identifying method, device, equipment and computer-readable medium

Country Status (1)

Country Link
CN (2) CN109919253A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113111728A (en) * 2021-03-22 2021-07-13 广西电网有限责任公司电力科学研究院 Intelligent identification method and system for power production operation risk in transformer substation

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751565A (en) * 2008-12-10 2010-06-23 中国科学院自动化研究所 Method for character identification through fusing binary image and gray level image
CN105260727A (en) * 2015-11-12 2016-01-20 武汉大学 Academic-literature semantic restructuring method based on image processing and sequence labeling
CN105335689A (en) * 2014-08-06 2016-02-17 阿里巴巴集团控股有限公司 Character recognition method and apparatus
CN105550524A (en) * 2013-07-17 2016-05-04 中国中医科学院 Novel clinical case data collection system and collection method
US20180101726A1 (en) * 2016-10-10 2018-04-12 Insurance Services Office Inc. Systems and Methods for Optical Character Recognition for Low-Resolution Documents
CN108288078A (en) * 2017-12-07 2018-07-17 腾讯科技(深圳)有限公司 Character identifying method, device and medium in a kind of image
CN108573707A (en) * 2017-12-27 2018-09-25 北京金山云网络技术有限公司 A kind of processing method of voice recognition result, device, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DING C et al.: "R1-PCA: rotational invariant L1-norm principal component analysis for robust subspace factorization", International Conference on Machine Learning *

Also Published As

Publication number Publication date
CN109919253A (en) 2019-06-21

Similar Documents

Publication Publication Date Title
Chou et al. A binarization method with learning-built rules for document images produced by cameras
US8644616B2 (en) Character recognition
Cheriet et al. Character recognition systems: a guide for students and practitioners
US8744196B2 (en) Automatic recognition of images
He et al. Beyond OCR: Multi-faceted understanding of handwritten document characteristics
US11790675B2 (en) Recognition of handwritten text via neural networks
CN112613502A (en) Character recognition method and device, storage medium and computer equipment
Dash et al. A hybrid feature and discriminant classifier for high accuracy handwritten Odia numeral recognition
CN109784342A (en) A kind of OCR recognition methods and terminal based on deep learning model
CN112329779A (en) Method and related device for improving certificate identification accuracy based on mask
Cao et al. Preprocessing of low-quality handwritten documents using markov random fields
Mishra et al. Unsupervised refinement of color and stroke features for text binarization
Sampath et al. Handwritten optical character recognition by hybrid neural network training algorithm
Wicht et al. Camera-based sudoku recognition with deep belief network
Verma et al. Script identification in natural scene images: a dataset and texture-feature based performance evaluation
CN110288052A (en) Character identifying method, device, equipment and computer-readable medium
Aravinda et al. Template matching method for Kannada handwritten recognition based on correlation analysis
Choudhary et al. A neural approach to cursive handwritten character recognition using features extracted from binarization technique
Narasimhaiah et al. Degraded character recognition from old Kannada documents.
Sharma et al. Pattern storage & recalling using Hopfield neural network and HOG feature based SVM classifier: An experiment with handwritten Odia numerals
Aggarwal et al. Content directed enhancement of degraded document images
Jacob Optical Character Recognition system with Projection Profile based segmentation and Deep Learning Techniques
Shine et al. An approach for improving Optical Character Recognition using Contrast enhancement technique
Mapari et al. A Study Of Devnagri Handwritten Character Recognition System
Kaur et al. Computational techniques to recognize Indian stone inscriptions and manuscripts: A review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 411, 4th floor, building 4, No.44, Middle North Third Ring Road, Haidian District, Beijing 100088

Applicant after: Beijing Qingshu Intelligent Technology Co.,Ltd.

Address before: 100044 1415, 14th floor, Building 1, Yard 59, Gaoliangqiaoxie Street, Haidian District, Beijing

Applicant before: BEIJING AISHU WISDOM TECHNOLOGY CO.,LTD.