CN111091128A - Character and picture classification method and device and electronic equipment


Info

Publication number
CN111091128A
Authority
CN
China
Prior art keywords
picture
processed
character
classification
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911314877.7A
Other languages
Chinese (zh)
Other versions
CN111091128B (en)
Inventor
薛亮
杨陆
张超
王晓宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Data Driven Technology Co ltd
Original Assignee
Beijing Data Driven Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Data Driven Technology Co ltd
Priority to CN201911314877.7A
Publication of CN111091128A
Application granted
Publication of CN111091128B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention provides a character and picture classification method, a device and electronic equipment, wherein the method comprises the following steps: extracting the characteristics of the acquired character picture to be processed to obtain characteristic data; matching the feature data with the picture features in a preset sample library; and if the picture characteristics matched with the characteristic data do not exist in the preset sample library, determining the classification of the character picture to be processed according to the similarity between the characteristic data and the picture characteristics. According to the method, the characteristic data of the character picture to be processed is matched with the picture characteristics in the preset sample library, and under the condition that matching fails, similarity matching is performed based on the similarity between the characteristic data and the picture characteristics to determine the classification of the character picture to be processed.

Description

Character and picture classification method and device and electronic equipment
Technical Field
The invention relates to the technical field of image recognition, in particular to a character and picture classification method and device and electronic equipment.
Background
In the related art, a character picture is typically classified by matching its feature data against the feature data of a preset character picture library; if completely matching feature data exists, the character corresponding to that feature data in the library is taken as the classification of the character picture. When no completely matching feature data exists, however, the character picture is difficult to classify, so the applicability of this approach is poor.
Disclosure of Invention
The invention aims to provide a character and picture classification method, a character and picture classification device and electronic equipment, so as to improve the applicability of character classification.
In a first aspect, an embodiment of the present invention provides a character and picture classification method, where the method includes: extracting the characteristics of the acquired character picture to be processed to obtain characteristic data; matching the feature data with the picture features in a preset sample library; and if the picture characteristics matched with the characteristic data do not exist in the preset sample library, determining the classification of the character picture to be processed according to the similarity between the characteristic data and the picture characteristics.
In an optional embodiment, before the step of performing feature extraction on the acquired character picture to be processed to obtain feature data, the method further includes: normalizing the acquired character picture to be processed to obtain a character picture to be processed with a preset number of pixels.
In an optional embodiment, the step of extracting the features of the acquired character picture to be processed to obtain feature data includes: carrying out binarization processing on the character picture to be processed; determining the characteristic value of each pixel point according to the pixel value of the character picture to be processed after binarization processing; and splicing the characteristic values of all the pixel points line by line to obtain the characteristic data of the character picture to be processed.
In an optional embodiment, the preset sample library includes a plurality of picture classifications, each picture classification includes a plurality of sample pictures, each sample picture corresponds to a picture feature, and the picture feature includes a feature value for each pixel point in the sample picture; the step of determining the classification of the character picture to be processed according to the similarity between the feature data and the picture features includes: determining a similarity index corresponding to the preset sample library according to the picture features in each picture classification; calculating, according to the similarity index, the similarity between the character picture to be processed and each picture classification; and taking the picture classification with high similarity as the classification of the character picture to be processed.
In an optional embodiment, the feature value corresponding to each pixel point in the sample picture is a first numerical value or a second numerical value; the step of determining the similarity index corresponding to the preset sample library according to the picture features in each picture classification includes: for each picture classification in a preset sample library, executing the following steps: aiming at each pixel point, calculating the number of sample pictures corresponding to the current picture classification with the characteristic value as a first numerical value and the number of sample pictures with the characteristic value as a second numerical value; obtaining the probability that the characteristic value on each pixel point is the first numerical value and the probability that the characteristic value is the second numerical value according to the number of the first numerical value and the number of the second numerical value; and determining the probability that the characteristic value on each pixel point corresponding to each picture classification is a first numerical value and the probability that the characteristic value is a second numerical value as a similarity index.
In an optional embodiment, the step of calculating the similarity between the character picture to be processed and the picture classification according to the similarity index includes: obtaining the score of the character picture to be processed on each picture classification according to the probability that the feature value at each pixel point of that picture classification is the first numerical value and the probability that it is the second numerical value; the step of taking the picture classification with high similarity as the classification of the character picture to be processed includes: determining the picture classification with the highest score as the classification of the character picture to be processed.
In an optional embodiment, the step of obtaining the score of the character picture to be processed in each picture classification according to the probability that the feature value is the first numerical value and the probability that the feature value is the second numerical value on each pixel point corresponding to each picture classification includes: for each picture category, the following steps are performed: determining the probability corresponding to the characteristic value of each pixel point in the character picture to be processed according to the probability that the characteristic value on each pixel point corresponding to the current picture classification is a first numerical value and the probability that the characteristic value is a second numerical value; and adding the probabilities corresponding to each pixel point to obtain the score of the character picture to be processed on the current picture classification.
In a second aspect, an embodiment of the present invention provides a character and picture classification apparatus, where the apparatus includes: the characteristic extraction module is used for extracting the characteristics of the acquired character picture to be processed to obtain characteristic data; the complete matching module is used for completely matching the feature data with the picture features in the preset sample library; and the similarity matching module is used for determining the classification of the character picture to be processed according to the similarity between the feature data and the picture features if the picture features completely matched with the feature data do not exist in the preset sample library.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory, where the memory stores machine executable instructions that can be executed by the processor, and the processor executes the machine executable instructions to implement the character picture classification method according to any one of the foregoing embodiments.
In a fourth aspect, embodiments of the present invention provide a machine-readable storage medium storing machine-executable instructions, which when invoked and executed by a processor, cause the processor to implement the character picture classification method according to any one of the preceding embodiments.
The embodiment of the invention has the following beneficial effects:
The character picture classification method, device and electronic equipment provided by the embodiments of the present invention first extract features from the acquired character picture to be processed to obtain feature data; then match the feature data with the picture features in a preset sample library; and, if no picture feature matching the feature data exists in the preset sample library, determine the classification of the character picture to be processed according to the similarity between the feature data and the picture features. In this method, the feature data of the character picture to be processed is matched with the picture features in the preset sample library, and when that matching fails, similarity matching based on the similarity between the feature data and the picture features is used to determine the classification of the character picture to be processed.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention as set forth above.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a character and picture classification method according to an embodiment of the present invention;
Fig. 2 is a flowchart of another character and picture classification method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a character and picture classification apparatus according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In view of the problem that the character and picture classification method in the prior art is poor in applicability, embodiments of the present invention provide a character and picture classification method, device and electronic device.
To facilitate understanding of this embodiment, a character picture classification method disclosed in an embodiment of the present invention is first described in detail. As shown in Fig. 1, the method includes the following steps:
Step S102, extracting features from the acquired character picture to be processed to obtain feature data.
The character picture to be processed may be a picture of a character area obtained from an image, and the image may be a shopping receipt or any other image containing characters. The character picture may contain one character, which may be a Chinese character, a letter, a number or another special symbol. In a specific implementation, the characters in the image need to be segmented to obtain a character picture for each character in the image.
In a specific implementation, feature extraction may be performed on the pixel values of the character picture to be processed, or on its texture, color and the like, to obtain the feature data; the feature data usually consists of a feature value for each pixel point in the character picture to be processed.
Step S104, matching the feature data with the picture features in a preset sample library.
The preset sample library usually stores a large number of sample pictures together with the picture feature of each sample picture, and each sample picture belongs to a picture classification. The sample pictures usually contain characters such as English letters, numbers, symbols and Chinese characters. The feature data of the character picture to be processed is matched against the picture features of the sample pictures; when a picture feature completely consistent with the feature data exists in the preset sample library, the picture classification of the sample picture corresponding to that picture feature is taken as the classification of the character picture to be processed.
Step S106, if no picture feature matching the feature data exists in the preset sample library, determining the classification of the character picture to be processed according to the similarity between the feature data and the picture features.
When no picture feature completely consistent with the feature data exists in the preset sample library, similarity matching is performed on the feature data: the similarity between the feature data and each picture feature is calculated, and the picture classification of the sample picture whose picture feature has the highest similarity is taken as the classification of the character picture to be processed. The similarity may be determined from the difference between the feature value of each pixel point in the feature data and the feature value of the corresponding pixel point in the picture feature, or from the number of pixel points at which the two feature values are the same.
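The second similarity measure mentioned above (counting the pixel points at which the feature values agree) can be sketched as follows. This is only an illustrative sketch: the function names, the representation of features as strings of 0s and 1s, and the tiny 4-value sample library are assumptions made for the example, not details taken from the patent.

```python
def count_matching(feature_data, picture_feature):
    """Number of positions at which the two feature strings hold the same value."""
    if len(feature_data) != len(picture_feature):
        raise ValueError("feature strings must have the same length")
    return sum(a == b for a, b in zip(feature_data, picture_feature))


def most_similar_class(feature_data, samples):
    """Return the classification of the sample whose feature agrees at the most positions.

    `samples` is a list of (picture_feature, classification) pairs (hypothetical layout).
    """
    _, best_class = max(samples, key=lambda s: count_matching(feature_data, s[0]))
    return best_class


# Hypothetical sample library with 4-value features.
samples = [("1100", "A"), ("0011", "B"), ("0110", "C")]
print(most_similar_class("1101", samples))  # A
```

Here the query agrees with the sample of class A at three positions and with the other samples at one position each, so class A is returned.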
The character picture classification method provided by the embodiment of the invention comprises the steps of firstly, carrying out feature extraction on an obtained character picture to be processed to obtain feature data; matching the feature data with the picture features in a preset sample library; and if the picture characteristics matched with the characteristic data do not exist in the preset sample library, determining the classification of the character picture to be processed according to the similarity between the characteristic data and the picture characteristics. In the method, the feature data of the character picture to be processed is completely matched with the picture features in the preset sample library, and under the condition that the complete matching fails, the similarity matching is performed based on the similarity between the feature data and the picture features to determine the classification of the character picture to be processed.
An embodiment of the present invention further provides another character picture classification method, implemented on the basis of the method of the foregoing embodiment. This method mainly describes the specific process of extracting features from the acquired character picture to be processed to obtain feature data (steps S204-S208 below), and the specific process of determining the classification of the character picture to be processed according to the similarity between the feature data and the picture features (steps S214-S218 below). As shown in Fig. 2, the method includes the following steps:
Step S202, acquiring a character picture to be processed.
In a specific implementation, once the character picture to be processed is acquired, it is first normalized to obtain a character picture to be processed with the preset number of pixels. The preset number of pixels is usually determined by the size of the sample pictures in the preset sample library; for example, if the sample pictures are 32 × 32 pixels, the preset number of pixels is 32 × 32, that is, the character picture to be processed is scaled to 32 × 32 pixels. In the following description, the character picture to be processed is assumed to have the same size as the sample pictures in the preset sample library.
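As a rough illustration of this normalization step, the sketch below scales a picture to a fixed size. The patent does not say which scaling method is used; nearest-neighbour sampling, the function name `normalize_size` and the nested-list picture representation are all assumptions made for this example.

```python
def normalize_size(pixels, size=32):
    """Scale a 2-D list of pixel values to size x size by nearest-neighbour sampling.

    Nearest-neighbour sampling is an assumed choice; the source does not name a method.
    """
    src_h, src_w = len(pixels), len(pixels[0])
    return [
        [pixels[r * src_h // size][c * src_w // size] for c in range(size)]
        for r in range(size)
    ]


# A 2 x 2 picture scaled up to 4 x 4: each source pixel becomes a 2 x 2 block.
picture = [[0, 255], [255, 0]]
scaled = normalize_size(picture, size=4)
for row in scaled:
    print(row)
```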
Step S204, binarizing the character picture to be processed.
Binarization yields a binary image, in which each pixel has only two possible values or gray-scale states; a binary image can be displayed as a black-and-white or monochrome image. In a specific implementation, the pixel value of the image area corresponding to the character in the character picture to be processed is set to 0, and the pixel value of the image area outside the character is set to 255, giving the binary image corresponding to the character picture to be processed; a pixel value of 0 corresponds to black and a pixel value of 255 corresponds to white.
Step S206, determining the feature value of each pixel point according to the pixel values of the binarized character picture to be processed.
Pixel points with pixel value 0 and pixel points with pixel value 255 in the binarized character picture to be processed are assigned different feature values; the feature values may be numerical values, different patterns, or the like. For example, pixel points with pixel value 0 may be set to 1 and pixel points with pixel value 255 may be set to 0, that is, the black area of the binarized character picture to be processed is set to 1 and the white area is set to 0.
Step S208, splicing the feature values of all pixel points row by row to obtain the feature data of the character picture to be processed.
After the feature value of each pixel point of the character picture to be processed is obtained, the feature values of all pixel points are extracted and spliced row by row according to the position of each pixel point in the character picture to be processed, giving a feature matrix or a feature sequence (which may also be a character string); this feature matrix or feature sequence is the feature data of the character picture to be processed.
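Steps S204 to S208 can be sketched in a few lines. The sketch assumes a grey-scale input with dark characters on a light background and a fixed threshold of 128; neither the threshold nor the function names appear in the source, which only states that character pixels become 0 and all other pixels become 255.

```python
THRESHOLD = 128  # assumed cut-off; the source does not specify how pixels are separated


def binarize(gray):
    """Map each grey level to 0 (black, character) or 255 (white, background)."""
    return [[0 if p < THRESHOLD else 255 for p in row] for row in gray]


def extract_feature_data(binary):
    """Black pixels (0) become '1', white pixels (255) become '0'; rows are spliced in order."""
    return "".join("1" if p == 0 else "0" for row in binary for p in row)


# A hypothetical 2 x 3 grey-scale picture.
gray = [[10, 200, 30], [250, 40, 220]]
feature_data = extract_feature_data(binarize(gray))
print(feature_data)  # 101010
```

For a 32 × 32 picture the same functions would produce the 1024-value character string used in the later steps.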
Step S210, judging whether a picture feature matching the feature data exists in the preset sample library; if yes, going to step S212; if not, going to step S214. The preset sample library includes a plurality of picture classifications, each picture classification includes a plurality of sample pictures, each sample picture corresponds to a picture feature, and the picture feature includes a feature value for each pixel point in the sample picture.
Each sample picture in the preset sample library has a unique picture classification and one picture feature. The picture feature generally has the same structure as the feature data and is determined in the same way, so the picture feature and the feature data correspond to each other. In some embodiments, a hash operation may be applied to both the picture features and the feature data during complete matching, to reduce the data dimensionality and save memory; for example, hashing 1024-bit feature data may yield 32-bit feature data.
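The complete-matching step with hashing might look like the sketch below. The patent does not name a hash function; CRC-32 (via Python's standard `zlib.crc32`) is used here only because it yields a 32-bit value, and the function names and index layout are hypothetical. Note that this sketch keeps the full feature string next to each hash so that collisions can be ruled out, which gives up part of the memory saving described above.

```python
import zlib


def feature_hash(feature):
    """32-bit digest of a feature string (CRC-32 is an assumed choice of hash)."""
    return zlib.crc32(feature.encode("ascii"))


def build_exact_index(samples):
    """Map each hash to the (picture_feature, classification) pairs that produce it."""
    index = {}
    for feature, label in samples:
        index.setdefault(feature_hash(feature), []).append((feature, label))
    return index


def exact_match(feature_data, index):
    """Return the classification of an identical sample feature, or None if there is none."""
    for feature, label in index.get(feature_hash(feature_data), []):
        if feature == feature_data:  # guard against hash collisions
            return label
    return None


exact_index = build_exact_index([("1100", "A"), ("0011", "B")])
print(exact_match("0011", exact_index))  # B
print(exact_match("1111", exact_index))  # None
```

A `None` result corresponds to the "no" branch of step S210, after which similarity matching takes over.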
Step S212, determining the picture classification corresponding to the matching picture feature as the classification of the character picture to be processed; the process then ends.
Step S214, determining, according to the picture features in each picture classification, the similarity index corresponding to the preset sample library.
The feature value corresponding to each pixel point in a sample picture is a first numerical value or a second numerical value; the two values may be set as the developer requires, for example, the first numerical value may be set to 1 and the second numerical value to 0. In a specific implementation, step S214 can be implemented by the following steps 10-11:
Step 10, for each picture classification in the preset sample library, performing the following: for each pixel point, counting the number of sample pictures in the current picture classification whose feature value is the first numerical value and the number whose feature value is the second numerical value; and obtaining, from these two counts, the probability that the feature value at each pixel point is the first numerical value and the probability that it is the second numerical value.
Step 11, determining, as the similarity index, the probability that the feature value at each pixel point of each picture classification is the first numerical value and the probability that it is the second numerical value.
In a specific implementation, assume that picture classification 1 in the preset sample library contains 10 sample pictures, each with 1024 pixel points, so that each picture feature consists of 1024 feature values, with the first numerical value set to 1 and the second numerical value set to 0. The picture features of the 10 sample pictures are shown schematically below; each row is the picture feature of one sample picture. Each row would contain 1024 values, but for reasons of space only the first ten are shown:
1100000000...
0001000000...
1111000000...
0110000000...
1000000000...
1100000000...
1100000000...
1100000000...
1100000000...
0000000000...
For the picture features of the sample pictures, the number of feature values 1 and the number of feature values 0 at each pixel point can be counted. For the 10 sample pictures above, at the first pixel point 7 feature values are 1 and 3 are 0; at the second pixel point 7 feature values are 1 and 3 are 0; at the third pixel point 2 feature values are 1 and 8 are 0. By analogy, the number of feature values 1 and the number of feature values 0 at each of the 1024 pixel points can be obtained, giving statistics in the form of Table 1.
TABLE 1
Feature value   1st bit   2nd bit   3rd bit   4th bit   5th bit   1024th bit
0               3         3         8         8         10        0
1               7         7         2         2         0         10
From the number of 0s and the number of 1s at each pixel point (that is, at each bit position), the probability that the feature value is 0 and the probability that the feature value is 1 are calculated; the results are shown in Table 2:
TABLE 2
Feature value   1st bit   2nd bit   3rd bit   4th bit   5th bit   1024th bit
0               3/10      3/10      8/10      8/10      10/10     0/10
1               7/10      7/10      2/10      2/10      0/10      10/10
The calculation result is determined as the similarity index corresponding to picture classification 1. A similarity index is established in the same way for every picture classification in the preset sample library, which gives the similarity index corresponding to the preset sample library.
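The per-position counting and probability calculation for one picture classification can be sketched as follows, using the ten truncated feature rows listed above for picture classification 1. The function name `build_class_index` and the use of `fractions.Fraction` (to keep values such as 7/10 exact) are assumptions made for this illustration.

```python
from fractions import Fraction


def build_class_index(picture_features):
    """For each bit position, return (P(value is '0'), P(value is '1')) over the samples."""
    n = len(picture_features)
    index = []
    for column in zip(*picture_features):  # one column per bit position
        ones = column.count("1")
        index.append((Fraction(n - ones, n), Fraction(ones, n)))
    return index


# First ten feature values of the 10 sample pictures of picture classification 1.
classification_1 = [
    "1100000000",
    "0001000000",
    "1111000000",
    "0110000000",
    "1000000000",
    "1100000000",
    "1100000000",
    "1100000000",
    "1100000000",
    "0000000000",
]
index_1 = build_class_index(classification_1)
print(index_1[0])  # (Fraction(3, 10), Fraction(7, 10))
```

The first five entries of `index_1` reproduce the first five columns of Table 2 (with fractions such as 8/10 shown in lowest terms).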
Step S216, according to the similarity index, calculating the similarity between the character picture to be processed and the picture classification.
The feature data of the character picture to be processed is evaluated against the similarity index file of each picture classification in the similarity index, giving the similarity between the character picture to be processed and each picture classification. In a specific implementation, step S216 may be implemented as follows:
obtaining the score of the character picture to be processed on each picture classification according to the probability that the feature value at each pixel point of that picture classification is the first numerical value and the probability that it is the second numerical value; the score represents the similarity between the character picture to be processed and the picture classification.
In a specific implementation, the score of the character picture to be processed on each picture classification is calculated through the following steps 20-21:
Step 20, for each picture classification, performing the following: determining the probability corresponding to the feature value of each pixel point in the character picture to be processed, according to the probability that the feature value at that pixel point of the current picture classification is the first numerical value and the probability that it is the second numerical value.
Step 21, adding the probabilities corresponding to all pixel points to obtain the score of the character picture to be processed on the current picture classification.
For example, suppose the feature data of the character picture to be processed contains 1024 feature values, each being the first numerical value (e.g., 1) or the second numerical value (e.g., 0), that is, the feature data is a 1024-bit character string of 0s and 1s, and suppose the feature data is 1010111101…. To calculate the score of the feature data on picture classification 1, the feature value 1 at the first pixel point of the feature data is used to look up the probability that the first pixel point in picture classification 1 has feature value 1, which is 0.7; then the feature value 0 at the second pixel point is used to look up the probability that the second pixel point in picture classification 1 has feature value 0, which is 0.3. By analogy, the probability for every pixel point of the feature data in picture classification 1 is obtained, and adding these probabilities, 0.7 + 0.3 + 0.2 + 0.8 + …, gives the score of the character picture to be processed on picture classification 1.
In the same way as the score on picture classification 1 is calculated, the score of the character picture to be processed on each picture classification in the preset sample library can be obtained.
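The scoring in steps 20-21 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names `score` and `classify`, the representation of the feature data as a string of '0'/'1' characters, and the choice to store only the probability of 1 for each pixel point (taking the probability of 0 as its complement, which holds when both probabilities are derived from counts over the same sample pictures) are assumptions made for this example.

```python
from typing import Dict, List, Tuple


def score(feature_bits: str, p_one: List[float]) -> float:
    """Sum, over all pixel points, the class probability of the observed bit.

    feature_bits: feature data of the picture to classify, e.g. "1010...".
    p_one: for one picture classification, P(feature value == 1) per pixel.
    """
    if len(feature_bits) != len(p_one):
        raise ValueError("feature data and index must have the same length")
    total = 0.0
    for bit, p1 in zip(feature_bits, p_one):
        # Assumption: P(0) = 1 - P(1) because the feature value is binary.
        total += p1 if bit == "1" else 1.0 - p1
    return total


def classify(feature_bits: str,
             index: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Return the classification with the highest score, plus all scores."""
    scores = {name: score(feature_bits, p_one) for name, p_one in index.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

For a class whose first four pixel points have probabilities of 1 equal to 0.7, 0.7, 0.2 and 0.2, the bits 1010 contribute 0.7+0.3+0.2+0.8, matching the terms in the example above.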
In step S218, the picture classification with high similarity is taken as the classification of the character picture to be processed. In a specific implementation, the picture classification with the highest score may be determined as the classification of the character picture to be processed.
In some embodiments, when the first numerical value corresponding to the feature value is set to 1 and the second numerical value is set to 0, the feature values of every four pixel points in the picture feature may be combined into one binary number, giving a plurality of 4-bit binary numbers corresponding to the picture feature, so that the number of calculations needed when computing picture similarity is reduced (i.e., the calculation is 4 times as fast as the original calculation). For each picture classification in the preset sample library, the following steps are performed: acquiring the probability that the feature value of each pixel point corresponding to the current picture classification is 1 and the probability that it is 0; for the feature values of the 4 pixel points corresponding to each binary number, determining the probability corresponding to each value that the binary number can take, that is, adding the probabilities corresponding to the feature values of the 4 pixel points to obtain the probability for the binary number representing those feature values; then, determining the probability of each binary number taking each value, for each picture classification, as the similarity index.
For example, each binary number is composed of the two digits 0 and 1, and according to the binary rule every four pixel points in the picture feature are combined into one binary number, that is, 0000 is 0, 0001 is 1, 0010 is 2, and so on up to 1111, which is 15. For the picture features corresponding to the 10 sample pictures in picture classification 1 in the above embodiment, the first four pixel points may be combined into a first binary number, the 5th to 8th pixel points into a second binary number, and so on. For the first binary number, when its value is 0000, the corresponding probability is the sum of the probabilities that the feature value of each of the first four pixel points is 0; when its value is 0010, the corresponding probability is the sum of the probabilities that the first, second and fourth pixel points are 0 and that the third pixel point is 1. In this way the probability value corresponding to each value of each binary number is counted, and the similarity index file corresponding to picture classification 1 is obtained from these probabilities. Establishing the similarity index for all picture classifications in the preset sample library in the same manner gives the similarity index corresponding to the preset sample library.
When the similarity index is built from binary numbers, the feature data of the character picture to be processed also needs to be converted into binary numbers so that the similarity between the feature data and each picture classification in the similarity index can be calculated. When the similarity is calculated, the probabilities corresponding to each binary number of the character picture to be processed are added to obtain the score of the character picture to be processed on the current picture classification, and the picture classification with the highest score is taken as the classification of the character picture to be processed, which reduces the amount of calculation in the similarity computation.
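The four-pixel grouping can be sketched as a precomputed lookup table. The code below is an illustrative Python sketch under stated assumptions: the names `build_group_tables`, `to_groups` and `score_groups` are hypothetical, the index stores the probability of 1 for each pixel point with the probability of 0 taken as its complement, and the first pixel point of each group is treated as the most significant bit so that the bits 0010 map to the value 2.

```python
from typing import List


def build_group_tables(p_one: List[float]) -> List[List[float]]:
    """For every group of 4 pixel points, precompute the summed probability
    for each of the 16 values (0000..1111) the group can take."""
    if len(p_one) % 4 != 0:
        raise ValueError("number of pixel points must be a multiple of 4")
    tables = []
    for start in range(0, len(p_one), 4):
        group = p_one[start:start + 4]
        table = []
        for value in range(16):
            total = 0.0
            for k, p1 in enumerate(group):
                # Assumption: first pixel of the group is the highest bit.
                bit = (value >> (3 - k)) & 1
                total += p1 if bit else 1.0 - p1
            table.append(total)
        tables.append(table)
    return tables


def to_groups(feature_bits: str) -> List[int]:
    """Convert a '0'/'1' string into a list of 4-bit integers."""
    if len(feature_bits) % 4 != 0:
        raise ValueError("feature data length must be a multiple of 4")
    return [int(feature_bits[i:i + 4], 2)
            for i in range(0, len(feature_bits), 4)]


def score_groups(feature_bits: str, tables: List[List[float]]) -> float:
    """Score with one table lookup per group instead of one per pixel."""
    return sum(table[value]
               for table, value in zip(tables, to_groups(feature_bits)))
```

With 1024 feature values, scoring against one picture classification then takes 256 lookups and additions rather than 1024, at the cost of storing 16 values per group in the index.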
The character picture classification method first matches the feature data of the character picture to be processed against the picture features in the preset sample library; if the matching fails, it determines the similarity index corresponding to the preset sample library according to the picture features in each picture classification, calculates the similarity between the character picture to be processed and each picture classification according to the similarity index, and takes the picture classification with high similarity as the classification of the character picture to be processed. The method avoids the situation in which the classification of a picture cannot be determined when only complete matching is used, can determine the classification of every character picture, and offers strong applicability, simple operation and high classification efficiency.
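The two-stage flow summarized above (complete matching first, similarity matching only on failure) can be sketched as follows. This is a simplified Python illustration, not the patented implementation: the function name `classify_picture`, the dictionary layout of the sample library, and the returned "exact"/"similarity" label are assumptions for this example, and the per-pixel probabilities are recomputed on each call rather than read from a prebuilt similarity index.

```python
from typing import Dict, List, Tuple


def classify_picture(feature_bits: str,
                     library: Dict[str, List[str]]) -> Tuple[str, str]:
    """Classify feature data against a sample library.

    library maps each picture classification to the picture features
    ('0'/'1' strings of equal length) of its sample pictures.
    Returns (classification, how), where how is "exact" or "similarity".
    """
    # Stage 1: complete matching against every stored picture feature.
    for name, samples in library.items():
        if feature_bits in samples:
            return name, "exact"

    # Stage 2: similarity matching using per-pixel probabilities.
    best_name, best_score = "", float("-inf")
    for name, samples in library.items():
        n = len(samples)
        total = 0.0
        for i, bit in enumerate(feature_bits):
            ones = sum(1 for sample in samples if sample[i] == "1")
            # Probability that this class shows the observed bit here.
            total += ones / n if bit == "1" else (n - ones) / n
        if total > best_score:
            best_name, best_score = name, total
    return best_name, "similarity"
```

In practice the probabilities would be computed once per picture classification and stored as the similarity index, so that only the lookup and summation remain at classification time.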
Corresponding to the above embodiment of the character picture classification method, an embodiment of the present invention further provides a character picture classification apparatus; as shown in fig. 3, the apparatus includes:
a feature extraction module 30, configured to perform feature extraction on the acquired character picture to be processed to obtain feature data;
a complete matching module 31, configured to match the feature data with the picture features in the preset sample library; and
a similarity matching module 32, configured to determine, if no picture feature matching the feature data exists in the preset sample library, the classification of the character picture to be processed according to the similarity between the feature data and the picture features.
The character picture classification apparatus first performs feature extraction on the acquired character picture to be processed to obtain feature data; matches the feature data with the picture features in the preset sample library; and, if no picture feature matching the feature data exists in the preset sample library, determines the classification of the character picture to be processed according to the similarity between the feature data and the picture features. In this manner, the feature data of the character picture to be processed is first matched against the picture features in the preset sample library, and if the matching fails, similarity matching based on the similarity between the feature data and the picture features is used to determine the classification of the character picture to be processed.
Further, the apparatus further comprises a normalization module configured to perform normalization processing on the acquired character picture to be processed to obtain a character picture to be processed with a preset number of pixels.
Further, the feature extraction module 30 is configured to: perform binarization processing on the character picture to be processed; determine the feature value of each pixel point according to the pixel value of the character picture to be processed after binarization processing; and splice the feature values of all pixel points row by row to obtain the feature data of the character picture to be processed.
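The feature extraction performed by module 30 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `extract_features`, the representation of the picture as rows of grayscale values from 0 to 255, the threshold of 128, and the convention that dark pixels map to 1 are all choices made for this example and are not specified in the text.

```python
from typing import List


def extract_features(gray: List[List[int]], threshold: int = 128) -> str:
    """Binarize a grayscale picture and splice the feature values row by row.

    gray: picture as a list of rows, each row a list of values in 0..255.
    Assumption: pixels darker than the threshold get feature value 1,
    all others get feature value 0.
    """
    return "".join(
        "1" if pixel < threshold else "0"
        for row in gray          # rows are visited top to bottom
        for pixel in row         # pixels in a row, left to right
    )
```

For a picture normalized to 32 by 32 pixels, the resulting string has 1024 characters, which is consistent with the 1024 feature values used in the scoring example.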
Specifically, the preset sample library includes a plurality of picture classifications, each picture classification includes a plurality of sample pictures, each sample picture corresponds to a picture feature, and the picture feature includes a feature value corresponding to each pixel point in the sample picture. The similarity matching module 32 includes: an index determining unit, configured to determine the similarity index corresponding to the preset sample library according to the picture features in each picture classification; a similarity determining unit, configured to calculate the similarity between the character picture to be processed and each picture classification according to the similarity index; and a classification determining unit, configured to take the picture classification with high similarity as the classification of the character picture to be processed.
Further, the feature value corresponding to each pixel point in the sample picture is a first numerical value or a second numerical value. The index determining unit is configured to perform the following steps for each picture classification in the preset sample library: for each pixel point, counting, among the sample pictures of the current picture classification, the number whose feature value at that pixel point is the first numerical value and the number whose feature value is the second numerical value; obtaining, from these two numbers, the probability that the feature value at each pixel point is the first numerical value and the probability that it is the second numerical value; and determining these probabilities, for each pixel point of each picture classification, as the similarity index.
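The counting performed by the index determining unit can be sketched as below. This is a minimal Python illustration, assuming the first numerical value is 1 and the second is 0, that each picture feature is a '0'/'1' string, and that only the probability of 1 is stored (the probability of 0 being its complement); the function name `build_index` is hypothetical.

```python
from typing import List


def build_index(samples: List[str]) -> List[float]:
    """Per-pixel probability of feature value 1 for one picture classification.

    samples: picture features of the sample pictures in the classification,
    all of the same length.
    """
    if not samples:
        raise ValueError("a picture classification needs at least one sample")
    width = len(samples[0])
    ones = [0] * width
    for sample in samples:
        if len(sample) != width:
            raise ValueError("all picture features must have the same length")
        for i, bit in enumerate(sample):
            if bit == "1":
                ones[i] += 1  # count samples whose value here is 1
    # Probability = count of 1s divided by the number of sample pictures.
    return [count / len(samples) for count in ones]
```

For example, if 7 of the 10 sample pictures in a classification have the value 1 at the first pixel point, the stored probability of 1 at that pixel point is 0.7 and the probability of 0 is 0.3.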
Further, the similarity determining unit is configured to obtain the score of the character picture to be processed on each picture classification according to the probability that the feature value at each pixel point corresponding to each picture classification is the first numerical value and the probability that it is the second numerical value; the classification determining unit is configured to determine the picture classification with the highest score as the classification of the character picture to be processed.
Further, the similarity determining unit is further configured to perform the following steps for each picture classification: determining the probability corresponding to the feature value of each pixel point in the character picture to be processed according to the probability that the feature value at each pixel point corresponding to the current picture classification is the first numerical value and the probability that it is the second numerical value; and adding the probabilities corresponding to all pixel points to obtain the score of the character picture to be processed on the current picture classification.
The implementation principle and technical effects of the character picture classification apparatus provided by the embodiment of the present invention are the same as those of the foregoing method embodiment; for brevity, where the apparatus embodiment does not mention something, reference may be made to the corresponding content of the method embodiment.
An embodiment of the present invention further provides an electronic device, which is shown in fig. 4, and the electronic device includes a processor 101 and a memory 100, where the memory 100 stores machine executable instructions that can be executed by the processor 101, and the processor 101 executes the machine executable instructions to implement the character picture classification method.
Further, the electronic device shown in fig. 4 further includes a bus 102 and a communication interface 103, and the processor 101, the communication interface 103, and the memory 100 are connected through the bus 102.
The memory 100 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile memory (non-volatile memory), such as at least one disk memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 103 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. The bus 102 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
The processor 101 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 101. The processor 101 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or another storage medium well known in the art. The storage medium is located in the memory 100, and the processor 101 reads the information in the memory 100 and completes the steps of the method of the foregoing embodiment in combination with its hardware.
The embodiment of the present invention further provides a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are called and executed by a processor, the machine-executable instructions cause the processor to implement the character and picture classification method.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus and/or the electronic device described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A character picture classification method, characterized by comprising the following steps:
extracting the characteristics of the acquired character picture to be processed to obtain characteristic data;
matching the feature data with picture features in a preset sample library;
and if the picture characteristics matched with the characteristic data do not exist in the preset sample library, determining the classification of the character picture to be processed according to the similarity between the characteristic data and the picture characteristics.
2. The method according to claim 1, wherein before the step of extracting the features of the acquired character picture to be processed to obtain the feature data, the method further comprises:
and carrying out normalization processing on the acquired character picture to be processed to obtain the character picture to be processed with the preset pixel number.
3. The method according to claim 1, wherein the step of extracting the features of the acquired character picture to be processed to obtain feature data comprises:
carrying out binarization processing on the character picture to be processed;
determining the characteristic value of each pixel point according to the pixel value of the character picture to be processed after binarization processing;
and splicing the characteristic values of all the pixel points line by line to obtain the characteristic data of the character picture to be processed.
4. The method according to claim 1, wherein the preset sample library comprises a plurality of picture classifications, each picture classification comprises a plurality of sample pictures, each sample picture corresponds to the picture feature, and the picture feature comprises a feature value of each pixel point in the sample picture;
determining the classification of the character picture to be processed according to the similarity between the feature data and the picture features, wherein the classification comprises the following steps:
determining a similarity index corresponding to the preset sample library according to the picture characteristics in each picture classification;
according to the similarity index, calculating the similarity between the character picture to be processed and the picture classification;
and taking the picture classification with high similarity as the classification of the character picture to be processed.
5. The method according to claim 4, wherein the feature value corresponding to each pixel point in the sample picture is a first value or a second value;
determining a similarity index corresponding to the preset sample library according to the picture characteristics in each picture classification, wherein the step comprises the following steps of:
for each picture classification in the preset sample library, performing the following steps: for each pixel point, counting, among the sample pictures corresponding to the current picture classification, the number in which the feature value is the first numerical value and the number in which the feature value is the second numerical value; obtaining, according to the number of the first numerical value and the number of the second numerical value, the probability that the feature value is the first numerical value and the probability that the feature value is the second numerical value at each pixel point;
and determining the probability that the characteristic value is the first numerical value and the probability that the characteristic value is the second numerical value on each pixel point corresponding to each picture classification as the similarity index.
6. The method according to claim 5, wherein the step of calculating the similarity between the character picture to be processed and the picture classification according to the similarity index comprises:
obtaining the score of the character picture to be processed on each picture classification according to the probability that the characteristic value is the first numerical value and the probability that the characteristic value is the second numerical value on each pixel point corresponding to each picture classification;
the step of classifying the pictures with high similarity as the classification of the character pictures to be processed comprises the following steps:
and determining the picture classification with the highest score as the classification of the character picture to be processed.
7. The method according to claim 6, wherein the step of obtaining the score of the character picture to be processed on each picture classification according to the probability that the feature value is the first numerical value and the probability that the feature value is the second numerical value on each pixel point corresponding to each picture classification comprises:
for each of said picture classifications, performing the following steps: determining the probability corresponding to the characteristic value of each pixel point in the character picture to be processed according to the probability that the characteristic value is the first numerical value and the probability that the characteristic value is the second numerical value on each pixel point corresponding to the current picture classification;
and adding the probabilities corresponding to each pixel point to obtain the score of the character picture to be processed on the current picture classification.
8. A character picture classification apparatus, the apparatus comprising:
the characteristic extraction module is used for extracting the characteristics of the acquired character picture to be processed to obtain characteristic data;
the complete matching module is used for matching the feature data with the picture features in the preset sample library;
and the similarity matching module is used for determining the classification of the character picture to be processed according to the similarity between the feature data and the picture features if the picture features matched with the feature data do not exist in the preset sample library.
9. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to implement the character picture classification method of any one of claims 1 to 7.
10. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the character picture classification method of any one of claims 1 to 7.
CN201911314877.7A 2019-12-18 2019-12-18 Character picture classification method and device and electronic equipment Active CN111091128B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911314877.7A CN111091128B (en) 2019-12-18 2019-12-18 Character picture classification method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN111091128A true CN111091128A (en) 2020-05-01
CN111091128B CN111091128B (en) 2023-09-22

Family

ID=70396496

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911314877.7A Active CN111091128B (en) 2019-12-18 2019-12-18 Character picture classification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111091128B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102122352A (en) * 2011-03-01 2011-07-13 西安电子科技大学 Characteristic value distribution statistical property-based polarized SAR image classification method
CN102880874A (en) * 2012-09-29 2013-01-16 重庆新媒农信科技有限公司 Character recognition method and character recognizer
US20130266195A1 (en) * 2012-04-10 2013-10-10 Derek Shiell Hash-Based Face Recognition System
CN104376260A (en) * 2014-11-20 2015-02-25 东华大学 Malicious code visualized analyzing method based on Shannon information entropy
CN105631449A (en) * 2015-12-21 2016-06-01 华为技术有限公司 Method, device and equipment for segmenting picture
CN106599940A (en) * 2016-11-25 2017-04-26 东软集团股份有限公司 Picture character identification method and apparatus thereof
CN106874909A (en) * 2017-01-18 2017-06-20 深圳怡化电脑股份有限公司 A kind of recognition methods of image character and its device
CN107239784A (en) * 2017-07-03 2017-10-10 福建中金在线信息科技有限公司 A kind of image identification method, device, electronic equipment and readable storage medium storing program for executing
CN107633209A (en) * 2017-08-17 2018-01-26 平安科技(深圳)有限公司 Electronic installation, the method and storage medium of dynamic video recognition of face
CN108108760A (en) * 2017-12-19 2018-06-01 山东大学 A kind of fast human face recognition
CN108563952A (en) * 2018-04-24 2018-09-21 腾讯科技(深圳)有限公司 Method for detecting virus, device and the storage medium of file
CN109543770A (en) * 2018-11-30 2019-03-29 合肥泰禾光电科技股份有限公司 Dot character recognition methods and device
CN109766893A (en) * 2019-01-09 2019-05-17 北京数衍科技有限公司 Picture character recognition methods suitable for receipt of doing shopping
CN110516100A (en) * 2019-08-29 2019-11-29 武汉纺织大学 A kind of calculation method of image similarity, system, storage medium and electronic equipment
CN110532413A (en) * 2019-07-22 2019-12-03 平安科技(深圳)有限公司 Information retrieval method, device based on picture match, computer equipment


Also Published As

Publication number Publication date
CN111091128B (en) 2023-09-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant