CN115410211A - Image classification method and device, computer equipment and storage medium

Image classification method and device, computer equipment and storage medium

Info

Publication number
CN115410211A
CN115410211A
Authority
CN
China
Prior art keywords
image
classified
result
classification
character recognition
Prior art date
Legal status
Pending
Application number
CN202211049402.1A
Other languages
Chinese (zh)
Inventor
王慧
王巍
石明
李捷
厉超
张瑞雪
Current Assignee
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Pudong Development Bank Co Ltd
Priority to CN202211049402.1A
Publication of CN115410211A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • G06V30/19013 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Discrimination (AREA)

Abstract

The application relates to an image classification method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: acquiring an image to be classified, the image being obtained by scanning a target material; classifying the image to be classified through a trained image classification model to obtain a first classification result; inputting the image to be classified into a character recognition service and obtaining a second classification result by means of keyword matching; and comparing the first classification result with the second classification result, and, if the two are consistent, obtaining a classification result of the image to be classified based on them, wherein the classification result indicates the file type to which the target material belongs. By adopting the method, the classification accuracy for the image to be classified can be improved.

Description

Image classification method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an image classification method, apparatus, computer device, storage medium, and computer program product.
Background
In the financial industry, continually changing business management requirements call for digital transformation and optimization of the certificate auditing process in many respects. In actual business processing, certificate images uploaded by users frequently need to be classified, which places high demands on the accuracy of the image classification method. At present, certificates uploaded by users are usually classified manually. The certificates to be classified often come in multiple, similar formats or span multiple pages; on most certificates the characters are dense and stuck together; and some certificates contain handwriting and check boxes. For such complex scenes, manual classification cannot meet the requirements of rapidly developing and strongly supervised business, and the accuracy of the classification results is low.
Disclosure of Invention
Based on this, to address the problem of low accuracy in image classification results, it is necessary to provide an image classification method, apparatus, computer device, computer-readable storage medium, and computer program product capable of improving the accuracy of classification results.
In a first aspect, the present application provides an image classification method. The method comprises the following steps:
acquiring an image to be classified, wherein the image to be classified is an image obtained by scanning a target material;
classifying the images to be classified through the trained image classification model to obtain a first classification result of the images to be classified;
inputting the image to be classified into a character recognition service, and obtaining a second classification result of the image to be classified in a keyword matching mode;
and comparing the first classification result with the second classification result to obtain a comparison result, and if the comparison result is consistent, obtaining a classification result of the image to be classified based on the first classification result and the second classification result, wherein the classification result is used for indicating the file type to which the target material belongs.
In one embodiment, the training step of the image classification model includes:
acquiring image samples under multiple categories and category label information corresponding to each image sample, wherein the multiple categories are related to file types of materials related to a target scene;
determining the number of the image samples of each category, and determining whether the number distribution of the image samples of each category meets a uniform distribution condition;
if the number of image samples in any category does not reach a preset number, performing a sample expansion operation on the image samples in that category, so that the number distribution of the image samples of each category after sample expansion meets the uniform distribution condition;
and performing iterative training on the machine learning model to be trained based on the image samples of multiple types meeting the uniform distribution condition and the type label information of each image sample, and obtaining the trained image classification model when the training is completed.
In one embodiment, performing the sample expansion operation on the image samples in the categories whose number of image samples does not reach the preset number includes:
and searching the categories of which the number of the image samples does not reach the preset number, synthesizing the image samples under the searched categories with the preset background pictures to obtain a synthesized image, and taking the synthesized image as the expanded image samples under the searched categories.
In one embodiment, inputting an image to be classified into a character recognition service, and obtaining a second classification result of the image to be classified in a keyword matching manner, includes:
inputting the image to be classified into a character recognition service to obtain a character recognition result;
comparing the character recognition result with the keywords in each category respectively;
and if the character recognition result is successfully matched with the keyword in any category, taking the corresponding category as a second classification result.
In one embodiment, the image classification method further includes:
acquiring a character recognition result obtained by character recognition of an image to be classified by a character recognition service, wherein the character recognition result comprises characters and character position information;
acquiring a plurality of preset templates, and determining key position information corresponding to a key word in each preset template;
matching the characters at the corresponding key position information in the character recognition result with the keywords of the corresponding preset template to obtain the matching degree between the image to be classified and each preset template;
taking the preset template with the highest matching degree as a target template corresponding to the image to be classified;
and extracting target contents in the character recognition result based on the extraction rule matched with the target template, and combining the target contents into a structured result to be output.
In one embodiment, extracting the target content in the character recognition result based on the extraction rule matched with the target template and combining the target content into a structured result output includes:
extracting target content in the character recognition result based on an extraction rule matched with the target template, and sequencing a plurality of characters in the target content to obtain an initial text line;
if the character height of the initial text line is larger than a preset value, splitting the initial text line to obtain a plurality of split single-line text lines;
if the character height of the initial text line is less than or equal to a preset value, taking the initial text line as a single-line text line;
determining a keyword corresponding to each single line of text line, and associating the keyword with the corresponding single line of text line to combine into a structured result output.
In one embodiment, in the case that a check box is included in the target template, extracting the target content in the character recognition result based on the extraction rule matched with the target template includes:
acquiring characters at the specified positions from the target content based on the specified positions of the check boxes specified in the extraction rule;
and when the character at the specified position is a preset character, determining that the character at the corresponding specified position is not selected, otherwise, determining that the character at the corresponding specified position is selected as the target content.
In a second aspect, the application further provides an image classification device. The device comprises:
the image acquisition module is used for acquiring an image to be classified, wherein the image to be classified is an image obtained by scanning a target material;
the first classification module is used for classifying the images to be classified through the trained image classification model to obtain a first classification result of the images to be classified;
the second classification module is used for inputting the images to be classified into the character recognition service and obtaining a second classification result of the images to be classified in a keyword matching mode;
the comparison module is used for comparing the first classification result with the second classification result to obtain a comparison result;
and the result obtaining module is used for obtaining a classification result of the image to be classified based on the first classification result and the second classification result if the comparison results are consistent, and the classification result is used for indicating the file type of the target material.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the following steps when executing the computer program:
acquiring an image to be classified, wherein the image to be classified is an image obtained by scanning a target material;
classifying the images to be classified through the trained image classification model to obtain a first classification result of the images to be classified;
inputting the image to be classified into a character recognition service, and obtaining a second classification result of the image to be classified in a keyword matching mode;
and comparing the first classification result with the second classification result to obtain a comparison result, and if the comparison result is consistent, obtaining a classification result of the image to be classified based on the first classification result and the second classification result, wherein the classification result is used for indicating the file type of the target material.
In a fourth aspect, the present application further provides a computer-readable storage medium. The computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of:
acquiring an image to be classified, wherein the image to be classified is an image obtained by scanning a target material;
classifying the images to be classified through the trained image classification model to obtain a first classification result of the images to be classified;
inputting the image to be classified into a character recognition service, and obtaining a second classification result of the image to be classified in a keyword matching mode;
and comparing the first classification result with the second classification result to obtain a comparison result, and if the comparison result is consistent, obtaining a classification result of the image to be classified based on the first classification result and the second classification result, wherein the classification result is used for indicating the file type to which the target material belongs.
In a fifth aspect, the present application further provides a computer program product. The computer program product comprising a computer program which when executed by a processor performs the steps of:
acquiring an image to be classified, wherein the image to be classified is an image obtained by scanning a target material;
classifying the images to be classified through the trained image classification model to obtain a first classification result of the images to be classified;
inputting the image to be classified into a character recognition service, and obtaining a second classification result of the image to be classified in a keyword matching mode;
and comparing the first classification result with the second classification result to obtain a comparison result, and if the comparison result is consistent, obtaining a classification result of the image to be classified based on the first classification result and the second classification result, wherein the classification result is used for indicating the file type to which the target material belongs.
According to the image classification method, apparatus, computer device, storage medium and computer program product, the image to be classified is obtained by scanning the target material and is classified using the trained image classification model, which improves the accuracy of the classification result; the image to be classified is also input into a character recognition service, and a second classification result is obtained by keyword matching, so that the image is converted into character data and keyword matching is performed on that character data, further improving the accuracy of the classification result; finally, the first classification result is compared with the second classification result, and only when the two are consistent is the classification result of the image to be classified obtained from them, the classification result indicating the file type to which the target material belongs.
Drawings
FIG. 1 is a diagram of an application environment of an image classification method in one embodiment;
FIG. 2 is a flowchart illustrating an image classification method according to an embodiment;
FIG. 3 is a flowchart illustrating the training steps of the image classification model in one embodiment;
FIG. 4 is a flowchart illustrating an image classification method according to another embodiment;
FIG. 5 is a flowchart illustrating an image classification method according to another embodiment;
FIG. 6 is a flowchart illustrating an image classification method according to an exemplary embodiment;
FIG. 7 is a schematic diagram of an image to be processed corresponding to a type A target material in one embodiment;
FIG. 8 is a schematic diagram of an image to be processed corresponding to a target material with handwriting in one embodiment;
FIG. 9 is a schematic diagram of an image to be processed corresponding to a target material with check boxes in one embodiment;
FIG. 10 is a schematic diagram of an image to be processed corresponding to an H-type target material in one embodiment;
FIG. 11 is a flowchart illustrating an overview of an image classification method in one embodiment;
FIG. 12 is a block diagram showing the structure of an image classification device according to an embodiment;
FIG. 13 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The image classification method provided by the embodiment of the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on the cloud or another network server. The server obtains an image to be classified from the terminal, the image being obtained by scanning a target material; classifies it through the trained image classification model to obtain a first classification result; inputs it into a character recognition service and obtains a second classification result by keyword matching; and compares the first classification result with the second classification result, obtaining, if the two are consistent, a classification result of the image to be classified based on them, the classification result indicating the file type to which the target material belongs. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet of things device or portable wearable device; the internet of things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device and the like, and the portable wearable device may be a smart watch, smart bracelet, head-mounted device and the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In an embodiment, as shown in fig. 2, an image classification method is provided, which is described by taking an example that the method is applied to a computer device (the computer device may be specifically the terminal or the server in fig. 1), and includes the following steps:
s202, obtaining an image to be classified, wherein the image to be classified is an image obtained by scanning a target material.
The target material is a file material filled in or declared by a user; after approval, the related business process can continue on the basis of the target material. The target material may be a document composed of declaration information, such as account opening material, including but not limited to a business license, a change settlement application, an opening settlement application, basic deposit account information, an account opening permit, a legal representative's letter of authorization, a comprehensive contract application, or a signature card. The change settlement application, opening settlement application and comprehensive contract application have complex and similar formats, stuck-together and dense characters, and multiple pages, while documents such as the business license, account opening permit, signature card and basic deposit account information come in multiple formats for the same type of certificate.
Specifically, the computer device obtains an image to be classified uploaded by a user; the image to be classified may be any image that needs to be classified, for example an image obtained by scanning a target material. Scanning the target material means acquiring an image of the target material with a scanning device, which may be a terminal with a camera. The images to be classified may belong to multiple categories, each category corresponding to a type of target material.
And S204, classifying the images to be classified through the trained image classification model to obtain a first classification result of the images to be classified.
The trained image classification model is a machine learning model for classifying the image to be classified, obtained by training a model on labeled image samples. Common image classification models include LeNet (a convolutional neural network named after its author LeCun), AlexNet (a convolutional neural network named after its author Alex Krizhevsky), VGG (a convolutional neural network proposed by the Visual Geometry Group), GoogLeNet (a Google neural network model), and ResNet (deep residual network). The computer device classifies the image to be classified through the trained image classification model: it inputs the image to be classified into the trained model, and the model outputs the first classification result. For example, the model outputs a probability value for each category, and the category with the maximum probability value is taken as the first classification result of the image to be classified. Since the images to be classified include images of several different categories, the first classification results obtained from the trained image classification model cover those categories.
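As an illustration of this step, the following minimal sketch (in Python, assuming a PyTorch ResNet-style classifier and hypothetical category names, none of which are fixed by this application) takes the category with the maximum probability value as the first classification result:

```python
import torch
from torchvision import transforms
from PIL import Image

# Hypothetical category names; the real categories depend on the target scene.
CATEGORIES = ["business_license", "change_settlement_application",
              "opening_settlement_application", "signature_card"]

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def first_classification(model: torch.nn.Module, image_path: str) -> str:
    """Return the category with the maximum probability value."""
    image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(image), dim=1)[0]  # one probability per category
    return CATEGORIES[int(probs.argmax())]
```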
And S206, inputting the image to be classified into a character recognition service, and obtaining a second classification result of the image to be classified in a keyword matching mode.
The character recognition service is a machine learning model constructed by adopting a character recognition algorithm. The keyword refers to a keyword preset by the computer device to represent the category of the image to be processed, for example, when the category of the target material is a business license, the corresponding keyword is a "business license", and when the category of the target material is an account opening license, the corresponding keyword is an "account opening license". The keyword matching mode refers to that the computer equipment inputs the image to be classified into a character recognition service to obtain a character recognition result which is matched with a preset keyword, so that a second classification result of the image to be classified is obtained. The second classification result of the image to be classified refers to another classification result of the image to be classified.
Specifically, the computer equipment inputs the image to be classified to the character recognition service, and the character recognition service recognizes characters in the image to be classified to obtain a character recognition result. The character recognition result typically includes character content in the image to be processed. Common character recognition algorithms include template matching character recognition algorithms, neural network character recognition algorithms, and support vector machine character recognition algorithms.
It should be noted that the images to be classified include images of several different categories, so the second classification results obtained through the character recognition service and keyword matching also cover several categories; and because the second classification result is obtained by a different method than the first classification result, the two results for a given image may be the same or different.
And S208, comparing the first classification result with the second classification result to obtain a comparison result, and if the comparison result is consistent, obtaining a classification result of the image to be classified based on the first classification result and the second classification result, wherein the classification result is used for indicating the file type to which the target material belongs.
The computer device compares the first classification result of the image to be classified with the second classification result to obtain a comparison result: either the two results are consistent, or they are inconsistent. Although both the first classification result, obtained by classifying the image with the image classification model, and the second classification result, obtained through the character recognition service and keyword matching, have high accuracy, a small number of classification errors may still occur. In the application, the two results are compared, and only when they are consistent does the computer device derive the classification result of the image to be classified from them, the classification result indicating the file type to which the target material belongs; in this case the classification result can be output directly. If the two results are inconsistent, at least one of them contains a classification error, and the image must be processed again. Classifying with two different methods and outputting a result only when the first and second classification results agree avoids classification errors and therefore achieves high accuracy.
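A minimal sketch of this comparison step (Python; the re-processing path taken on inconsistency is left abstract, as the application does not fix it):

```python
from typing import Optional

def final_classification(first_result: str, second_result: str) -> Optional[str]:
    """Output a classification result only when the first and second
    classification results are consistent; otherwise signal re-processing."""
    if first_result == second_result:
        return first_result  # indicates the file type of the target material
    return None  # inconsistent: at least one result is wrong, process again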
In the image classification method, the image to be classified is obtained by scanning the target material and is classified with the trained image classification model, which improves the accuracy of the classification result; the image is also input into a character recognition service and a second classification result is obtained by keyword matching, converting the image into character data on which keyword matching is performed, which further improves the accuracy of the classification result; finally, the first and second classification results are compared, and only when they are consistent is the classification result of the image to be classified obtained from them, the classification result indicating the file type of the target material.
In one embodiment, as shown in fig. 3, the training step of the image classification model includes:
s302, image samples under multiple categories and category label information corresponding to each image sample are obtained, wherein the multiple categories are related to file types of materials related to the target scene.
The category label information is the category information that model training is to predict; specifically, it may be the category information of each image sample. The computer device obtains image samples in multiple categories and the category label information corresponding to each image sample, where the categories are related to the file types of the materials involved in the target scene, including but not limited to business licenses, change settlement applications, opening settlement applications, basic deposit account information, account opening permits, legal representatives' letters of authorization, comprehensive contract applications, or signature cards, and correspond one-to-one with those file types.
S304, determining the number of the image samples of each category, and determining whether the number distribution of the image samples of each category meets a uniform distribution condition.
The computer device determines the number of image samples for each category and determines whether the number distribution of the image samples for each category satisfies a uniform distribution condition. Specifically, if the difference between the number of image samples in all categories and the preset number is within the preset threshold, the uniform distribution condition is considered to be satisfied, and if the difference between the number of image samples in any category and the preset number is not within the preset threshold, the uniform distribution condition is considered to be not satisfied.
And S306, if the number of the image samples does not meet the preset number, performing sample expansion operation on the image samples of the categories, wherein the number of the image samples of the categories does not meet the preset number, so that the number distribution of the image samples of each category after sample expansion meets the uniform distribution condition.
Sample expansion refers to obtaining more image samples by image synthesis; the number of image samples increases after the sample expansion operation. When the uniform distribution condition is not met, the computer device performs the sample expansion operation on the image samples of the categories whose counts do not reach the preset number, so that the number distribution of the image samples of each category after sample expansion meets the uniform distribution condition.
For example, suppose the numbers of image samples of three categories are 997, 998 and 950, the preset number is 1000 and the preset threshold is 3. The difference between 950 and the preset number is not within the preset threshold, so that category's image samples require a sample expansion operation; after expansion, its sample count can be any value between 997 and 1000, and the number distribution of the image samples of the three categories then meets the uniform distribution condition.
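The uniform distribution check described above can be sketched as follows (Python; the preset number 1000 and threshold 3 are taken from the example, not from a fixed specification):

```python
def satisfies_uniform_distribution(counts: dict, preset_number: int = 1000,
                                   preset_threshold: int = 3) -> bool:
    """Uniform if every category's count is within the threshold of the preset number."""
    return all(abs(n - preset_number) <= preset_threshold for n in counts.values())

def categories_to_expand(counts: dict, preset_number: int = 1000,
                         preset_threshold: int = 3) -> list:
    """Categories whose image sample counts call for a sample expansion operation."""
    return [c for c, n in counts.items() if abs(n - preset_number) > preset_threshold]

counts = {"A": 997, "B": 998, "C": 950}
print(satisfies_uniform_distribution(counts))  # False: category C is 50 short
print(categories_to_expand(counts))            # ['C']
```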
And S308, performing iterative training on the machine learning model to be trained based on the image samples of multiple types meeting the uniform distribution condition and the type label information of the image samples, and obtaining the trained image classification model when the training is completed.
The computer device iteratively trains the machine learning model to be trained based on the image samples of the multiple categories meeting the uniform distribution condition and the category label information of each image sample, obtaining the trained image classification model when training is completed. Specifically, the computer device inputs those image samples and their category label information into the machine learning model for iterative training, and finishes training when a stop condition is reached. The stop condition may be that the number of training iterations reaches a preset number, that the training duration reaches a preset duration, or that the precision, recall and accuracy of the trained image classification model reach preset values.
In this embodiment, by acquiring image samples of multiple categories and the category label information corresponding to each image sample and expanding the samples that do not meet the uniform distribution condition, the number distribution of the image samples of each category is made to satisfy that condition; with uniformly distributed image samples, the number of image samples of each category is kept nearly consistent, avoiding training bias caused by having too few samples of some category. Iteratively training the machine learning model on the uniformly distributed image samples of multiple categories and the category label information of each image sample improves the classification accuracy of the trained image classification model.
In one embodiment, the sample expansion operation is performed on the image samples under the category of which the number of the image samples does not reach the preset number, and the sample expansion operation includes: and searching categories of which the number of the image samples does not reach the preset number, synthesizing the image samples under the searched categories with the preset background pictures to obtain a synthesized image, and taking the synthesized image as the expanded image samples under the searched categories.
The computer device searches for the categories whose number of image samples does not reach the preset number, synthesizes the image samples in those categories with preset background pictures to obtain synthesized images, and takes the synthesized images as the expanded image samples of those categories. Specifically, in some embodiments, the synthesis may proceed by obtaining a preset background picture set containing at least one picture and compositing each found image sample with the pictures in that set to obtain several synthesized images; the background pictures may be photographs of a desktop, a wall surface, the ground or other scenes, and the synthesis may use layer overlapping or image fusion.
In other embodiments, the preset background picture may include at least one picture, and the method for synthesizing the image may be to rotate or translate the searched image sample in the category, and then perform image synthesis on the image sample subjected to the rotation or translation processing and each picture in the preset background picture, so as to obtain a plurality of synthesized images. And taking the plurality of synthetic images as the searched extended image samples under the category until the number of the searched image samples under the category reaches the preset number.
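One plausible implementation of the synthesis step (Python with Pillow; the rotation range and placement policy are illustrative assumptions):

```python
import random
from PIL import Image

def expand_sample(sample_path: str, background_path: str,
                  max_rotation_deg: float = 5.0) -> Image.Image:
    """Synthesize one expanded image sample by rotating the scanned sample
    slightly and overlaying it on a preset background picture."""
    sample = Image.open(sample_path).convert("RGBA")
    background = Image.open(background_path).convert("RGBA")

    # Slight random rotation, one of the perturbations mentioned above.
    sample = sample.rotate(random.uniform(-max_rotation_deg, max_rotation_deg),
                           expand=True)

    # Paste the sample at a random offset (layer-overlapping style synthesis).
    sample.thumbnail(background.size)  # ensure the sample fits the background
    x = random.randint(0, background.width - sample.width)
    y = random.randint(0, background.height - sample.height)
    background.paste(sample, (x, y), mask=sample)
    return background.convert("RGB")
```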
In this embodiment, the categories of which the number of the image samples does not reach the preset number are searched, the image samples under the categories of which the number of the image samples does not reach the preset number are synthesized with the preset background picture to obtain a synthesized image, the synthesized image is used as the searched extended image samples under the categories, the image samples can be extended, the number of the image samples is increased, it is ensured that the number distribution of the image samples of each category meets the uniform distribution condition, the number of the image samples of each category is close to the same based on the uniformly distributed image samples, the deviation of the training result caused by too few certain samples is avoided, and the classification accuracy of the trained image classification model is improved.
In one embodiment, inputting an image to be classified into a character recognition service, and obtaining a second classification result of the image to be classified by means of keyword matching, includes: inputting the image to be classified into a character recognition service to obtain a character recognition result; comparing the character recognition result with the keywords of each category respectively; and if the character recognition result is successfully matched with the keywords in any category, taking the corresponding category as a second classification result.
The computer device inputs the image to be classified into a character recognition service, which may be an OCR (Optical Character Recognition) model; the service outputs a character recognition result comprising the characters in the image and the position information corresponding to each character. The computer device then compares the character recognition result with the keywords under each category. Specifically, keywords are preset in the computer device; the target material of each category has its own distinct preset keywords, and the computer device compares the character recognition result with the keywords of each category in turn. If the character recognition result matches the keyword of some category successfully, that category is taken as the second classification result; specifically, if the matching degree between the character recognition result and the keyword of a category is greater than a preset degree value, the computer device takes the category corresponding to that keyword as the second classification result of the image to be classified. For example, suppose the character recognition result is "change settlement application" and the keywords include "business license", "change settlement application", "opening settlement application", "basic deposit account information", "account opening permit", "legal representative's letter of authorization", "comprehensive contract application" and "signature card". Comparing the recognition result with the keywords, only the matching degree of "change settlement application" exceeds the preset degree value, so the category corresponding to "change settlement application" is taken as the second classification result.
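A sketch of the keyword matching step (Python; the keyword table, the containment test and the similarity measure are assumptions, since the application only requires that the matching degree exceed a preset degree value):

```python
from difflib import SequenceMatcher

# Hypothetical keyword table: one preset keyword per material category.
CATEGORY_KEYWORDS = {
    "business_license": "business license",
    "change_settlement_application": "change settlement application",
    "opening_settlement_application": "opening settlement application",
}

def second_classification(ocr_text: str, preset_degree: float = 0.8):
    """Return the category whose keyword matches the character recognition
    result with a matching degree above the preset degree value."""
    for category, keyword in CATEGORY_KEYWORDS.items():
        if keyword in ocr_text:
            return category
        if SequenceMatcher(None, keyword, ocr_text).ratio() > preset_degree:
            return category
    return None  # no keyword matched successfully
```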
In this embodiment, the image to be classified is input to the character recognition service to obtain a character recognition result, the character recognition result is compared with the keywords in each category, the matching between the character recognition result and the keywords in any category is successful, and the corresponding category is used as the second classification result. The image to be classified can be converted into character data through the character recognition service, namely, the image processing is converted into the character processing, so that the efficiency and the accuracy of the image processing can be improved; and the method for matching the character recognition result with the keyword obtains a second classification result, so that the accuracy of the classification result of the image to be classified can be further improved.
In one embodiment, as shown in fig. 4, the image classification method further includes:
s401, obtaining a character recognition result obtained by the character recognition service performing character recognition on the image to be classified, wherein the character recognition result comprises characters and character position information.
The computer equipment acquires a character recognition result obtained by performing character recognition on an image to be classified by the character recognition service, and the character recognition method comprises the following steps: and inputting the image to be classified into a character recognition service to obtain a character recognition result. The character recognition result includes characters and character position information, the characters are characters in the image to be processed, the character positions are position information corresponding to the characters in the image to be processed, and the character positions can be position coordinates. It is understood that the number of characters in the character recognition result is at least one, each character may be a single character, or may be a character group formed by a plurality of characters, and the character position may also be position information of a single character in the image to be processed, or may also be position information of a character group in the image to be processed.
S402, a plurality of preset templates are obtained, and key position information corresponding to the key words in each preset template is determined.
The computer equipment is preset with a plurality of preset templates corresponding to target materials, each type of target material corresponds to one preset template, and the content in the preset templates comprises at least one keyword and a template format. The computer equipment obtains a plurality of preset templates, and determines keywords in each preset template and key position information corresponding to each keyword, wherein the key position information corresponding to the keywords is position information of the keywords in the template format, and the key position information can be position coordinates.
And S403, matching the characters corresponding to the key position information in the character recognition result with the keywords of the corresponding preset template to obtain the matching degree between the image to be classified and each preset template.
Specifically, the computer device obtains the character at each piece of key position information from the character recognition result and matches it against the keyword at that key position in the preset template. The matching method compares the characters one by one with the characters of the keyword at the key position in the preset template; for example, if the characters at a key position in the character recognition result are "abcd" and three of those four characters are the same as the characters of the corresponding keyword in the preset template, the matching degree is considered to be 75%. All characters in the character recognition result are matched against the corresponding keywords of each preset template in this way, giving the matching degree between the image to be classified and each preset template.
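A sketch of this matching degree computation and template selection (Python; the data layout, with recognized characters indexed by key position, is an assumption):

```python
def matching_degree(recognized: str, keyword: str) -> float:
    """Fraction of keyword characters that agree, compared position by position."""
    if not keyword:
        return 0.0
    same = sum(1 for a, b in zip(recognized, keyword) if a == b)
    return same / len(keyword)  # e.g. 3 of 4 characters the same -> 0.75

def best_template(chars_at_position: dict, templates: list) -> dict:
    """Pick the preset template with the highest average matching degree
    over its keywords' key positions."""
    def score(template: dict) -> float:
        degrees = [matching_degree(chars_at_position.get(pos, ""), kw)
                   for pos, kw in template["keywords"].items()]
        return sum(degrees) / len(degrees)
    return max(templates, key=score)
```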
S404, the preset template with the highest matching degree is used as a target template corresponding to the image to be classified.
The computer device searches the highest matching degree value from the matching degree between the image to be classified and each preset template, and takes the preset template corresponding to the highest matching degree value as the target template corresponding to the image to be classified. The method for selecting the preset template with the highest matching degree as the target template corresponding to the image to be classified can ensure that the target template is the preset template with the highest matching degree with the image to be classified, and is favorable for obtaining the accurate preset template corresponding to the image to be classified.
S405, extracting target contents in the character recognition result based on the extraction rule matched with the target template, and combining the target contents into a structured result to be output.
The extraction rule specifies the keywords to be extracted and their key position information; the structured result is a result output in a preset structural form, comprising the recognized characters and their character position information, and the structural form may be, for example, JSON (JavaScript Object Notation). The computer device extracts the target content in the character recognition result based on the extraction rule matched with the target template: specifically, it extracts the characters corresponding to the key position information of the target template from the character recognition result as the target content, obtains the character position information of that target content, combines the target content with the corresponding character position information, converts them into the structural form, and outputs them to obtain the structured result.
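The publication illustrates its example structured result only as figures (not reproduced here); a hypothetical JSON-form structured result consistent with the description, with invented field names and coordinates, might be built as follows:

```python
import json

# Field names, contents and coordinates below are invented for illustration.
structured_result = {
    "template": "change_settlement_application",  # hypothetical template name
    "fields": [
        {"keyword": "account name", "content": "XX Trading Co., Ltd.",
         "position": {"x": 120, "y": 86, "w": 310, "h": 26}},
        {"keyword": "account number", "content": "310000000000",
         "position": {"x": 120, "y": 130, "w": 310, "h": 26}},
    ],
}
print(json.dumps(structured_result, ensure_ascii=False, indent=2))
```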
In this embodiment, the character recognition result produced by running the character recognition service on the image to be classified is obtained; the key position information corresponding to the keywords in each preset template is determined; the preset template whose keywords best match the characters at the corresponding key positions in the character recognition result is taken as the target template for the image to be classified; and the target content in the character recognition result is extracted based on the extraction rule and combined into a structured result for output. Determining the target template by matching the character recognition result against the keywords of the preset templates ensures that the target template is the preset template with the highest matching degree for the image to be classified, which helps obtain the correct preset template for that image. Extracting the target content based on the extraction rule matched with the target template ensures that the extracted content corresponds to the target template, improving the accuracy of the structured result; and combining the target content with the corresponding character position information into a structured result for output facilitates displaying and quickly searching the result data.
In one embodiment, extracting target content in the character recognition result based on the extraction rule matched with the target template, and combining the target content into a structured result output, comprises: extracting target content in the character recognition result based on an extraction rule matched with the target template, and sequencing a plurality of characters in the target content to obtain an initial text line; if the character height of the initial text line is larger than a preset value, splitting the initial text line to obtain a plurality of split single-line text lines; if the character height of the initial text line is less than or equal to a preset value, taking the initial text line as a single-line text line; determining a keyword corresponding to each single line of text line, and associating the keyword with the corresponding single line of text line to combine into a structured result output.
The computer device extracts the target content in the character recognition result based on the extraction rule matched with the target template; this applies, for example, when the type of the target material is a change settlement application, an opening settlement application or a comprehensive contract application. Because such target materials have complex and similar formats, stuck-together and dense characters, and multiple pages, the target content produced by the character recognition service may be a character block, i.e. a text block containing at least one line of characters. The computer device extracts the character block corresponding to the key position information of the target template from the character recognition result and sorts the characters in the block from top to bottom and from left to right to obtain an initial text line. It then splits the sorted characters into lines: the computer device compares the character height of the initial text line with a preset value; when the height is greater than the preset value, it splits the initial text line into several single-line text lines, and when the height is less than or equal to the preset value, it takes the initial text line itself as a single-line text line. In some embodiments, the initial text line may be split according to the service type; for example, if the service type dictates that the initial text line should be three lines, with the first two lines of text type and the third of character type, the line-splitting step can separate the text-type and character-type characters. Further, the computer device determines the keyword corresponding to each single-line text line from the target template, specifically by mapping the key position information of the target template to each single-line text line. Finally, it associates each keyword with its single-line text line to form the structured result and outputs it; because the structured result contains the split text lines, its accuracy is improved.
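One geometric reading of the line-splitting step (Python; the application splits "according to the service type", which is domain-specific, so grouping characters by vertical position is shown here purely as an assumption):

```python
def split_into_lines(block_chars: list, preset_value: float) -> list:
    """Sort a character block top-to-bottom, left-to-right; if its height
    exceeds the preset value, split it into single-line text lines."""
    # Each character: {"text": str, "x": left, "y": top, "h": height}
    chars = sorted(block_chars, key=lambda c: (c["y"], c["x"]))
    height = max(c["y"] + c["h"] for c in chars) - min(c["y"] for c in chars)
    if height <= preset_value:
        return ["".join(c["text"] for c in chars)]  # already a single line

    lines = []  # group characters whose vertical centers share a band
    for ch in chars:
        center = ch["y"] + ch["h"] / 2
        for line in lines:
            ref = line[0]
            if abs(center - (ref["y"] + ref["h"] / 2)) < ref["h"] / 2:
                line.append(ch)
                break
        else:
            lines.append([ch])
    return ["".join(c["text"] for c in sorted(l, key=lambda c: c["x"]))
            for l in lines]
```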
In this embodiment, the target content extracted by the extraction rule matched with the target template is sorted and divided into lines, so that the target content with adhered characters or dense characters can be processed into a plurality of split single-line text lines, structured output is performed based on the single-line text lines, and the accuracy of the obtained structured result can be improved.
In one embodiment, as shown in fig. 5, in the case that a check box is included in the target template, extracting the target content in the character recognition result based on the extraction rule matching with the target template includes:
s502, acquiring characters at the designated positions from the target content based on the designated positions of the check boxes designated in the extraction rule.
When the type of the target material is a change settlement application, a legal representative's letter of authorization or an opening settlement application, the target material often contains check boxes, so the preset template corresponding to that material, i.e. the target template, also contains check boxes. When a check box is checked, the corresponding position in the image to be processed may contain "√" or "×"; when it is not checked, the character recognition service often recognizes the empty box as the character "口" (which resembles an empty square). Where the target template contains a check box, the computer device extracts the characters at the specified position of the check box from the target content, thereby determining the characters in the target content corresponding to that position.
S504, when the character at the designated position is a preset character, determining that the character at the corresponding designated position is not selected, otherwise, determining that the character at the corresponding designated position is selected as the target content.
When the character at the specified position of a check box in the target content is the preset character, for example "口", the computer device determines that the item at that position is not selected; when the character at the specified position is not the preset character, the computer device determines that the item at that position is selected and takes it as target content.
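A minimal sketch of the check box rule (Python; "口" is the preset character per the description above, while the data layout is assumed):

```python
UNCHECKED_CHAR = "口"  # an empty check box is often recognized as this character

def checkbox_selected(target_content: dict, specified_position: str) -> bool:
    """True when the character at the check box's specified position is not
    the preset character (i.e. the box holds a mark such as '√' or '×')."""
    character = target_content.get(specified_position, UNCHECKED_CHAR)
    return character != UNCHECKED_CHAR
```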
In this embodiment, in the case that the target template includes check boxes, the characters at the designated positions of the check boxes are obtained from the target content, and whether each option is selected is determined from the character content, so that accurate target content can be obtained even when the target template contains check boxes, and an accurate structured result can be output.
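To make the check-box logic concrete, here is a minimal hypothetical sketch; the rule structure, the position keys, and the option names are invented for illustration, and only the core idea (treat the preset character "口" as unchecked, anything else as selected) comes from the embodiment above.

```python
# Minimal hypothetical sketch of the check-box step; the rule structure,
# positions, and option names are invented for this example.

PRESET_CHAR = "口"   # an unchecked box is often recognized as this glyph

def read_check_boxes(target_content, extraction_rule):
    """Return the options whose check boxes are selected: any character
    other than the preset character counts as selected."""
    selected = []
    for option, position in extraction_rule["check_box_positions"].items():
        if target_content.get(position, PRESET_CHAR) != PRESET_CHAR:
            selected.append(option)      # the box holds "√", "×", etc.
    return selected

content = {(120, 48): "√", (120, 96): "口"}          # OCR characters by position
rule = {"check_box_positions": {"基本账户": (120, 48), "一般账户": (120, 96)}}
print(read_check_boxes(content, rule))  # ['基本账户']
```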
To explain the image classification method and its effect in detail, a detailed embodiment is described below, in which the target material is account opening material, the types of the target material include, but are not limited to, types A, B, C, D, E, F, G, and H, the machine learning model to be trained is a ResNet image classification model, and the character recognition service is an OCR character recognition model:
as shown in fig. 6, which is a schematic flow chart of the image classification method, the computer device obtains an image to be classified, where the image to be classified is an image obtained by scanning the target material, and classifies the image to be classified through the trained image classification model to obtain a first classification result. The training step of the image classification model includes: acquiring image samples under multiple categories and the category label information corresponding to each image sample, where the multiple categories are related to the file types of the materials involved in the target scene; determining the number of image samples of each category, and determining whether the number distribution of the image samples of each category satisfies a uniform distribution condition. If the difference between the number of image samples in any category and a preset number is not within a preset threshold, the uniform distribution condition is not satisfied. If it is not satisfied, a sample expansion operation is performed on the image samples under the categories whose number of image samples does not reach the preset number, so that the number distribution after expansion satisfies the uniform distribution condition; the machine learning model to be trained is then iteratively trained based on the image samples of the multiple categories satisfying the uniform distribution condition and the category label information of each image sample, and the trained image classification model is obtained when training is completed. The sample expansion operation includes: searching for the categories whose number of image samples does not reach the preset number, synthesizing the image samples under the found categories with preset background pictures to obtain synthesized images, and taking the synthesized images as expanded image samples of the found categories. For example, suppose the numbers of image samples of three categories are 997, 998, and 950, the preset number is 1000, and the preset threshold is 3. Since the difference between 950 and the preset number is not within the preset threshold, that category needs sample expansion; after the expansion operation, its sample count falls within the preset threshold of the preset number (that is, between 997 and 1003), and the number distribution of the three categories then satisfies the uniform distribution condition.
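The distribution check and the background-synthesis expansion can be illustrated with the following hedged Python sketch; Pillow is assumed for compositing, and the counts, offsets, and file handling are illustrative rather than the embodiment's actual implementation.

```python
# Hedged sketch of the distribution check and background-synthesis
# expansion; Pillow is assumed, and all counts, offsets, and paths are
# illustrative rather than the embodiment's actual values.

import random
from PIL import Image

def needs_expansion(counts, preset_number, threshold):
    """Return the categories whose sample count differs from the preset
    number by more than the preset threshold."""
    return [c for c, n in counts.items() if abs(n - preset_number) > threshold]

def expand_category(sample_paths, background_paths, n_new):
    """Synthesize new samples by pasting existing samples onto preset
    background pictures."""
    synthesized = []
    for _ in range(n_new):
        sample = Image.open(random.choice(sample_paths)).convert("RGB")
        background = Image.open(random.choice(background_paths)).convert("RGB")
        background = background.resize((sample.width + 80, sample.height + 80))
        background.paste(sample, (40, 40))   # composite sample onto background
        synthesized.append(background)
    return synthesized

counts = {"A": 997, "B": 998, "C": 950}
print(needs_expansion(counts, preset_number=1000, threshold=3))  # ['C']
```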
Aiming at the multi-format, high-precision classification requirement of the account opening scene, a ResNet algorithm is adopted for image feature extraction and classification; the ResNet image classification model is a feature extraction algorithm that solves the degradation problem of deep networks through residual learning. The ResNet image classification model takes the VGG19 architecture as a reference and modifies it by adding residual units through a shortcut (skip-connection) mechanism. The main changes are that the ResNet model performs downsampling directly with stride=2 convolutions and replaces the fully connected layer with a global average pooling layer. An important design principle of the ResNet image classification model is that when the feature map size is halved, the number of feature maps is doubled, which preserves the complexity of the network layers. Compared with an ordinary neural network model for image classification, the ResNet model adds a shortcut connection every two layers, thereby forming residual learning.
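For readers unfamiliar with residual units, the following PyTorch sketch shows a basic block with the shortcut mechanism, the stride=2 downsampling, and the halve-the-feature-map/double-the-channels principle described above; it is a generic illustration, not the patent's exact network definition.

```python
# Generic PyTorch sketch of a residual unit, not the patent's model.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # a stride=2 convolution performs the downsampling directly, and
        # the channel count doubles when the feature map size is halved
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.shortcut = nn.Identity()
        if stride != 1 or in_ch != out_ch:
            # a 1x1 convolution matches the shortcut to the new shape
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch),
            )

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))  # residual learning: F(x) + x

x = torch.randn(1, 64, 56, 56)
y = ResidualBlock(64, 128, stride=2)(x)   # feature map halves, channels double
print(y.shape)  # torch.Size([1, 128, 28, 28])
```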
The computer device inputs the image to be classified into the character recognition service and obtains a second classification result of the image to be classified by keyword matching. Specifically, this includes: inputting the image to be classified into the character recognition service to obtain a character recognition result, comparing the character recognition result with the keywords under each category, and, if the character recognition result successfully matches the keywords under any category, taking the corresponding category as the second classification result. The computer device then compares the first classification result with the second classification result to obtain a comparison result, and if the two are consistent, obtains the classification result of the image to be classified based on the first and second classification results, where the classification result indicates the file type to which the target material belongs. For target materials of type A and type B, which have multiple pages with a different format on each page, the image classification model needs to judge images to be classified of different formats as the same type, which places a higher accuracy requirement on the image classification model.
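As an illustrative aside, the keyword-matching step that produces the second classification result can be sketched in Python as follows; the keyword table and the sample recognition text are hypothetical examples, not the embodiment's actual data.

```python
# Hypothetical sketch of the keyword-matching classification; the
# category names and keywords below are invented for illustration.

CATEGORY_KEYWORDS = {
    "A": ["变更结算账户申请书"],   # keywords under category A (assumed)
    "B": ["开立结算账户申请书"],   # keywords under category B (assumed)
}

def classify_by_keywords(recognized_text, keyword_table):
    """Compare the character recognition result with the keywords under
    each category; return the first category whose keywords all match."""
    for category, keywords in keyword_table.items():
        if all(kw in recognized_text for kw in keywords):
            return category
    return None  # no category matched: no second classification result

# recognized_text stands in for the output of the character recognition service
recognized_text = "……开立结算账户申请书……"
print(classify_by_keywords(recognized_text, CATEGORY_KEYWORDS))  # B
```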
The image classification method further comprises performing structured extraction on the images to be classified whose comparison results are consistent. The formats of the account opening materials of types C, D, E, and F are relatively fixed and the field information to be extracted follows simple rules, so a target template corresponding to the image to be classified can be obtained by template matching, and the target content in the character recognition result can be extracted through the extraction rule corresponding to the target template. The extraction rule refers to the keywords to be extracted and their key position information, which are preset in the template: the characters in the character recognition result corresponding to the key position information are matched against the keywords of each preset template, the preset template with the highest score is taken as the target template of the image to be classified, and information is then extracted according to the keywords and key position information set by the target template. Specifically, the computer device acquires the character recognition result obtained by performing character recognition on the image to be classified, where the character recognition result includes characters and character position information; acquires a plurality of preset templates and determines the key position information corresponding to the keywords in each preset template; matches the characters at the positions corresponding to the key position information in the character recognition result with the keywords of the corresponding preset template to obtain the matching degree between the image to be classified and each preset template; takes the preset template with the highest matching degree as the target template corresponding to the image to be classified; extracts the target content in the character recognition result based on the extraction rule matched with the target template; and combines the target content into a structured result for output.
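The template-scoring idea (match the characters found near each key position against the template's keywords and keep the highest-scoring preset template) might look like the following minimal sketch; the position tolerance, the template structure, and the sample data are assumptions.

```python
# Hedged sketch of template matching: the template structure, the key
# positions, and the position tolerance are assumptions for illustration.

def chars_near(ocr_result, position, tolerance=15):
    """Concatenate recognized characters whose top-left corners fall near
    a key position, ordered left to right."""
    x, y = position
    hits = [c for c in ocr_result
            if abs(c["left"] - x) <= tolerance and abs(c["top"] - y) <= tolerance]
    return "".join(c["text"] for c in sorted(hits, key=lambda c: c["left"]))

def match_template(ocr_result, templates):
    """Score each preset template by how many of its keywords appear at
    their key positions; the highest-scoring one is the target template."""
    def score(template):
        return sum(1 for keyword, position in template["keys"].items()
                   if keyword in chars_near(ocr_result, position))
    return max(templates, key=score)

templates = [   # hypothetical preset templates: keyword -> key position
    {"name": "type-C template", "keys": {"账号": (30, 10), "户名": (30, 60)}},
    {"name": "type-D template", "keys": {"证件号": (30, 10)}},
]
ocr = [{"text": "账", "left": 28, "top": 12},
       {"text": "号", "left": 44, "top": 12}]
print(match_template(ocr, templates)["name"])  # type-C template
```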
For type-G, type-H, and similar materials, the formats are complex and contain handwritten characters, check boxes, and small, dense characters; after obtaining the target template, the computer device needs to perform sorting and line-splitting processing to handle character adhesion and character density in the target content. The sorting method is that the computer device orders all the characters in the target content from top to bottom and from left to right according to the coordinates of their top-left corners. After sorting, cases with multiple lines of dense characters or with check boxes are processed separately to ensure correct recognition. The check boxes are processed as follows: after the characters in the regions containing check boxes are sorted, all position information recognized as "口" is recorded; because the key position information of the check boxes in the target template is relatively fixed, the options at the fixed check-box positions where no "口" is recognized are the selected contents. Specifically, the computer device extracts the target content in the character recognition result based on the extraction rule matched with the target template and combines the target content into a structured result for output, which includes: extracting the target content based on the extraction rule matched with the target template, sorting the characters in the target content to obtain an initial text line; if the character height of the initial text line is larger than a preset value, splitting the initial text line into a plurality of single-line text lines; if the character height is smaller than or equal to the preset value, taking the initial text line as the single-line text line; determining the keyword corresponding to each single-line text line; and associating each keyword with its corresponding single-line text line to combine them into the structured result for output. In the case that the target template includes check boxes, extracting the target content based on the extraction rule matched with the target template includes: acquiring the character at each designated position from the target content based on the designated positions of the check boxes specified in the extraction rule; when the character at the designated position is the preset character, for example "口", determining that the check box at the corresponding position is not selected; otherwise, determining that the content at the corresponding designated position is selected as the target content. Fig. 8 is a schematic diagram of an image to be processed obtained by scanning a type-G target material with handwriting, and fig. 9 is a schematic diagram of an image to be processed obtained by scanning a type-G target material with check boxes.
In the account opening materials, most type-G and type-H materials have multiple lines of characters that are adhered and dense, and it is difficult to detect accurate positions by the character recognition service alone. As shown in fig. 10, a schematic diagram of an image to be processed obtained by scanning a type-H target material, the specific content corresponding to the "change range" field in the figure exhibits multiple lines of adhered, dense characters.
In addition, as shown in fig. 11, which is a schematic general flow chart of the image classification method, optionally, to spare the upstream system from calling the image classification and character recognition services separately in the account opening scene, and to meet the precision and speed requirements of real-time recognition of multiple types of account opening materials, the image classification model, the character recognition service, and the structuring process are integrated in this application into a single end-to-end service for the business system to call, so that one interface simultaneously supports classification and character recognition of multiple types of target materials. Meanwhile, optionally, to meet the recognition requirement of a single upstream request within a preset time, a multi-GPU (Graphics Processing Unit) polling mechanism is adopted, and an individual sub-thread is started for each image to be classified to complete the steps of image classification, character recognition, and structured extraction. The model is deployed in the form of a REST (Representational State Transfer) API that the business system can call, so that classification of multiple types of target materials and the output of the character recognition service are provided through a unified API.
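The multi-GPU polling and one-sub-thread-per-image idea can be illustrated with the following Python sketch; the GPU pool, the placeholder pipeline, and all names are assumptions, since the patent does not disclose the service code.

```python
# Illustrative sketch of round-robin (polling) GPU selection with one
# sub-thread per image; device names and the pipeline are assumptions.

import itertools
import threading

GPUS = ["cuda:0", "cuda:1", "cuda:2"]      # hypothetical GPU pool
_next_gpu = itertools.cycle(GPUS)          # polling (round-robin) iterator
_lock = threading.Lock()

def pick_gpu():
    with _lock:                            # cycle() is not thread-safe by itself
        return next(_next_gpu)

def process_image(image_id, results):
    device = pick_gpu()
    # placeholder pipeline: classification, character recognition, and
    # structured extraction would each run on `device` here
    results[image_id] = {"device": device, "status": "done"}

def handle_request(image_ids):
    """Start an individual sub-thread for each image to be classified."""
    results, threads = {}, []
    for img in image_ids:
        t = threading.Thread(target=process_image, args=(img, results))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results

print(handle_request(["page1.png", "page2.png", "page3.png"]))
```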
According to the image classification method, the image to be classified is obtained by scanning the target material and is classified with the trained image classification model, which improves the accuracy of the classification result; the image to be classified is also input to the character recognition service, and a second classification result is obtained by keyword matching, so that the image is converted into character data; classifying the image and matching keywords in the converted character data further improves the accuracy of the classification result. The first classification result is compared with the second classification result, and when the comparison results are consistent, the classification result of the image to be classified is obtained based on both, the classification result indicating the file type to which the target material belongs. Meanwhile, the image classification method is simple, fast, and practical, and avoids the risks of errors, long processing time, and information leakage that arise when account opening materials are classified and their information entered manually in a complex account opening scene. The end-to-end code framework, based on integrating a deep learning algorithm with template-matching post-processing, fully considers practical application scenarios; and by adopting multi-GPU distributed inference, the method achieves a high end-to-end recognition speed and high recognition precision, and can meet high service concurrency and real-time response requirements.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and whose execution order is not necessarily sequential; they may instead be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Based on the same inventive concept, an embodiment of the present application further provides an image classification apparatus for implementing the above image classification method. The implementation scheme provided by the apparatus is similar to that described in the above method embodiments, so for specific limitations in one or more embodiments of the image classification apparatus provided below, reference can be made to the limitations on the image classification method above, and details are not repeated here.
In one embodiment, as shown in fig. 12, there is provided an image classification apparatus 100 including: an image acquisition module 110, a first classification module 120, a second classification module 130, a comparison module 140, and a result acquisition module 150, wherein:
The image acquisition module 110 is configured to acquire an image to be classified, where the image to be classified is an image obtained by scanning a target material.
The first classification module 120 is configured to classify the image to be classified through the trained image classification model, so as to obtain a first classification result of the image to be classified.
The second classification module 130 is configured to input the image to be classified to a character recognition service, and obtain a second classification result of the image to be classified in a keyword matching manner.
The comparison module 140 is configured to compare the first classification result with the second classification result to obtain a comparison result.
The result acquisition module 150 is configured to, if the comparison results are consistent, obtain a classification result of the image to be classified based on the first classification result and the second classification result, where the classification result is used to indicate the file type to which the target material belongs.
According to the image classification apparatus, the image to be classified is obtained by scanning the target material and is classified with the trained image classification model, which improves the accuracy of the classification result; the image to be classified is also input to the character recognition service, and a second classification result is obtained by keyword matching, so that the image is converted into character data; classifying the image and matching keywords in the converted character data further improves the accuracy of the classification result. The first classification result is compared with the second classification result, and when the comparison results are consistent, the classification result of the image to be classified is obtained based on both, the classification result indicating the file type to which the target material belongs.
In one embodiment, with regard to the training step of the image classification model, the first classification module 120 is further configured to: acquire image samples under multiple categories and the category label information corresponding to each image sample, where the multiple categories are related to the file types of the materials involved in the target scene; determine the number of image samples of each category, and determine whether the number distribution of the image samples of each category satisfies a uniform distribution condition; if not, perform a sample expansion operation on the image samples under the categories whose number of image samples does not reach the preset number, so that the number distribution of the image samples of each category after expansion satisfies the uniform distribution condition; and iteratively train the machine learning model to be trained based on the image samples of the multiple categories satisfying the uniform distribution condition and the category label information of each image sample, obtaining the trained image classification model when training is completed.
In one embodiment, in performing the sample expansion operation on the image samples in the category where the number of the image samples does not reach the preset number, the first classification module 120 is further configured to: and searching the categories of which the number of the image samples does not reach the preset number, synthesizing the image samples under the searched categories with the preset background pictures to obtain a synthesized image, and taking the synthesized image as the expanded image samples under the searched categories.
In one embodiment, in inputting the image to be classified into the character recognition service, and obtaining the second classification result of the image to be classified by means of keyword matching, the second classification module 130 is further configured to: inputting the image to be classified into a character recognition service to obtain a character recognition result; comparing the character recognition result with the keywords of each category respectively; and if the character recognition result is successfully matched with the keywords in any category, taking the corresponding category as a second classification result.
In one embodiment, the image classification apparatus 100 is further configured to: acquiring a character recognition result obtained by character recognition of an image to be classified by a character recognition service, wherein the character recognition result comprises characters and character position information; acquiring a plurality of preset templates, and determining key position information corresponding to a keyword in each preset template; matching the characters at the corresponding key position information in the character recognition result with the keywords of the corresponding preset template to obtain the matching degree between the image to be classified and each preset template; taking the preset template with the highest matching degree as a target template corresponding to the image to be classified; and extracting target contents in the character recognition result based on the extraction rule matched with the target template, and combining the target contents into a structured result to be output.
In one embodiment, in extracting the target content in the character recognition result based on the extraction rule matched with the target template, and combining the target content into the structured result output, the image classification apparatus 100 is further configured to: extracting target content in the character recognition result based on an extraction rule matched with the target template, and sequencing a plurality of characters in the target content to obtain an initial text line; if the character height of the initial text line is larger than a preset value, splitting the initial text line to obtain a plurality of split single-line text lines; if the character height of the initial text line is less than or equal to a preset value, taking the initial text line as a single-line text line; determining a keyword corresponding to each single line of text line, and associating the keyword with the corresponding single line of text line to combine into a structured result output.
In one embodiment, in the case that the check box is included in the target template, the image classification apparatus 100 is further configured to, in terms of extracting the target content in the character recognition result based on the extraction rule matched with the target template: acquiring characters at the specified positions from the target content based on the specified positions of the check boxes specified in the extraction rule; and when the character at the specified position is a preset character, determining that the character at the corresponding specified position is not selected, otherwise, determining that the character at the corresponding specified position is selected as the target content.
The modules in the image classification apparatus may be wholly or partially implemented by software, hardware, or a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 13. The computer device includes a processor, a memory, an Input/Output interface (I/O for short), and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer equipment is used for storing images to be classified, trained image classification models, first classification results of the images to be classified, second classification results of the images to be classified, comparison results, classification results of the images to be classified, image samples of multiple types, class label information corresponding to each image sample, the number of the image samples of each type, synthetic images, character recognition results, keywords, key position information, preset templates, structured results, initial text lines and single text lines. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for connecting and communicating with an external terminal through a network. The computer program is executed by a processor to implement an image classification method.
Those skilled in the art will appreciate that the structure shown in fig. 13 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above-described method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing related hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the various embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above embodiments express only several implementations of the present application, which are described specifically and in detail, but they should not therefore be construed as limiting the scope of the application. It should be noted that, for a person skilled in the art, several variations and improvements can be made without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method of classifying an image, the method comprising:
acquiring an image to be classified, wherein the image to be classified is an image obtained by scanning a target material;
classifying the images to be classified through the trained image classification model to obtain a first classification result of the images to be classified;
inputting the image to be classified into a character recognition service, and obtaining a second classification result of the image to be classified in a keyword matching mode;
and comparing the first classification result with the second classification result to obtain a comparison result, and if the comparison result is consistent, obtaining a classification result of the image to be classified based on the first classification result and the second classification result, wherein the classification result is used for indicating the file type of the target material.
2. The method of claim 1, wherein the step of training the image classification model comprises:
acquiring image samples in multiple types and class label information corresponding to each image sample, wherein the multiple types are related to file types of materials related to a target scene;
determining the number of the image samples of each category, and determining whether the number distribution of the image samples of each category meets a uniform distribution condition;
if the number of the image samples does not meet the preset number, performing sample expansion operation on the image samples under the categories of which the number of the image samples does not meet the preset number, so that the number distribution of the image samples of each category after sample expansion meets the uniform distribution condition;
and performing iterative training on the machine learning model to be trained based on the image samples of multiple types meeting the uniform distribution condition and the class label information of each image sample, and obtaining the trained image classification model when the training is finished.
3. The method according to claim 2, wherein the performing of the sample expansion operation on the image samples in the category of which the number of the image samples does not reach the preset number comprises:
searching for the categories of which the number of the image samples does not reach the preset number, synthesizing the image samples under the searched categories with the preset background pictures to obtain a synthesized image, and taking the synthesized image as the expanded image samples under the searched categories.
4. The method according to claim 1, wherein the inputting the image to be classified to a character recognition service and obtaining a second classification result of the image to be classified by means of keyword matching comprises:
inputting the image to be classified into a character recognition service to obtain a character recognition result;
comparing the character recognition result with the keywords of each category respectively;
and if the character recognition result is successfully matched with the keywords in any category, taking the corresponding category as a second classification result.
5. The method according to any one of claims 1 to 4, further comprising:
acquiring a character recognition result obtained by character recognition of an image to be classified by a character recognition service, wherein the character recognition result comprises characters and character position information;
acquiring a plurality of preset templates, and determining key position information corresponding to a key word in each preset template;
matching the characters corresponding to the key position information in the character recognition result with the keywords of the corresponding preset template to obtain the matching degree between the image to be classified and each preset template;
taking the preset template with the highest matching degree as a target template corresponding to the image to be classified;
and extracting target contents in the character recognition result based on an extraction rule matched with the target template, and combining the target contents into a structured result to be output.
6. The method of claim 5, wherein extracting the target content in the character recognition result based on the extraction rule matched with the target template and combining the target content into a structured result output comprises:
extracting target content in the character recognition result based on an extraction rule matched with the target template, and sequencing a plurality of characters in the target content to obtain an initial text line;
if the character height of the initial text line is larger than a preset value, splitting the initial text line to obtain a plurality of split single-line text lines;
if the character height of the initial text line is less than or equal to a preset value, taking the initial text line as a single-line text line;
determining a keyword corresponding to each single line of text line, and associating the keyword with the corresponding single line of text line to combine into a structured result output.
7. The method according to claim 5, wherein in the case that a check box is included in the target template, the extracting target content in the character recognition result based on the extraction rule matched with the target template comprises:
acquiring characters at specified positions from the target content based on the specified positions of check boxes specified in an extraction rule;
and when the characters at the specified positions are preset characters, determining that the characters at the corresponding specified positions are not selected, otherwise, determining that the characters at the corresponding specified positions are selected as target content.
8. An image classification apparatus, characterized in that the apparatus comprises:
the image acquisition module is used for acquiring an image to be classified, wherein the image to be classified is an image obtained by scanning a target material;
the first classification module is used for classifying the images to be classified through the trained image classification model to obtain a first classification result of the images to be classified;
the second classification module is used for inputting the images to be classified into a character recognition service and obtaining a second classification result of the images to be classified in a keyword matching mode;
the comparison module is used for comparing the first classification result with the second classification result to obtain a comparison result;
and the result obtaining module is used for obtaining a classification result of the image to be classified based on the first classification result and the second classification result if the comparison results are consistent, and the classification result is used for indicating the file type of the target material.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
CN202211049402.1A 2022-08-30 2022-08-30 Image classification method and device, computer equipment and storage medium Pending CN115410211A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211049402.1A CN115410211A (en) 2022-08-30 2022-08-30 Image classification method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211049402.1A CN115410211A (en) 2022-08-30 2022-08-30 Image classification method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115410211A true CN115410211A (en) 2022-11-29

Family

ID=84163207

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211049402.1A Pending CN115410211A (en) 2022-08-30 2022-08-30 Image classification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115410211A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830613A (en) * 2023-01-09 2023-03-21 广州佰锐网络科技有限公司 Document intelligent acquisition sorting method, calling method, storage medium and system

Similar Documents

Publication Publication Date Title
US11816165B2 (en) Identification of fields in documents with neural networks without templates
Jabeen et al. An effective content-based image retrieval technique for image visuals representation based on the bag-of-visual-words model
CN103268317B (en) Image is carried out the system and method for semantic annotations
US11074442B2 (en) Identification of table partitions in documents with neural networks using global document context
US10733228B2 (en) Sketch and style based image retrieval
US11170249B2 (en) Identification of fields in documents with neural networks using global document context
US11704357B2 (en) Shape-based graphics search
CN113378710B (en) Layout analysis method and device for image file, computer equipment and storage medium
US10268928B2 (en) Combined structure and style network
US11741734B2 (en) Identification of blocks of associated words in documents with complex structures
CN114898357B (en) Defect identification method and device, electronic equipment and computer readable storage medium
US20230138491A1 (en) Continuous learning for document processing and analysis
CN115410211A (en) Image classification method and device, computer equipment and storage medium
CN116894974A (en) Image classification method, device, computer equipment and storage medium thereof
CN108536769B (en) Image analysis method, search method and device, computer device and storage medium
US11816909B2 (en) Document clusterization using neural networks
Vishwanath et al. Deep reader: Information extraction from document images via relation extraction and natural language
Sabahi et al. RefinerHash: a new hashing-based re-ranking technique for image retrieval
CN116541549B (en) Subgraph segmentation method, subgraph segmentation device, electronic equipment and computer readable storage medium
CN114490996B (en) Intention recognition method and device, computer equipment and storage medium
CN114049634B (en) Image recognition method and device, computer equipment and storage medium
US20230325991A1 (en) Recommending objects for image composition using a geometry-and-lighting aware neural network
CN115457572A (en) Model training method and device, computer equipment and computer readable storage medium
CN118014833A (en) Image generation method, device and system based on industrial large model and storage medium
CN111597375A (en) Picture retrieval method based on similar picture group representative feature vector and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination