WO2022057707A1 - Text recognition method, image recognition classification method, and document recognition processing method - Google Patents

Text recognition method, image recognition classification method, and document recognition processing method

Info

Publication number
WO2022057707A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
recognition
document
textual
Prior art date
Application number
PCT/CN2021/117222
Other languages
French (fr)
Chinese (zh)
Inventor
徐青松
李青
Original Assignee
杭州睿琪软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州睿琪软件有限公司
Publication of WO2022057707A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • the present invention relates to the technical field of machine learning, and in particular, to a text recognition method, an image recognition classification method, a document recognition processing method, an electronic device, and a computer-readable storage medium.
  • OCR (Optical Character Recognition) refers to the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses character recognition methods to translate the shapes into computer text. That is, for printed characters, the text in a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software converts the text in the image into a text format that word processing software can further edit.
  • the recognition model can usually be used to recognize the characters in the document.
  • the same model cannot be used to recognize documents in different languages: the language of the document must be known before the corresponding recognition model is called, and documents containing mixed languages are even more difficult to recognize. It can be seen that existing OCR technology suffers from low text recognition accuracy for documents in different languages.
  • the purpose of the present invention is to provide a text recognition method, an image recognition classification method, a document recognition processing method, an electronic device, and a computer-readable storage medium.
  • the specific technical solutions are as follows:
  • the present invention provides a text recognition method, comprising:
  • a character recognition model is used to recognize the characters in each of the text lines to obtain a preliminary recognition result of the text to be recognized;
  • a corresponding language recognition model is called according to the language type to recognize the corresponding character part, so as to obtain the target recognition result of the text to be recognized.
  • the method further includes: recognizing the direction of the text to be recognized in the text image, and if the direction does not meet a preset condition, correcting the direction of the text to be recognized;
  • the identifying the direction of the text to be identified in the text image includes:
  • a direction recognition model is used to recognize the direction of the text to be recognized in the text image, and the direction recognition model is a CNN-based neural network model.
  • the character recognition model is a neural network model based on Connectionist Temporal Classification (CTC) and the Attention mechanism.
  • the character recognition model is obtained by training on a training sample set that includes the CJK character set and the ISO 8859-1 through ISO 8859-16 character sets.
  • the language classification model is a fasttext <N-Gram> language classification model based on the wiki data set.
  • the present invention also provides an image recognition and classification method, including:
  • a keyword is determined according to the text recognition result, a first subdivision type of the content of the textual image or a second subdivision type of the content of the non-textual image is determined according to the keyword, the textual images are classified into folders corresponding to the first subdivision type, and the non-textual images are classified into folders corresponding to the second subdivision type.
  • the method further includes:
  • the textual image or the non-textual image is automatically named by using the keyword.
  • in the image recognition and classification method, after a textual image or a non-textual image is recognized, the method further includes:
  • classifying the textual image into a folder corresponding to the first subdivision type, and classifying the non-textual image into a folder corresponding to the second subdivision type, includes:
  • the first subdivision type includes: one or more of notes, certificates, receipts, screenshots, documents, and certificates.
  • the image recognition model identifies the content in the non-text images
  • the image recognition and classification method further includes:
  • the second subdivision type is determined according to the content of the non-text image, and the non-text image is classified into a folder corresponding to the second subdivision type.
  • the method further includes:
  • the non-text images are automatically named according to the content in the non-text images.
  • the method further includes:
  • the method further includes:
  • before printing is executed, the method further includes:
  • the signature is performed in the preset signature area in the text image that needs to be signed;
  • the present invention also provides a document identification processing method, including:
  • the character recognition results of the original document are arranged to obtain a recognized document.
  • the character recognition results of the original document are arranged to obtain a recognized document, including:
  • the original text in the original document is replaced by the character recognition result of the original document to obtain a recognized document.
  • the method further includes:
  • the original document is compared with the identification document to determine whether there is a difference between the identification document and the original document, and if there is, the difference is corrected in the identification document.
  • before the input image is recognized, the method further includes:
  • a correction model is used to identify the radian of the curve of the original document in the input image, and if the radian of the curve satisfies a preset correction condition, a correction process is performed on the original document in the input image to remove the curve radian of the original document.
  • the method further includes:
  • the character recognition result corresponding to the marked content is typeset into a format consistent with the original document.
  • the present invention also provides an electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
  • the memory for storing computer programs
  • the processor, when executing the program stored in the memory, implements the steps in the text recognition method as described above, or the steps in the image recognition and classification method as described above, or the steps in the document recognition processing method as described above.
  • the present invention also provides a computer-readable storage medium on which instructions are stored; when the instructions are executed, they implement the steps in the text recognition method as described above, or the steps in the image recognition and classification method as described above, or the steps in the document recognition processing method as described above.
  • the text recognition method, image recognition classification method, document recognition processing method, electronic device, and computer-readable storage medium provided by the present invention have the following advantages:
  • in the text recognition method and the corresponding electronic device and computer-readable storage medium provided by the present invention, when text recognition is performed, the text lines in the text to be recognized are first marked with a general text line box, and a character recognition model is then used to recognize each text line to obtain the preliminary recognition result of the text to be recognized. Language type recognition is then performed on the preliminary recognition result, and the corresponding language recognition model is called according to the recognized language type to further recognize the character part corresponding to that language type, yielding the optimized character recognition result.
  • a separate language recognition model is used for accurate recognition according to the language type involved, thereby improving the accuracy of text recognition.
  • the image recognition and classification method provided by the present invention, and the corresponding electronic device and computer-readable storage medium, can apply the above OCR text recognition method to both textual images and non-textual images to obtain their text recognition results, and keywords are determined according to the text recognition results to classify the textual images and the non-textual images. Because the images are classified according to the text content they contain, the classification result is more accurate; at the same time, the determined keywords make it convenient to search for images by keyword later, enabling fast image search.
  • non-text images can also be classified by image content, which also improves the accuracy of the classification results.
  • the document recognition processing method and the corresponding electronic device and computer-readable storage medium provided by the present invention use the OCR text recognition method to recognize the to-be-recognized document in the input image, so as to obtain the recognized document. Because the uneditable document is converted into an editable document, it becomes convenient to retrieve the document later using keywords in the document, enabling fast document search.
  • by correcting the radian of the input image, and by recognizing and adjusting the fonts marked and referenced in the document, errors in the process of converting the document to be recognized in the input image into editable electronic text are reduced, and the conversion accuracy is improved.
  • FIG. 1 is a schematic flowchart of a text recognition method provided by an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of an image recognition and classification method provided by an embodiment of the present invention.
  • Fig. 3 is an example diagram of image recognition classification display
  • FIG. 4 is a schematic flowchart of a document identification processing method provided by an embodiment of the present invention.
  • Figure 5a is an example diagram of an input image containing an original document
  • Figure 5b is an example diagram of a recognized document obtained after the method of the present invention is used to recognize the input image shown in Figure 5a;
  • Figure 6a is another example diagram of an input image containing an original document
  • Figure 6b is an example diagram of a recognized document obtained after the input image shown in Figure 6a is recognized by an existing method
  • Figure 6c is an example diagram of a recognized document obtained after the method of the present invention is used to recognize the input image shown in Figure 6a;
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • FIG. 1 shows a flowchart of a text recognition method according to an exemplary embodiment of the present invention.
  • the method can be implemented in an application program (app) installed on a smart terminal such as a mobile phone and a tablet computer.
  • the method may include:
  • Step S101: identify the text lines in the text to be recognized in the text image, and mark each of the text lines with a general text line box.
  • a text image refers to an image whose content is mainly text, such as a business card image, a document image, a certificate image, or a note image. It may be an image obtained by photographing the text, or an image obtained by scanning the text.
  • the note image may be an image obtained by taking pictures of the handwritten font text content on the paper.
  • the text to be recognized in the text image includes one or more text lines.
  • the present invention uses the text OCR recognition method for text recognition. During recognition, each text line is recognized separately. Finally, the recognition results of the entire text to be recognized are obtained by combining the recognition results of all text lines. Therefore, during recognition, each text line in the text to be recognized in the text image needs to be recognized, and at the same time, each text line is marked with a general text line box.
  • the language in the text line is not limited; marking is performed purely per text line. That is, when the characters in a text line belong to multiple language types, they are marked in the same general text line box as long as they are located in the same text line.
  • a text image may contain more than one document, for example the front and back sides of an ID card, each of which needs to be recognized separately. Therefore, before step S101 is executed, the document areas in the text image (that is, the areas where the text to be recognized is located) may also be identified and sliced. For example, slicing can be performed using a callout box, or the edges of a document area can be identified by an edge recognition method and the area sliced along those edges.
  • the method further includes: recognizing the direction of the text to be recognized in the text image, and if the direction does not meet a preset condition, correcting the direction of the text to be recognized.
  • a direction recognition model may be used to recognize the direction of the text to be recognized in the text image, and the direction recognition model may be a CNN-based neural network model.
  • the reference direction may be set as a positive direction along the horizontal direction.
  • the direction recognition model can identify the angle between the arrangement direction of the characters in the text line and the positive horizontal direction in the text image. If the angle is 0, no correction is required. If the angle is not 0, the text image is corrected.
  • the correction processing rotates the text image so that the included angle between the characters in the text line of the text to be recognized and the positive horizontal direction in the text image becomes 0. In this embodiment, the direction pointing to the right along the horizontal is taken as the positive horizontal direction. In other embodiments, other directions may also be set as the positive direction, which is not limited in the present invention.
  • the correction processing may also use the average slope of a plurality of text lines as the correction reference, or use other correction methods, which are not limited in the present invention.
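The slope-based correction mentioned above can be sketched as follows. This is a minimal illustration, not the patent's direction recognition model: the baseline endpoints, the `tolerance` parameter, and the function names are all assumptions, and a circular mean of the line angles is swapped in for a plain average of slopes so that upside-down text (angles near 180 degrees) averages sensibly.

```python
import math


def average_skew_degrees(line_boxes):
    """Mean angle, in degrees, between text-line baselines and the positive horizontal direction.

    Each entry is ((x_start, y_start), (x_end, y_end)): hypothetical baseline
    endpoints of one text line, ordered along the reading direction.
    """
    angles = [math.atan2(y2 - y1, x2 - x1) for (x1, y1), (x2, y2) in line_boxes]
    # Circular mean: average the unit vectors rather than the raw angles.
    mean_sin = sum(math.sin(a) for a in angles)
    mean_cos = sum(math.cos(a) for a in angles)
    return math.degrees(math.atan2(mean_sin, mean_cos))


def correction_angle(line_boxes, tolerance=0.5):
    """Angle to rotate the image by so the text becomes horizontal.

    Returns 0.0 when the measured skew is already within `tolerance` degrees
    (the tolerance is an assumption; the text only says an angle of 0 needs no correction).
    """
    skew = average_skew_degrees(line_boxes)
    return 0.0 if abs(skew) <= tolerance else -skew
```

For instance, two perfectly horizontal lines give a correction of `0.0`, while a single line rising at 45 degrees gives a correction of about `-45.0`.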
  • Step S102: a character recognition model is used to recognize the characters in each of the text lines, and a preliminary recognition result of the text to be recognized is obtained.
  • the character recognition model is an all-in-one model obtained by training with multiple character sets, such as the CJK character set and the ISO 8859-1 through ISO 8859-16 character sets. The character recognition model can therefore support the recognition of both CJK and Latin characters.
  • the character recognition model is a neural network model based on Connectionist Temporal Classification (CTC) and the Attention mechanism. Each text line is input into the character recognition model separately, and the model outputs the character recognition result of that text line; the character recognition results of all text lines are then combined to obtain the character recognition result of the text to be recognized, which serves as the preliminary recognition result.
  • Connectionist Temporal Classification (CTC) is a time-series classification algorithm that does not require strict alignment information between data units and annotation units. The algorithm is currently widely used in optical character recognition (OCR) and speech recognition.
  • the main function of the CTC model is to construct a loss function for the sequence; during backpropagation, the gradient determined from the loss function is passed back to the previous layer to complete the training of the CTC model.
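To make the "no strict alignment" idea concrete, the sketch below shows greedy (best-path) CTC decoding, the common inference-side counterpart of the CTC loss: pick the best class per frame, collapse consecutive repeats, and drop the blank symbol. The text does not say which decoding rule the model uses, so this is an illustrative assumption; `frame_scores`, `alphabet`, and the blank index are hypothetical.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Decode per-frame class scores into text with the CTC collapse rule.

    frame_scores: list of per-frame score lists, one score per class.
    alphabet: list mapping class index -> character (the entry at `blank` is unused).
    """
    # Best class index for every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    decoded = []
    previous = blank
    for index in best_path:
        # Emit a character only when it is not blank and differs from the previous frame.
        if index != blank and index != previous:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)
```

With `alphabet = ["", "a", "b"]` and frames whose best classes are `a, a, blank, a, b, b`, the decoder returns `"aab"`: the repeated frames collapse, and the blank is what allows the two separate `a` characters to survive.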
  • the Attention mechanism brings a large improvement to sequence learning tasks.
  • in an encoder-decoder framework, an attention model (A model) can be added at the encoding end to apply a weighted transformation to the source data sequence, or introduced at the decoding end to apply a weighted transformation to the target data; either can effectively improve system performance in sequence-to-sequence tasks.
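The weighted transformation described above can be illustrated with plain dot-product attention. This is a generic sketch under stated assumptions, not the patent's network: the dot-product scoring function and the names `query` and `encoder_states` are hypothetical choices for illustration.

```python
import math


def attention_context(query, encoder_states):
    """Return (weights, context) for one decoding step.

    query: decoder state vector (list of floats).
    encoder_states: list of encoder state vectors of the same dimension.
    """
    # Similarity of the query to every encoder state (dot product).
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]

    # Numerically stable softmax turns the scores into weights that sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The context vector is the weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context
```

Encoder states that resemble the query receive larger weights, so the context vector emphasizes the part of the source sequence most relevant to the current output position.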
  • the present invention combines Connectionist Temporal Classification (CTC) with the Attention mechanism to construct the character recognition model, which can improve the accuracy of character recognition.
  • Step S103: use a language classification model to perform language identification on the preliminary recognition result, obtain the language types involved in the preliminary recognition result, and divide the preliminary recognition result into a plurality of different character parts according to language type.
  • since the character recognition model used in step S102 is trained on character sets of multiple different languages, the accuracy of its recognition results for the characters in a text line is not high. The preliminary recognition result therefore needs to be optimized by further recognizing the characters of each language separately, so as to improve the accuracy of character recognition.
  • a language classification model is used to perform language recognition on the preliminary recognition result and obtain the language types involved in it. The langid technique can be used to identify the language types, and the language classification model is a fasttext <N-Gram> language classification model based on the wiki data set.
  • fasttext is a word vector and text classification tool.
  • its typical application scenario is supervised text classification. It provides a simple and efficient method for text classification and representation learning, and it is faster than deep learning approaches.
  • N-Gram is a language model commonly used in large-vocabulary continuous speech recognition; for Chinese, it is referred to as the Chinese Language Model (CLM).
  • the preliminary recognition result can be divided into a plurality of different character parts, that is, the characters of each language type are divided into the same character part.
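The division into character parts can be sketched as below. This is a stand-in only: the text uses a fasttext-based language classification model, whereas this example tags characters by the script prefix of their Unicode name, which is a much cruder heuristic. The tag names and the choice to attach digits, spaces, and punctuation to the preceding part are assumptions, and the sketch produces contiguous runs rather than one merged part per language.

```python
import unicodedata


def script_of(ch):
    """Crude script tag derived from the Unicode character name (illustrative heuristic)."""
    if not ch.isalpha():
        return None  # digits, spaces, punctuation: attach to the current part
    name = unicodedata.name(ch, "")
    if name.startswith("CJK"):
        return "cjk"
    if name.startswith(("HIRAGANA", "KATAKANA")):
        return "kana"
    if name.startswith("HANGUL"):
        return "hangul"
    if name.startswith("LATIN"):
        return "latin"
    return "other"


def split_by_language(text):
    """Split a preliminary recognition result into (tag, character_part) runs."""
    parts = []
    for ch in text:
        tag = script_of(ch)
        if parts and (tag is None or tag == parts[-1][0]):
            parts[-1][1].append(ch)  # same script, or neutral character: extend the run
        else:
            parts.append([tag or "other", [ch]])  # script changed: start a new run
    return [(tag, "".join(chars)) for tag, chars in parts]
```

For example, `split_by_language("Hello 世界 ok")` yields a Latin part, a CJK part, and a second Latin part, each of which could then be handed to a language-specific recognizer.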
  • Step S104: call the corresponding language recognition model according to the language type to recognize the corresponding character part, and obtain the target recognition result of the text to be recognized.
  • each language type has a corresponding language recognition model. After the language types involved in the text to be recognized and the character parts corresponding to each language type are obtained in step S103, the corresponding language recognition model is called to recognize each character part. This yields a more accurate character recognition result for each character part, from which the target recognition result of the text to be recognized is obtained.
  • in summary, the text lines in the text to be recognized are first marked with a general text line box, and the character recognition model is then used to recognize each text line to obtain the preliminary recognition result of the text to be recognized. Language type recognition is then performed on the preliminary recognition result, and the corresponding language recognition model is called according to the recognized language type to further recognize the character part corresponding to that language type, yielding the optimized character recognition result.
  • a separate language recognition model is used for accurate recognition according to the language type involved, thereby improving the accuracy of text recognition.
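The dispatch in step S104 can be sketched as a lookup from language type to recognizer. Everything here is hypothetical scaffolding: in the described method each language recognition model would re-recognize the corresponding region of the text image, whereas in this sketch each "model" is just a callable that takes the preliminary text of a part and returns refined text.

```python
def refine_parts(parts, recognizers, fallback=None):
    """Run each character part through the recognizer registered for its language type.

    parts: list of (language_type, preliminary_text) tuples, e.g. the output of step S103.
    recognizers: dict mapping language_type -> callable(preliminary_text) -> refined text.
    fallback: optional callable used when no recognizer is registered for a language type.
    """
    refined = []
    for language_type, preliminary in parts:
        recognize = recognizers.get(language_type, fallback)
        # Keep the preliminary result when no dedicated model is available.
        refined.append(recognize(preliminary) if recognize else preliminary)
    return "".join(refined)


# Hypothetical stand-in for a Latin-specific model: repair a classic OCR confusion.
def fix_latin(text):
    return text.replace("rn", "m")
```

Calling `refine_parts([("latin", "docurnent "), ("cjk", "识别")], {"latin": fix_latin})` returns `"document 识别"`: the Latin part is refined by its dedicated recognizer, and the CJK part passes through unchanged because no recognizer was registered for it.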
  • the present invention also proposes an image recognition and classification method, which is used to classify and organize a large number of images and place images with similar content into the same folder, making it easier for users to browse and search them.
  • the image recognition and classification method includes the following steps:
  • Step S201: an image recognition model is used to recognize the image to be classified and determine whether it is a textual image or a non-textual image.
  • the image to be classified may be a newly captured image, or may be an image that has been captured and saved in a folder, such as an image saved in a mobile phone album.
  • Textual images refer to images whose content is mainly text, such as business card images, document images, certificate images, and note images. They can be images obtained by photographing text or images obtained by scanning text.
  • the note image may be an image obtained by taking pictures of the handwritten font text content on the paper.
  • Non-text images refer to images whose content is mainly non-text, such as photos of people's lives, landscapes, and photos of animals and plants.
  • by recognizing the image to be classified with the image recognition model, it can be determined whether the image is a textual image or a non-textual image, so that textual images and non-textual images can be separated.
  • the images are automatically classified and stored in different preset folders. That is, after recognizing the textual image, the textual image is classified into the textual image folder, and after recognizing the non-textual image, the non-textual image is classified into the non-textual image folder.
  • Step S202 Recognize the text in the textual image or the non-textual image to obtain a text recognition result of the textual image or the non-textual image.
  • the text recognition method shown in FIG. 1 may be used to recognize the text in the textual image or the non-textual image.
  • the specific identification process is not repeated here.
  • Different pictures can also be classified according to the language type of the text recognition result.
  • Step S203: determine a keyword according to the text recognition result, determine a first subdivision type of the content of the textual image or a second subdivision type of the content of the non-textual image according to the keyword, classify the textual images into a folder corresponding to the first subdivision type, and classify the non-textual images into a folder corresponding to the second subdivision type.
  • a keyword classification model may be used to obtain keywords from the text recognition result. The first subdivision type of the content of the textual image, or the second subdivision type of the content of the non-textual image, is then determined according to the keyword; the textual images are classified into the folders corresponding to the first subdivision type, and the non-text images are classified into the folders corresponding to the second subdivision type.
  • the first subdivision type includes, but is not limited to, one or more of: notes, certificates, receipts, screenshots, documents, and certificates.
  • for example, the keyword classification model can obtain the keyword "ID card" from the text recognition result. It can thus be determined from the keyword that the first subdivision type of the content of the textual image is a document image, and the textual image can then be classified into the folder of the subdivision type "document image".
  • the subdivision type "document image" can be further divided, for example into specific types including ID card, driver's license, passport, military officer's ID, work permit, birth certificate, household registration book, and so on. Therefore, the specific type of the textual image can also be determined according to the keyword, and the textual image is further classified into a subfolder of that specific type under the folder corresponding to the first subdivision type.
  • for example, the keyword classification model can obtain the keyword "ID card" from the text recognition result, and the textual image can be further classified into the "ID card" subfolder under the "document image" subdivision-type folder. It can be understood that several specific-type subfolders can be set under the "document image" folder, such as ID card, driver's license, passport, military officer's ID, work permit, birth certificate, and household registration book.
  • the classified text images can be set into a file tree, in which each folder is named progressively layer by layer, so that each text image to be classified can be automatically classified into the corresponding folder.
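The keyword-to-folder mapping and the layered file tree can be sketched as follows. The lookup table stands in for the keyword classification model and is purely hypothetical, as are the root folder name, the `unclassified` fallback, and the case-insensitive substring match.

```python
from pathlib import PurePosixPath

# Hypothetical table: keyword -> (first subdivision type, specific type or None).
KEYWORD_TO_TYPE = {
    "ID card": ("document image", "ID card"),
    "passport": ("document image", "passport"),
    "driver's license": ("document image", "driver's license"),
    "receipt": ("receipt", None),
}


def target_folder(recognized_text, root="textual images"):
    """Return (keyword, folder path) for a textual image based on its recognized text."""
    lowered = recognized_text.lower()
    for keyword, (subdivision, specific) in KEYWORD_TO_TYPE.items():
        if keyword.lower() in lowered:
            folder = PurePosixPath(root) / subdivision
            # Descend one more level when a specific type is defined.
            return keyword, (folder / specific if specific else folder)
    return None, PurePosixPath(root) / "unclassified"
```

An image whose recognized text contains "Resident ID Card" maps to `textual images/document image/ID card`, mirroring the layered folder naming described above.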
  • the keyword may also be used to automatically name the textual image.
  • the classified images can be sorted according to the modified chronological order, or can be sorted according to the shooting chronological order, or the sorting method can also be set as required.
  • the user can search the classified textual images by keyword in order to quickly find the target file. Specifically, in response to the user entering a search term, the method searches for a keyword matching the search term and, if one exists, outputs the textual image corresponding to that keyword. For example, when the search term entered by the user is "identity", the method searches for a matching keyword; if the matching keyword "identity card" exists, the textual image corresponding to the keyword "identity card" is output and displayed to the user.
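The keyword search described above can be sketched with a small in-memory index. The index structure (keyword mapped to a list of image names) and the case-insensitive substring match are assumptions made for illustration.

```python
def search_images(search_term, keyword_index):
    """Return the images whose stored keyword matches the user's search term.

    keyword_index: dict mapping keyword -> list of image file names (hypothetical layout).
    """
    term = search_term.strip().lower()
    if not term:
        return []
    hits = []
    for keyword, images in keyword_index.items():
        # A keyword matches when it contains the search term.
        if term in keyword.lower():
            hits.extend(images)
    return hits
```

With the index `{"identity card": ["front.jpg", "back.jpg"], "passport": ["p1.jpg"]}`, searching for "identity" returns the two ID card images.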
  • the second subdivision type may include: life photos of people, landscape photos, animal photos, plant photos, and the like.
  • for example, if the non-text image is a photo of Leifeng Pagoda and the image contains the three words "Leifeng Pagoda", the text recognition result of the non-text image is "Leifeng Pagoda", and the keyword is "Leifeng Pagoda".
  • the subdivision type "landscape photo" can be further divided; for example, landscape photos can be further divided according to the name of the scenic spot. Therefore, the specific type of the non-text image can also be determined according to the identified name of the scenic spot, and the non-text image can be further classified into a subfolder of that specific type under the folder corresponding to the second subdivision type. For example, since the non-text image in the preceding example is identified as a photo of Leifeng Pagoda, it can be further classified into the specific subfolder "Photos of Leifeng Pagoda" under the subdivision type "landscape photos". It can be understood that specific-type subfolders corresponding to different scenic spots can be set under the "landscape photos" folder.
  • the image recognition model can also recognize the content of the non-text image, so the second subdivision type can also be determined according to the content of the non-text image, and the non-text image is classified into the folder corresponding to the second subdivision type. For example, if the image recognition model recognizes that the content shown in the non-text image is Leifeng Pagoda, it can be determined that the second subdivision type of the content of the non-text image is landscape photos, and the non-text image can then be classified under the subdivision type "landscape photos".
  • the classified non-text images can be set into a file tree, in which each folder is named progressively, so that each to-be-classified non-text image can be automatically classified into a corresponding folder.
  • the non-text images may be automatically named according to the content of the non-text images.
  • the content of the non-text image may include the recognized names of animals and plants, the names of scenic spots, etc., so the non-text images can be automatically named according to the recognized names of animals and plants, the names of scenic spots, and the like.
  • the non-text images are automatically named according to the keywords obtained by the keyword classification model. Through the automatic naming, the finding of the non-text images can be facilitated.
  • non-text images can also be classified according to shooting time, location, the people associated with them, and names.
  • encryption processing can be performed to ensure the security of the files, for example, encryption processing is performed for important documents such as certificates, or encryption processing is performed for private life photos of people.
  • for encryption processing, either a single file or the corresponding folder can be encrypted.
  • the documents to be printed can be imported with one key and processed according to the classification result; pictures to be printed can also be found by keyword search and imported for printing.
  • the method further includes: if any of the imported textual images needs to be signed, signing in the preset signature area of that image; and/or, if any of the imported textual images is defective, applying filter processing to the defective image.
  • a signature area is set for some documents to be signed, and the signature can be directly performed on the image, and the signed document is then printed.
  • the image can also be binarized.
  • for both textual images and non-textual images, the OCR text recognition method described above can be used to obtain their text recognition results, and keywords are determined from those results to classify the images.
  • because the images are classified according to the text content they contain, the classification result is more accurate, and the determined keywords make it convenient to search for images by keyword later, enabling fast image search.
  • non-text images can also be classified by image content, which also improves the accuracy of the classification results.
  • the present invention also proposes a document recognition processing method for converting different types of files, such as scanned files, PDF files or pictures, into texts that can be searched or edited at any time.
  • the search can be carried out on the document content or on the text in a picture; that is, the user only needs to enter keywords in the search box, and the title, content, remarks, and text in pictures are all searched intelligently.
  • the document recognition processing method includes the following steps:
  • Step S301 acquiring an input image, where the input image contains the original document to be recognized.
  • the type of the original document may be a paper document
  • the input image may be formed by taking a photo or scanning
  • the type of the original document may also be an electronic document, such as a PDF document or a picture document with uneditable text, in this case the input image can be obtained directly.
  • Step S302 Recognize the original document in the input image to obtain a character recognition result of the original document.
  • the text recognition method shown in FIG. 1 may be used to recognize the original document in the input image.
  • the specific identification process is not repeated here.
  • Step S303 Arrange the character recognition results of the original document according to the position information of each character of the original document in the input image to obtain a recognized document.
  • the character recognition results of the original document are arranged to obtain a recognized document, including:
  • the original text in the original document is replaced by the character recognition result of the original document to obtain a recognized document.
  • Figure 5a shows the input image containing the original document
  • Figure 5b shows the finally obtained recognized document
  • the coordinate information of each character of the original document in the input image can be obtained during processing, so that after the character recognition result of the original document is obtained, each recognized character is placed at the corresponding position in the input image according to its coordinates, replacing the characters of the original document and yielding the recognized document.
  • OCR converts the characters in the input image into editable characters through intelligent recognition, with no manual typing, and can instantly convert PPT and PDF files, pictures, business cards, test papers, and the like into recognized documents, that is, electronic manuscripts that can be edited and modified.
  • the original document may also be compared with the recognized document to determine whether any differences exist between them and, if so, the differences are corrected in the recognized document.
  • a manual verification method can be used to compare the original document with the output editable electronic text of the identification document, and find out the difference between the editable electronic text and the original document during the conversion process.
  • the original document in the acquired input image may fail to be recognized because of problems such as curvature, or the recognized document may be output as garbled text.
  • a correction model may be used to identify the curvature of the original document in the input image, and if the curvature satisfies a preset correction condition, the original document in the input image is corrected to remove the curvature.
  • in practical applications, the curvature of the original document can also be corrected manually or by other correction methods.
  • the present invention uses an annotation recognition model to recognize the input image and identify the annotation content in the original document; in the recognized document, the character recognition result corresponding to the annotation content is typeset into a format consistent with the original document.
  • the invention recognizes the input image with the annotation recognition model, distinguishes the annotation content from the other characters of the original document, and outputs the annotation content not in the same text form as the other characters but in a form consistent with the original document.
  • Fig. 6c shows the recognized document processed by the method of the present invention. As Figs. 6a and 6c show, during OCR the annotations are automatically identified by the annotation recognition model; the recognized document is then checked against that result and automatically typeset into a format consistent with the original text, so that the OCR output matches the original image and no manual proofreading is required.
  • the document recognition processing method provided by the present invention adopts the OCR text recognition method to recognize the document to be recognized in the input image, thereby obtaining the recognized document.
  • converting the uneditable document into an editable one makes it convenient to find the document later by searching for keywords it contains, enabling fast document search.
  • by correcting the curvature of the input image and by recognizing and adjusting the fonts of annotations and references in the document, errors in converting the document to be recognized in the input image into editable electronic text are reduced, and the conversion accuracy is improved.
  • the present invention also provides an electronic device.
  • the electronic device includes a processor 301, a communication interface 302, a memory 303, and a communication bus 304, wherein the processor 301, the communication interface 302, and the memory 303 communicate with each other through the communication bus 304;
  • the memory 303 is used to store computer programs
  • the processor 301, when executing the program stored in the memory 303, implements the steps of the text recognition method described above, the steps of the image recognition classification method described above, or the steps of the document recognition processing method described above.
  • the communication bus 304 mentioned in the above electronic device may be a peripheral component interconnect standard (Peripheral Component Interconnect, PCI) bus or an Extended Industry Standard Architecture (Extended Industry Standard Architecture, EISA) bus or the like.
  • the communication bus 304 can be divided into an address bus, a data bus, a control bus, and the like. For ease of presentation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface 302 is used for communication between the above-mentioned electronic device and other devices.
  • the processor 301 may be a central processing unit (CPU) or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general-purpose processor may be a microprocessor or any conventional processor.
  • the processor 301 is the control center of the electronic device, and uses various interfaces and lines to connect various parts of the entire electronic device.
  • the memory 303 can be used to store the computer program, and the processor 301 implements the various functions of the electronic device by running or executing the computer program stored in the memory 303 and calling the data stored in the memory 303.
  • the memory 303 may include non-volatile and/or volatile memory.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
  • the present invention also provides a computer-readable storage medium on which instructions are stored; when the instructions are executed, they implement the steps of the text recognition method described above, the steps of the image recognition classification method described above, or the steps of the document recognition processing method described above.
  • computer-readable storage media in embodiments of the present invention may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. It should be noted that computer-readable storage media described herein are intended to include, but not be limited to, these and any other suitable types of memory.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical function.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or in a combination of dedicated hardware and computer instructions.
  • the various example embodiments of the invention may be implemented in hardware or special purpose circuits, software, firmware, logic, or any combination thereof. Certain aspects may be implemented in hardware, while other aspects may be implemented in firmware or software that may be executed by a controller, microprocessor or other computing device. While aspects of the embodiments of the invention are illustrated or described as block diagrams, flowcharts, or using some other graphical representation, it is to be understood that the blocks, apparatus, systems, techniques or methods described herein may be implemented, as non-limiting examples, in hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controllers or other computing devices, or some combination thereof.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Character Discrimination (AREA)

Abstract

A text recognition method, an image recognition classification method, and a document recognition processing method. The text recognition method comprises: during text recognition, first labeling, with universal text line boxes, text lines in text, which is to be recognized, in a text image; next, recognizing each text line by using a character recognition model, so as to obtain a preliminary recognition result of said text; and then recognizing language types of the preliminary recognition result, and calling a corresponding language recognition model according to the recognized language types, to further recognize a character part corresponding to the language type, so as to obtain an optimized character recognition result. By means of the method, after the preliminary recognition result of said text is obtained, a separate language recognition model is further used for precision recognition according to the language types involved in the preliminary recognition result, such that the accuracy of text recognition is improved.

Description

Text recognition method, image recognition classification method, and document recognition processing method
Technical Field
The present invention relates to the technical field of machine learning, and in particular to a text recognition method, an image recognition classification method, a document recognition processing method, an electronic device, and a computer-readable storage medium.
Background
OCR (Optical Character Recognition) is the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text using a character recognition method. That is, for printed characters, the text of a paper document is optically converted into a black-and-white dot-matrix image file, and recognition software converts the text in the image into a text format that word processing software can further edit.
In OCR, a recognition model is usually used to recognize the characters in a document. However, a single model cannot recognize documents in all languages: the language of the document must be known before the corresponding recognition model can be called, and mixed-language documents are even harder to recognize. Existing OCR technology therefore suffers from low text recognition accuracy on documents in different languages.
In addition, recognized documents cannot be classified effectively, so they are managed in a disorderly way and are hard to find. Documents to be recognized may also be curved, which causes the layout after recognition to differ from the original document and can even produce garbled text.
Summary of the Invention
The purpose of the present invention is to provide a text recognition method, an image recognition classification method, a document recognition processing method, an electronic device, and a computer-readable storage medium. The specific technical solutions are as follows:
To achieve the above purpose, the present invention provides a text recognition method, comprising:
recognizing the text lines of the text to be recognized in a text image, and marking each text line with a universal text line box;
recognizing the characters in each text line using a character recognition model to obtain a preliminary recognition result of the text to be recognized;
performing language recognition on the preliminary recognition result using a language classification model to obtain the language types involved in the preliminary recognition result, and dividing the preliminary recognition result into a plurality of different character parts according to the language types;
calling the corresponding language recognition model according to each language type and recognizing the corresponding character part to obtain a target recognition result of the text to be recognized.
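The last two steps above (dividing the preliminary result by language type, then dispatching each part to a language-specific model) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `script_of` replaces the language classification model with a crude Unicode-script check, and the entries of `models` are hypothetical callables that operate on text, whereas a real language recognition model would presumably re-read the corresponding image region.

```python
import unicodedata
from typing import Callable, Dict, List, Optional, Tuple


def script_of(ch: str) -> str:
    """Assumed stand-in for the language classification model: bucket by Unicode name."""
    if not ch.isalpha():
        return "neutral"  # spaces, digits, punctuation stay with their neighbours
    name = unicodedata.name(ch, "")
    if name.startswith("CJK"):
        return "cjk"
    if name.startswith("LATIN"):
        return "latin"
    return "other"


def split_by_language(text: str) -> List[Tuple[str, str]]:
    """Divide the preliminary result into (language type, character part) pairs."""
    parts: List[Tuple[str, str]] = []
    current: Optional[str] = None
    buf = ""
    for ch in text:
        kind = script_of(ch)
        if kind == "neutral" or kind == current or current is None:
            buf += ch
            if kind != "neutral":
                current = kind
        else:
            parts.append((current, buf))
            current, buf = kind, ch
    if buf:
        parts.append((current or "neutral", buf))
    return parts


def refine(preliminary: str, models: Dict[str, Callable[[str], str]]) -> str:
    """Call the model registered for each language type; keep the part unchanged if none exists."""
    out = []
    for lang, segment in split_by_language(preliminary):
        model = models.get(lang, lambda s: s)
        out.append(model(segment))
    return "".join(out)
```

For example, `refine("Hel1o 世界", {"latin": lambda s: s.replace("1", "l")})` routes only the Latin part through the hypothetical Latin model and leaves the CJK part untouched.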
Optionally, the above text recognition method further comprises: recognizing the direction of the text to be recognized in the text image and, if the direction does not meet a preset condition, correcting the direction of the text to be recognized;
wherein recognizing the direction of the text to be recognized in the text image comprises:
recognizing the direction of the text to be recognized in the text image using a direction recognition model, the direction recognition model being a CNN-based neural network model.
Optionally, in the above text recognition method, the character recognition model is a neural network model based on CTC (Connectionist Temporal Classification) and an attention mechanism.
Optionally, in the above text recognition method, the character recognition model is trained on a training sample set that includes the CJK character set and the ISO 8859 character sets (parts 1 to 16).
Optionally, in the above text recognition method, the language classification model is a fasttext<N-Gram> language classification model based on a wiki dataset.
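The text names an N-gram based fastText classifier without giving details. As a rough illustration of the kind of feature such a classifier relies on, the sketch below extracts character n-grams with `<` and `>` as word boundary markers, which is the convention fastText uses for subword units. It does not call the fastText library, and the function names and the default `n = 3` are assumptions made for this example.

```python
from collections import Counter
from typing import List


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Count character n-grams over all whitespace-separated words of a text."""
    counts: Counter = Counter()
    for word in text.lower().split():
        counts.update(char_ngrams(word, n))
    return counts
```

A language classifier would then learn which n-grams are typical of each language; that training step is outside the scope of this sketch.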
Based on the same inventive concept, the present invention also provides an image recognition classification method, comprising:
recognizing an image to be classified using an image recognition model to identify it as a textual image or a non-textual image;
recognizing the text in the textual image or non-textual image using the text recognition method described above to obtain a text recognition result of the textual image or non-textual image;
determining a keyword according to the text recognition result, determining a first subdivision type of the content of the textual image or a second subdivision type of the content of the non-textual image according to the keyword, classifying the textual image into the folder corresponding to the first subdivision type, and classifying the non-textual image into the folder corresponding to the second subdivision type.
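The keyword step above can be pictured with a small lookup. This is only a sketch under assumptions: the source describes a keyword classification model, whereas the code uses a fixed table, and the keyword lists, the folder name `textual_images`, and both function names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword lists for a few first subdivision types.
TYPE_KEYWORDS: Dict[str, List[str]] = {
    "identification documents": ["identity card", "passport"],
    "receipts": ["receipt", "invoice"],
    "certificates": ["certificate", "diploma"],
}


def classify_by_keyword(text: str) -> Optional[Tuple[str, str]]:
    """Return (subdivision type, matched keyword) for a text recognition result, or None."""
    lowered = text.lower()
    for subtype, keywords in TYPE_KEYWORDS.items():
        for keyword in keywords:
            if keyword in lowered:
                return subtype, keyword
    return None


def target_folder(text: str, root: PurePosixPath = PurePosixPath("textual_images")) -> PurePosixPath:
    """Pick the folder for an image: the subdivision folder if a keyword matched, else the root."""
    hit = classify_by_keyword(text)
    return root / hit[0] if hit else root
```

Returning the matched keyword alongside the type keeps it available for the later automatic naming and keyword search steps.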
Optionally, in the above image recognition classification method, after the keyword is determined, the method further comprises:
automatically naming the textual image or the non-textual image using the keyword.
Optionally, in the above image recognition classification method, after the textual image or non-textual image is identified, the method further comprises:
classifying the textual image into a textual image folder, and classifying the non-textual image into a non-textual image folder;
correspondingly, classifying the textual image into the folder corresponding to the first subdivision type and classifying the non-textual image into the folder corresponding to the second subdivision type comprises:
classifying the textual image in the textual image folder into the folder corresponding to the first subdivision type, and classifying the non-textual image in the non-textual image folder into the folder corresponding to the second subdivision type.
Optionally, in the above image recognition classification method, the first subdivision type includes one or more of: notes, identification documents, receipts, screenshots, documents, and certificates.
Optionally, in the above image recognition classification method, for a non-textual image that has been identified, the image recognition model recognizes the content of the non-textual image;
the image recognition classification method further comprises:
determining the second subdivision type according to the content of the non-textual image, and classifying the non-textual image into the folder corresponding to the second subdivision type.
Optionally, in the above image recognition classification method, after the content of the non-textual image is recognized, the method further comprises:
automatically naming the non-textual image according to its content.
Optionally, in the above image recognition classification method, after the textual image in the textual image folder is classified into the folder corresponding to the first subdivision type, the method further comprises:
in response to the user entering a search term, searching for a keyword that matches the search term and, if one exists, outputting the textual image corresponding to the matching keyword.
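A minimal sketch of this search step follows. It assumes that "matching" means a case-insensitive substring match, which is consistent with the description's example of the search term "identity" matching the keyword "identity card"; the index structure and file names are hypothetical.

```python
from typing import Dict, List


def search_images(term: str, index: Dict[str, List[str]]) -> List[str]:
    """Return the images whose stored keyword contains the search term."""
    term = term.strip().lower()
    if not term:
        return []
    hits: List[str] = []
    for keyword, images in index.items():
        if term in keyword.lower():
            hits.extend(images)
    return hits


# Hypothetical keyword index built during classification.
index = {"identity card": ["id_front.jpg", "id_back.jpg"], "receipt": ["taxi.jpg"]}
```

With this index, `search_images("identity", index)` yields both identity card images, while a term that matches no keyword yields an empty list.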
Optionally, in the above image recognition classification method, after the textual image in the textual image folder is classified into the folder corresponding to the first subdivision type, the method further comprises:
in response to a print operation by the user, importing all textual images in the folder corresponding to the first subdivision type through a preconfigured one-key import function so that they can be printed.
Optionally, in the above image recognition classification method, before printing is performed, the method further comprises:
if any of the imported textual images needs to be signed, signing in the preset signature area of that textual image;
and/or, if any of the imported textual images is defective, applying filter processing to the defective textual image.
Based on the same inventive concept, the present invention also provides a document recognition processing method, comprising:
acquiring an input image that contains the original document to be recognized;
recognizing the original document in the input image using the text recognition method described above to obtain a character recognition result of the original document;
arranging the character recognition result of the original document according to the position information of each character of the original document in the input image to obtain a recognized document.
Optionally, in the above document recognition processing method, arranging the character recognition result of the original document according to the position information of each character of the original document in the input image to obtain a recognized document comprises:
replacing the original text in the original document with the character recognition result of the original document according to the position information of each character of the original document in the input image, to obtain the recognized document.
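Arranging recognized characters by their position information can be sketched as below. This is a simplified illustration rather than the method of the source: it only recovers reading order from per-character coordinates instead of placing each character back at its position in the image, and the `(character, x, y)` tuple layout and the `line_tol` tolerance are assumptions.

```python
from typing import List, Tuple

Char = Tuple[str, float, float]  # (character, x, y), with y growing downward


def arrange(chars: List[Char], line_tol: float = 10.0) -> str:
    """Group characters into lines by y coordinate, then order each line by x."""
    lines: List[List[Char]] = []
    for ch in sorted(chars, key=lambda c: c[2]):
        # Start a new line when the character is too far below the line's first character.
        if lines and abs(ch[2] - lines[-1][0][2]) <= line_tol:
            lines[-1].append(ch)
        else:
            lines.append([ch])
    return "\n".join(
        "".join(c[0] for c in sorted(line, key=lambda c: c[1])) for line in lines
    )
```

A fixed tolerance works only for roughly horizontal, evenly spaced text; a curved page would first need the curvature correction that the method describes separately.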
Optionally, in the above document recognition processing method, after the recognized document is obtained, the method further comprises:
comparing the original document with the recognized document to determine whether any differences exist between them and, if so, correcting the differences in the recognized document.
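If a trusted transcription of the original document is available as text (an assumption; the description mentions manual verification against the original), the comparison step can be sketched with Python's standard `difflib` module. The function name and the returned tuple layout are illustrative.

```python
import difflib
from typing import List, Tuple


def find_differences(original: str, recognized: str) -> List[Tuple[str, str, str]]:
    """List (operation, original span, recognized span) for every non-matching region."""
    matcher = difflib.SequenceMatcher(None, original, recognized, autojunk=False)
    return [
        (tag, original[i1:i2], recognized[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

Each reported span shows what the reference contains and what the recognized document contains at that point, which is enough to correct the difference in the recognized document.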
Optionally, in the above document recognition processing method, before the input image is recognized, the method further comprises:
identifying the curvature of the original document in the input image using a correction model and, if the curvature satisfies a preset correction condition, correcting the original document in the input image to remove the curvature.
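The source does not define the correction model or the preset correction condition. Purely as an illustration of what such a condition could look like, the sketch below measures how far the sampled baseline of a text line bows away from the straight chord between its endpoints and compares that ratio with a threshold. The measure, the function names, and the `0.02` threshold are all assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def bow_ratio(baseline: List[Point]) -> float:
    """Largest distance of a baseline point from the end-to-end chord, divided by chord length."""
    (x0, y0), (x1, y1) = baseline[0], baseline[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    if chord == 0:
        return 0.0
    # Twice the triangle area over the chord gives the point-to-line distance.
    max_distance = max(
        abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / chord for x, y in baseline
    )
    return max_distance / chord


def needs_correction(baseline: List[Point], threshold: float = 0.02) -> bool:
    """Hypothetical correction condition: the line bows more than `threshold` of its length."""
    return bow_ratio(baseline) > threshold
```

A flat scan gives a ratio near zero, while a line photographed across the gutter of an open book would typically give a clearly larger value and trigger correction.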
Optionally, in the above document recognition processing method, after the recognized document is obtained, the method further comprises:
recognizing the input image using an annotation recognition model to identify the annotation content in the original document;
in the recognized document, typesetting the character recognition result corresponding to the annotation content into a format consistent with the original document.
The present invention also provides an electronic device comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement, when executing the program stored in the memory, the steps of the text recognition method described above, the steps of the image recognition classification method described above, or the steps of the document recognition processing method described above.
The present invention also provides a computer-readable storage medium on which instructions are stored; when the instructions are executed, they implement the steps of the text recognition method described above, the steps of the image recognition classification method described above, or the steps of the document recognition processing method described above.
与现有技术相比,本发明提供的文本识别方法、图像识别分类方法、文档识别处理方法及电子设备、计算机可读存储介质具有以下优点:Compared with the prior art, the text recognition method, image recognition classification method, document recognition processing method, electronic device, and computer-readable storage medium provided by the present invention have the following advantages:
With the text recognition method and the corresponding electronic device and computer-readable storage medium provided by the present invention, text recognition proceeds as follows: the text lines in the text to be recognized are first marked with generic text line boxes; a character recognition model then recognizes each text line to produce a preliminary recognition result for the text to be recognized; language types are then identified in the preliminary recognition result; and, for each identified language type, the corresponding language recognition model is invoked to further recognize the character part belonging to that language type, yielding an optimized character recognition result. Because this embodiment, after obtaining the preliminary recognition result, additionally applies a separate language recognition model for each language type involved, the accuracy of text recognition is improved.
With the image recognition and classification method and the corresponding electronic device and computer-readable storage medium provided by the present invention, both textual images and non-textual images can be processed with the OCR text recognition method described above to obtain their text recognition results. Keywords are determined from the text recognition results and used to classify the textual and non-textual images. Because images are classified according to the text they contain, the classification results are more accurate; the determined keywords also make it convenient to search for images by keyword later, enabling fast image search. In addition, non-textual images can be classified by image content, which further improves the accuracy of the classification results.
With the document recognition processing method and the corresponding electronic device and computer-readable storage medium provided by the present invention, the document to be recognized in an input image is recognized with the OCR text recognition method to obtain a recognized document. Because a non-editable document is converted into an editable one, the document can later be found by searching for keywords it contains, enabling fast file search. In addition, correcting the curvature of the input image and recognizing and adjusting the fonts of annotation references in the document reduces errors when converting the document to be recognized into editable electronic text and improves conversion accuracy.
Description of drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a text recognition method provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of an image recognition and classification method provided by an embodiment of the present invention;
FIG. 3 is an example of how image recognition and classification results are displayed;
FIG. 4 is a schematic flowchart of a document recognition processing method provided by an embodiment of the present invention;
FIG. 5a is an example of an input image containing an original document;
FIG. 5b is an example of the recognized document obtained by applying the method of the present invention to the input image shown in FIG. 5a;
FIG. 6a is another example of an input image containing an original document;
FIG. 6b is an example of the recognized document obtained by applying an existing method to the input image shown in FIG. 6a;
FIG. 6c is an example of the recognized document obtained by applying the method of the present invention to the input image shown in FIG. 6a;
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description
The text recognition method, image recognition and classification method, document recognition processing method, electronic device, and computer-readable storage medium proposed by the present invention are described in further detail below with reference to the accompanying drawings and specific embodiments. The advantages and features of the present invention will become clearer from the following description. Note that the drawings are in a highly simplified form and are not drawn to precise scale; they serve only to illustrate the embodiments of the present invention conveniently and clearly. The structures, proportions, sizes, and the like shown in the drawings accompanying this specification are intended only to complement the content disclosed in the specification so that those skilled in the art can read and understand it. They are not intended to limit the conditions under which the present invention may be implemented and therefore carry no substantive technical significance. Any structural modification, change in proportional relationship, or adjustment of size that does not affect the effects the present invention can produce or the purposes it can achieve shall still fall within the scope covered by the technical content disclosed herein.
To solve the problems in the prior art, the present invention provides a text recognition method. FIG. 1 shows a flowchart of a text recognition method according to an exemplary embodiment of the present invention. The method may be implemented in an application (app) installed on a smart terminal such as a mobile phone or tablet computer. As shown in FIG. 1, the method may include:
Step S101: identify the text lines in the text to be recognized in a text image, and mark each text line with a generic text line box.
In the present invention, a text image is an image whose content is mainly text, for example a business card image, a bill image, an ID document image, a certificate image, or a note image. It may be obtained by photographing or by scanning text. For example, a note image may be obtained by photographing handwritten text on paper.
In general, the text to be recognized in the text image contains one or more text lines. The present invention performs text recognition using OCR: each text line is recognized separately, and the recognition results of all text lines are then combined to obtain the recognition result of the entire text to be recognized. Recognition therefore requires identifying each text line in the text to be recognized and marking each one with a generic text line box.
Note that text line identification places no restriction on the language within a line; processing is purely line-based. That is, when the characters in a text line belong to several language types, they are all marked within the same generic text line box as long as they lie on the same text line.
Note also that one picture may contain several documents, for example a text image showing both the front and back of an ID card, and the two documents need to be recognized separately. Therefore, before step S101 is performed, the document regions in the text image (that is, the regions where text to be recognized is located) may be identified and sliced out. For example, slicing may be based on annotation boxes, or an edge recognition method may identify the edges of each document region and the region may then be sliced along those edges.
Preferably, before step S101 is performed, the method further includes: identifying the orientation of the text to be recognized in the text image and, if the orientation does not satisfy a preset condition, correcting it.
It can be understood that, before the text to be recognized is recognized, its orientation in the text image must satisfy a preset condition, for example that the characters in its text lines are arranged along a certain reference direction. The orientation of the text to be recognized therefore needs to be checked and, where necessary, corrected first. Specifically, an orientation recognition model may be used to identify the orientation of the text to be recognized in the text image; this model may be a CNN-based neural network model.
The reference direction may be set to the positive horizontal direction. The orientation recognition model can identify the angle between the direction in which the characters of a text line are arranged and the positive horizontal direction in the text image. If the angle is 0, no correction is needed; if the angle is not 0, the text image needs to be corrected. Specifically, the text image is rotated so that the angle between the character arrangement direction of the text lines and the positive horizontal direction becomes 0. In this embodiment, the rightward horizontal direction is taken as the positive horizontal direction; other embodiments may define a different positive direction, and the present invention places no limitation on this.
Alternatively, the correction may use the average slope of multiple text lines as the correction reference, or use another correction method; the present invention places no limitation on this.
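The slope-based correction mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the representation of each text line as a start/end point pair, and the tolerance value are all hypothetical. The sketch computes a mean line direction (closely related to the average slope) and derives the rotation that would bring it back to 0.

```python
import math


def mean_line_angle(line_segments):
    """Return the mean direction of text-line baselines, in degrees.

    Each segment is ((x1, y1), (x2, y2)), running from the start of a
    text line to its end (hypothetical representation). Summing the
    direction vectors before calling atan2 avoids wrap-around problems
    near +/-180 degrees and weights longer lines more heavily.
    """
    dx = sum(x2 - x1 for (x1, _), (x2, _) in line_segments)
    dy = sum(y2 - y1 for (_, y1), (_, y2) in line_segments)
    return math.degrees(math.atan2(dy, dx))


def correction_angle(line_segments, tolerance_deg=0.5):
    """Return the rotation (degrees) that cancels the measured skew.

    Returns 0.0 when the skew is within the assumed tolerance. The sign
    convention depends on the image library and on whether the y axis
    points up or down, so callers must adapt it.
    """
    angle = mean_line_angle(line_segments)
    return 0.0 if abs(angle) <= tolerance_deg else -angle
```

For instance, two horizontal lines give a correction of 0.0, while a single line running from (0, 0) to (10, 10) gives a correction of about -45 degrees.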
Step S102: recognize the characters in each text line using a character recognition model to obtain a preliminary recognition result for the text to be recognized.
In this embodiment, the character recognition model is an all-in-one model trained on multiple character sets, for example the CJK character set and the ISO 8859-1 through ISO 8859-16 character sets, so it supports recognition of CJK and Latin-script characters. The character recognition model is a neural network model based on Connectionist Temporal Classification (CTC) and an attention mechanism. Each text line is input to the character recognition model separately, the model outputs the character recognition result for that line, and the results for all text lines are combined to obtain the character recognition result of the text to be recognized, which serves as the preliminary recognition result.
Connectionist Temporal Classification (CTC) is a sequence classification algorithm for cases where no strict alignment exists between data units and label units. It is widely used in optical character recognition (OCR) and speech recognition. The main role of CTC is to construct a loss function over sequences; during backpropagation, the gradient determined from this loss function is passed back to the preceding layer to train the model.
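The text above describes CTC from the training side. As a complement, the sketch below shows the standard CTC collapsing rule used when reading out a prediction: merge consecutive repeated symbols, then drop the blank symbol. The patent does not specify how its model is decoded, so the greedy readout, the function name, and the choice of "-" as the blank are assumptions made for illustration.

```python
def ctc_greedy_decode(frame_symbols, blank="-"):
    """Collapse a per-frame symbol sequence into an output string.

    Standard CTC rule: consecutive duplicates are merged first, then
    blanks are removed. A blank between two identical symbols keeps
    them as two separate output characters.
    """
    decoded = []
    previous = None
    for symbol in frame_symbols:
        if symbol != previous and symbol != blank:
            decoded.append(symbol)
        previous = symbol
    return "".join(decoded)
```

With this rule, the frame sequence "--hh-e-ll-lo" collapses to "hello": the blank between the two runs of "l" is what preserves the double letter.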
The attention mechanism greatly improves performance on sequence learning tasks. Within an encoder-decoder framework, adding an attention model on the encoder side to apply a weighted transformation to the source data sequence, or introducing an attention model on the decoder side to apply a weighted transformation to the target data, can effectively improve the performance of a sequence-to-sequence system.
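The weighted transformation described above can be illustrated with a generic dot-product attention step. This is a sketch of the general technique only; the patent does not state which attention variant its model uses, and the function name and plain-list vector representation are assumptions.

```python
import math


def attention_context(query, keys, values):
    """Return (weights, context) for one dot-product attention step.

    Scores are dot products of the query with each key; a softmax turns
    them into weights that sum to 1; the context vector is the weighted
    sum of the value vectors.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context
```

When two keys match the query equally well, each receives weight 0.5 and the context is the average of the two value vectors.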
The present invention combines CTC with an attention mechanism to build the character recognition model, which can improve the accuracy of character recognition.
Step S103: perform language identification on the preliminary recognition result using a language classification model to obtain the language types involved in the preliminary recognition result, and divide the preliminary recognition result into multiple character parts according to language type.
Because the character recognition model used in step S102 is trained on character sets from multiple languages, the accuracy of its recognition results for the characters in a text line is limited. The preliminary recognition result therefore needs to be optimized by further recognizing the characters of each language separately, which improves the accuracy of character recognition.
First, a language classification model performs language identification on the preliminary recognition result to obtain the language types involved. The langid technique may be used to identify the language type (that is, the language), and the language classification model is a fasttext<N-Gram> language classification model based on a wiki dataset.
fasttext is a word-vector and text classification tool whose typical application is supervised text classification. It provides a simple and efficient approach to text classification and representation learning, with accuracy comparable to deep learning methods and faster speed.
N-Gram is a language model commonly used in large-vocabulary continuous language recognition; for Chinese, it may be called the Chinese Language Model (CLM). It uses collocation information between adjacent words in context and can realize automatic conversion to Chinese characters. Specifically, when a continuous, unspaced string of pinyin, strokes, or digits representing letters or strokes needs to be converted into a string of Chinese characters (that is, a sentence), the collocation information between adjacent words in context makes it possible to compute the sentence with the highest probability. Conversion to Chinese characters is thus automatic, requires no manual selection by the user, and avoids the ambiguity that arises when many Chinese characters correspond to the same pinyin (or stroke string, or digit string).
Performing language identification on the preliminary recognition result with the language classification model described above yields the language types involved in the preliminary recognition result more accurately. Once the language types are identified, the preliminary recognition result can be divided into multiple character parts, with the characters of each language type grouped into the same character part.
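The division into character parts can be sketched as follows. Note the substitution: the patent identifies languages with a fasttext-based classifier, whereas this sketch uses a much cruder Unicode script heuristic from the Python standard library, purely to show the splitting step. The function names and the script labels ("latin", "cjk", and so on) are hypothetical, and the sketch produces contiguous runs in reading order rather than one merged part per language.

```python
import unicodedata


def script_of(ch):
    """Return a coarse script label, or None for neutral characters
    (spaces, digits, punctuation) that carry no script information."""
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    for prefix, label in (("CJK", "cjk"), ("HIRAGANA", "kana"),
                          ("KATAKANA", "kana"), ("HANGUL", "hangul"),
                          ("LATIN", "latin")):
        if name.startswith(prefix):
            return label
    return "other"


def split_by_script(text):
    """Split text into contiguous (script, substring) character parts.

    Neutral characters are attached to the part that precedes them; a
    leading neutral run adopts the script of the first real character.
    """
    parts = []
    for ch in text:
        script = script_of(ch)
        if parts and (script is None or script == parts[-1][0]):
            parts[-1][1] += ch
        elif parts and parts[-1][0] is None:
            parts[-1][0] = script
            parts[-1][1] += ch
        else:
            parts.append([script, ch])
    return [(script, chunk) for script, chunk in parts]
```

Applied to a mixed line such as "Hello 世界", the sketch yields a Latin part "Hello " followed by a CJK part "世界".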
Step S104: invoke the language recognition model corresponding to each language type to recognize the corresponding character part, and obtain the target recognition result of the text to be recognized.
In this embodiment, each language type has a corresponding language recognition model. After step S103 obtains the language types involved in the text to be recognized and the character part corresponding to each language type, the corresponding language recognition model is invoked on each character part. This yields a more precise character recognition result for each character part and, from these, the target recognition result of the text to be recognized.
In summary, the text recognition method provided by the present invention first marks the text lines in the text to be recognized with generic text line boxes, then recognizes each text line with a character recognition model to obtain a preliminary recognition result for the text to be recognized, then identifies the language types in the preliminary recognition result, and finally invokes the language recognition model corresponding to each identified language type to further recognize the character part belonging to that language type, yielding an optimized character recognition result. Because this embodiment, after obtaining the preliminary recognition result, additionally applies a separate language recognition model for each language type involved, the accuracy of text recognition is improved.
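The per-language dispatch in step S104 can be sketched as a lookup from language type to recognizer. Everything here is illustrative: the patent does not describe the interface of its language recognition models, so the recognizers are modeled as hypothetical text-to-text callables, and parts with no registered recognizer simply keep their preliminary result.

```python
def refine_parts(parts, recognizers):
    """Apply the recognizer registered for each language type.

    parts: list of (language_type, preliminary_text) tuples.
    recognizers: dict mapping language_type to a callable that returns
        the refined text for that part (hypothetical interface).
    Returns the refined parts joined back together in their original order.
    """
    refined = []
    for language_type, text in parts:
        recognizer = recognizers.get(language_type)
        refined.append(recognizer(text) if recognizer else text)
    return "".join(refined)
```

As a toy usage, a stand-in Latin recognizer that fixes the common digit/letter confusion "1" versus "l" would turn the parts ("latin", "he11o ") and ("cjk", "世界") into "hello 世界".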
Building on the text recognition method above, the present invention also proposes an image recognition and classification method for sorting large numbers of images, grouping images with similar content into the same folder so that users can browse and search them easily.
As shown in FIG. 2, the image recognition and classification method includes the following steps:
Step S201: recognize an image to be classified using an image recognition model to identify it as a textual image or a non-textual image.
In this embodiment, the image to be classified may be a newly captured image or an image that has already been captured and saved in a folder, for example an image saved in a mobile phone album. A textual image is an image whose content is mainly text, for example a business card image, a bill image, an ID document image, a certificate image, or a note image. It may be obtained by photographing or by scanning text. For example, a note image may be obtained by photographing handwritten text on paper. A non-textual image is an image whose content is mainly not text, for example everyday photos of people, landscape photos, and photos of animals and plants.
Recognizing the image to be classified with the image recognition model determines whether it is a textual image or a non-textual image, so that textual and non-textual images can be separated.
After textual and non-textual images have been identified, the images are automatically stored in different preset folders. That is, an image identified as textual is placed in the textual image folder, and an image identified as non-textual is placed in the non-textual image folder.
Step S202: recognize the text in the textual image or non-textual image to obtain the text recognition result of that image.
Specifically, the text recognition method shown in FIG. 1 may be used to recognize the text in the textual image or non-textual image; the recognition process is not repeated here. Pictures may also be classified according to the language type of their text recognition results.
Step S203: determine keywords from the text recognition result; determine, from the keywords, a first subdivision type for the content of the textual image or a second subdivision type for the content of the non-textual image; and place the textual image in the folder corresponding to the first subdivision type and the non-textual image in the folder corresponding to the second subdivision type.
Specifically, a keyword classification model may be used to extract keywords from the text recognition result. The keywords then determine the first subdivision type of the content of the textual image or the second subdivision type of the content of the non-textual image, and the textual image is placed in the folder corresponding to the first subdivision type and the non-textual image in the folder corresponding to the second subdivision type.
The first subdivision type includes, but is not limited to, one or more of: notes, ID documents, receipts, screenshots, documents, and certificates.
For example, if the textual image is an ID card image and the text recognition result contains the characters "中华人民共和国居民身份证" ("Resident Identity Card of the People's Republic of China"), the keyword classification model can extract the keyword "身份证" ("ID card") from the text recognition result. The first subdivision type of the content of this textual image is then determined to be ID document image, and the image can be placed in the folder for the "ID document image" subdivision type.
In addition, the "ID document image" subdivision type can be divided further, for example into specific types such as ID card, driver's license, passport, military officer ID, work permit, birth certificate, and household register. The specific type of the textual image can therefore also be determined from the keywords, and the image can be placed in the subfolder for that specific type under the folder corresponding to the first subdivision type. For the textual image in the example above, the keyword classification model extracts the keyword "ID card" from the text recognition result, so the image can be placed in the "ID card" subfolder under the "ID document image" folder. It can be understood that the "ID document image" folder may contain subfolders for several specific types, such as ID card, driver's license, passport, military officer ID, work permit, birth certificate, and household register.
With the above method, the classified textual images can be organized into a file tree in which folders are named level by level, so that each textual image to be classified is automatically placed in the corresponding folder. In addition, to make textual images easier to find, the keywords may be used to name the textual images automatically.
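The level-by-level folder tree described above can be sketched as a mapping from keywords to folder paths. This is only an illustration: the folder names, the English keyword strings, the lookup table, and the fallback "Other" folder are hypothetical, and in the patent the keywords come from a keyword classification model rather than a fixed table.

```python
from pathlib import PurePosixPath

# Hypothetical keyword -> (subdivision type, specific type) table.
KEYWORD_TO_FOLDER = {
    "id card": ("ID documents", "ID card"),
    "passport": ("ID documents", "Passport"),
    "driver's license": ("ID documents", "Driver's license"),
    "receipt": ("Receipts",),
}


def target_folder(is_textual, keywords):
    """Return the folder path for an image given its recognized keywords.

    The top level separates textual from non-textual images (step S201);
    deeper levels follow the first keyword found in the table (step S203).
    """
    root = "Textual images" if is_textual else "Non-textual images"
    for keyword in keywords:
        subfolders = KEYWORD_TO_FOLDER.get(keyword.lower())
        if subfolders:
            return PurePosixPath(root, *subfolders)
    return PurePosixPath(root, "Other")
```

Under these assumptions, a textual image with the keyword "ID card" would be routed to "Textual images/ID documents/ID card".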
For example, all images in an album may be classified as shown in FIG. 3: All Documents is displayed first, followed in order by Handwritten notes (note images), ID Card&Passport (ID document images), Receipt (receipt images), Screens (screenshot images), Certificate (certificate images), Other Card (other images), and so on. This is only an example; other classification schemes may be used in practice. The classified images may be sorted by modification time or by capture time, or the sort order may be configured as needed.
In practice, the user can search the classified textual images by keyword to find a target file quickly. Specifically, in response to the user entering a search term, the method searches for keywords that match the search term and, if one exists, outputs the textual images corresponding to that keyword. For example, when the user enters the search term "身份" ("identity"), the method searches for a matching keyword; if the matching keyword "身份证" ("ID card") exists, the textual images corresponding to the keyword "身份证" are displayed to the user.
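The keyword search above can be sketched with a simple index from keywords to image paths. The patent only says that the search term must "match" a keyword; the case-insensitive substring rule used here, the function name, and the index layout are assumptions chosen because they reproduce the example in which "身份" matches "身份证".

```python
def search_images(search_term, keyword_index):
    """Return the images whose keyword contains the search term.

    keyword_index: dict mapping each stored keyword to a list of image
        paths (hypothetical layout).
    Matching is a case-insensitive substring test, which is an assumed
    rule rather than one stated in the source.
    """
    term = search_term.strip().lower()
    if not term:
        return []
    results = []
    for keyword, image_paths in keyword_index.items():
        if term in keyword.lower():
            results.extend(image_paths)
    return results
```

With an index that maps "身份证" to an ID card photo, searching for "身份" returns that photo, while an unrelated term returns an empty list.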
所述第二细分类型可以包括:人物生活照、风景照、动物照片、植物照片等。The second subdivision type may include: life photos of people, landscape photos, animal photos, plant photos, and the like.
例如,所述非文本图像为雷峰塔照片,且图像中包含“雷峰塔”三个字,则对该非文本图像的文本识别结果为“雷峰塔”,进而可确定关键词为雷峰塔,由此可根据该关键词确定该非文本类图像的内容的第二细分类型为风景照,进而可以将该非文本类图像归类到“风景照”这一细分类型的文件夹中。For example, if the non-text image is a photo of Leifeng Pagoda, and the image contains three words "Leifeng Pagoda", the text recognition result of the non-text image is "Leifeng Pagoda", and then it can be determined that the keyword is Leifeng Pagoda Peak tower, thus it can be determined according to the keyword that the second subdivision type of the content of the non-text image is landscape photos, and then the non-text images can be classified into files of the subdivision type "landscape photos" in the folder.
此外,“风景照”这一细分类型中还可以进一步划分,例如根据景点名称对风景照进行进一步划分。因此,还可以根据识别出的景点名称确定该非文本类图像的具体类型,并将该非文本类图像进一步归类到所述第二细分类型对应的文件夹下该具体类型的子文件夹中。例如,对于前述举例中的非文本类图像,由于识别出该非文本图像为雷峰塔照片,则可以将该非文本类图像进一步归类到“风景照”这一细分类型的文件夹下的“雷峰塔照片”这一具体类型的子文件夹中。可以理解的是,“风景照”这一细分类型的文件夹下可设置不同景点对应的具体类型的子文件夹。In addition, the subdivision type "landscape photo" can be further divided, for example, the scenery photo can be further divided according to the name of the scenic spot. Therefore, the specific type of the non-text image can also be determined according to the identified name of the scenic spot, and the non-text image can be further classified into a subfolder of the specific type under the folder corresponding to the second subdivision type middle. For example, for the non-text image in the preceding example, since the non-text image is identified as a photo of Leifeng Pagoda, the non-text image can be further classified under the sub-category “landscape photos” "Photos of Leifeng Pagoda" in this specific subfolder. It can be understood that specific types of sub-folders corresponding to different scenic spots can be set under the sub-folder of “landscape photos”.
在其它实施例中,对于识别出的所述非文本类图像,步骤S201在采用图像识别模型进行识别时,所述图像识别模型还可以识别出所述非文本类图像中的内容,因此还可以根据所述非文本类图像的内容确定所述第二细分类型,并将所述非文本类图像归类到所述第二细分类型对应的文件夹中。例如,所述图像识别模型识别出所述非文本类图像显示的内容为雷峰塔,则可以确定该非文本类图像的内容的第二细分类型为风景照,进而可以将该非文本类图像归类到“风景照”这一细分类型的文件夹中。In other embodiments, for the recognized non-text image, when an image recognition model is used for recognition in step S201, the image recognition model can also recognize the content in the non-text image, so it can also The second subdivision type is determined according to the content of the non-text image, and the non-text image is classified into a folder corresponding to the second subdivision type. For example, if the image recognition model recognizes that the content displayed by the non-text image is Leifeng Pagoda, it can be determined that the second subdivision type of the content of the non-text image is landscape photos, and then the non-text image can be Images are grouped into sub-categories "Landscape".
通过上述方法,可以将分类后的非文本类图像设置成文件树,其中各个文件夹层层递进命名,从而可以将各个待分类的非文本类图像自动归类到相应的文件夹中。Through the above method, the classified non-text images can be set into a file tree, in which each folder is named progressively, so that each to-be-classified non-text image can be automatically classified into a corresponding folder.
The non-text images can be named automatically according to their content. For example, the content of a non-text image may include the recognized name of an animal or plant, the name of a scenic spot, and so on, so the image can be named automatically from those recognized names. Alternatively, the non-text images can be named automatically from the keywords obtained by the keyword classification model. Automatic naming makes the non-text images easier to find.
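One possible way to turn recognized names or keywords into a file name is sketched below. The function auto_name, the separator characters, and the collision counter are assumptions made for illustration; the original text does not specify a naming scheme.

```python
import re


def auto_name(labels, extension=".jpg", existing=()):
    """Join recognized labels into a file-system-safe name.

    labels:   recognized names or keywords, most important first
    existing: names already used in the target folder (to avoid collisions)
    """
    # Replace every run of non-word characters with "_" and trim stray underscores.
    cleaned = [re.sub(r"[^\w]+", "_", label.strip()).strip("_") for label in labels]
    stem = "-".join(part for part in cleaned if part) or "unnamed"
    candidate = stem + extension
    counter = 2
    # Append a counter until the name is unique in the folder.
    while candidate in existing:
        candidate = f"{stem}_{counter}{extension}"
        counter += 1
    return candidate


print(auto_name(["Leifeng Pagoda", "West Lake"]))
```

Because \w matches Unicode word characters in Python 3, labels written in Chinese are kept as they are.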
In addition, non-text images can also be classified by shooting time, shooting location, the relationships between the people shown, names, and the like.
Preferably, the classified text images and non-text images can be encrypted to keep the files secure. For example, important documents such as certificates and licenses can be encrypted, as can private personal photos. Encryption can be applied to a single file or to the corresponding folder.
Preferably, to simplify user operation, when the user needs to print, the documents to be printed can be imported with one click according to the classification result and processed as required. For example, when ID photos taken at a certain time and place need to be printed, the pictures can be found by keyword search and imported for printing.
In addition, before printing is performed, the method further includes: if any of the imported text images requires a signature, signing in the preset signature area of that text image; and/or, if any of the imported text images is defective, applying filter processing to the defective text image.
Specifically, some documents to be signed are provided with a signature area; the signature can be made directly on the image, and the signed document is then printed.
Filter processing is applied to defective images, for example as follows:
a) Some text images contain shadows caused by lighting and similar problems at capture time; to ensure print quality, the shadows can be removed before printing;
b) Old or distorted photos can be restored;
c) Handwriting, smears, oil stains and the like in the text can be removed automatically at print time;
d) To reduce ink consumption during printing, the image can also be binarized.
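The binarization mentioned in item d) can be illustrated with a minimal sketch. A fixed global threshold is assumed here purely for illustration; the original text does not say which binarization method is used, and an adaptive method may suit unevenly lit pages better.

```python
def binarize(gray, threshold=128):
    """Map each grayscale pixel (0-255) to 0 (ink) or 255 (blank paper).

    gray: image as a list of rows, each row a list of integer intensities.
    """
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in gray]


page = [[250, 240, 90],
        [30, 200, 128]]
print(binarize(page))  # prints [[255, 255, 0], [0, 255, 255]]
```

Light grey background pixels become pure white, so no ink is spent on them when the page is printed.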
In summary, in the image recognition and classification method provided by the present invention, the OCR text recognition method described above can be applied to both text images and non-text images to obtain their text recognition results, and keywords are determined from those results in order to classify the images. Because the images are classified according to the text they contain, the classification result is more accurate. The determined keywords also make later keyword-based image search convenient and fast. In addition, non-text images can also be classified by image content, which further improves the accuracy of the classification result.
On the basis of the above text recognition method, the present invention further proposes a document recognition processing method for converting different types of files, such as scanned files, PDF files, or pictures, into text that can be searched or edited at any time. A user may want to find a file or picture but remember only a few words from the document rather than its title; because the document is in a non-editable format, it cannot be found by searching for those words. With the document recognition processing method provided by the present invention, the non-editable document is converted into an editable one, so a search can be performed on the file content or on the text in a picture: the user only needs to enter keywords in the search box, and the title, content, remarks, and text on pictures can all be searched.
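The content search described above could, for example, be backed by a simple inverted index over the recognized text. This is only a sketch under stated assumptions: build_index and search are hypothetical names, the file names are made up, and whitespace tokenization is used for brevity (Chinese text would need a word segmenter or character n-grams instead).

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it.

    documents: dict of {document_name: recognized_text}
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            # Drop punctuation attached to the word.
            index[word.strip(".,;:!?\"'()")].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


docs = {
    "scan_001.pdf": "Invoice for office chairs, paid in March.",
    "photo_17.png": "Meeting notes: office move in April",
}
index = build_index(docs)
print(sorted(search(index, "office march")))
```

The user never needs to remember the title: any remembered words that appear in the recognized text are enough to locate the file.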
As shown in FIG. 4, the document recognition processing method includes the following steps:
Step S301: acquire an input image, where the input image contains the original document to be recognized.
The original document may be a paper document, in which case the input image can be produced by photographing or scanning it. The original document may also be an electronic document, such as a PDF document or picture document whose text cannot be edited, in which case the input image can be obtained directly.
Step S302: recognize the original document in the input image to obtain a character recognition result of the original document.
Specifically, the text recognition method shown in FIG. 1 can be used to recognize the original document in the input image. The recognition process is not repeated here.
Step S303: arrange the character recognition result of the original document according to the position information of each character of the original document in the input image, to obtain a recognized document.
Specifically, arranging the character recognition result of the original document according to the position information of each character of the original document in the input image, to obtain a recognized document, includes:
replacing the original text in the original document with the character recognition result of the original document according to the position information of each character of the original document in the input image, to obtain the recognized document.
FIG. 5a shows an input image containing the original document, and FIG. 5b shows the recognized document finally obtained. As FIGS. 5a and 5b show, the coordinates of each character of the original document in the input image can be obtained during processing. After the character recognition result of the original document is obtained, each recognized character is placed at the corresponding position in the input image according to its coordinates, replacing the character in the original document, and the recognized document is thereby obtained.
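The use of character coordinates can be illustrated with a simplified sketch that rebuilds reading-order text lines from (character, x, y) triples. It is not the placement procedure of the disclosure itself, which puts each character back at its position in the image; the function arrange and the line_tolerance parameter are assumptions introduced for illustration.

```python
def arrange(chars, line_tolerance=10):
    """Group recognized characters into lines by y coordinate, then order each line by x.

    chars: iterable of (character, x, y) tuples, where (x, y) is the top-left
           corner of the character in input-image pixel coordinates.
    line_tolerance: maximum vertical offset (pixels) from the first character
           of a line for a character to count as part of that line.
    """
    lines = []  # each entry: [reference_y, [(x, character), ...]]
    for ch, x, y in sorted(chars, key=lambda c: (c[2], c[1])):
        if lines and abs(y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((x, ch))
        else:
            lines.append([y, [(x, ch)]])
    # Within a line, left-to-right order is given by the x coordinate.
    return ["".join(ch for _, ch in sorted(items)) for _, items in lines]


recognized = [("b", 20, 12), ("a", 5, 10), ("d", 22, 41), ("c", 4, 40)]
print(arrange(recognized))  # prints ['ab', 'cd']
```

Keeping the coordinates alongside the characters is what allows the recognized text to be laid out like the original page rather than emitted as one undifferentiated string.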
As can be seen from the above, OCR converts the characters in the input image into editable characters through automatic recognition, without manual typing, so that PPT files, PDF files, pictures, business cards, test papers and the like can be quickly turned into recognized documents that can be edited and modified. To ensure the accuracy of the converted characters, the original document can also be compared with the recognized document to determine whether there are any differences between them; if there are, the differences are corrected in the recognized document. For example, manual verification can be used to compare the original document with the editable electronic text of the output recognized document and find where the editable text departs from the original document as a result of conversion.
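Where a trusted reference transcription of the original document is available (an assumption; the text above mentions manual verification only as one example), the comparison step could be assisted by a character-level diff. The sketch below uses difflib from the Python standard library; difference_points is a hypothetical name.

```python
import difflib


def difference_points(reference, recognized):
    """List (operation, reference_fragment, recognized_fragment) for every mismatch."""
    matcher = difflib.SequenceMatcher(None, reference, recognized, autojunk=False)
    return [
        (tag, reference[i1:i2], recognized[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replace / insert / delete operations
    ]


# A typical OCR confusion: the digit 0 recognized as the capital letter O.
print(difference_points("Total: 100 yuan", "Total: 1O0 yuan"))
```

Each reported tuple marks one difference to be corrected in the recognized document, so a reviewer can jump directly to the suspect characters instead of rereading the whole text.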
Preferably, when a thick book is scanned, the pages are photographed while curved, so the original document in the acquired input image may be unrecognizable because of the curvature, or the recognized document may contain garbled characters. In this case the curvature of the original document in the input image needs to be corrected, and text recognition and output are then performed on the corrected input image, which avoids garbled output. Specifically, a correction model can be used to identify the curvature of the original document in the input image; if the curvature satisfies a preset correction condition, the original document in the input image is corrected to remove the curvature. In practice the correction can be performed manually or by other correction methods.
As shown in FIGS. 6a and 6b, for annotation and citation marks in the original document (whose font size is generally smaller than that of the body text), existing document recognition processing methods produce output in which the annotated content is inconsistent with the original document, as shown by the boxed items in FIGS. 6a and 6b. The user then has to check each item and correct it by hand, which greatly reduces efficiency. To address this, the present invention uses an annotation recognition model to recognize the input image and identify the annotation content in the original document; in the recognized document, the character recognition result corresponding to the annotation content is typeset in the same format as in the original document. By recognizing the input image with the annotation recognition model, the present invention separates the annotation content from the other characters of the original document and outputs it not in the same textual form as the other characters but in a form consistent with the original document. FIG. 6c shows the recognized document obtained with the method of the present invention. As FIGS. 6a and 6c show, the annotations are recognized automatically by the annotation recognition model during OCR recognition, and after verification the recognized document is automatically typeset according to the recognition result into a format consistent with the original text. The OCR-recognized text is thus consistent with the original picture, and no manual proofreading is required.
In summary, the document recognition processing method provided by the present invention uses the OCR text recognition method to recognize the document in the input image and obtain a recognized document. Because the non-editable document is converted into an editable one, the document can later be found by searching for keywords it contains, which makes file search fast. In addition, correcting the curvature of the input image and recognizing and adjusting the annotation and citation marks in the document reduce the errors made when converting the document in the input image into editable electronic text and improve the conversion accuracy.
Based on the same inventive concept, the present invention further provides an electronic device. As shown in FIG. 7, the electronic device includes a processor 301, a communication interface 302, a memory 303, and a communication bus 304, where the processor 301, the communication interface 302, and the memory 303 communicate with one another through the communication bus 304;
the memory 303 is configured to store a computer program;
the processor 301 is configured to execute the program stored in the memory 303 and, in doing so, can implement the steps of the text recognition method described above, the steps of the image recognition and classification method described above, or the steps of the document recognition processing method described above.
The communication bus 304 of the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 304 can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration it is drawn as a single thick line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface 302 is used for communication between the electronic device and other devices.
The processor 301 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor. The processor 301 is the control center of the electronic device and connects the parts of the whole electronic device through various interfaces and lines.
The memory 303 can be used to store the computer program. The processor 301 implements the various functions of the electronic device by running or executing the computer program stored in the memory 303 and by calling the data stored in the memory 303.
The memory 303 may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Based on the same inventive concept, the present invention further provides a computer-readable storage medium storing instructions which, when executed, can implement the steps of the text recognition method described above, the steps of the image recognition and classification method described above, or the steps of the document recognition processing method described above.
Similarly, the computer-readable storage medium in the embodiments of the present invention may be volatile memory or non-volatile memory, or may include both. It should be noted that the computer-readable storage media described herein are intended to include, without being limited to, these and any other suitable types of memory.
It should be noted that the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions, and operations of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
In general, the various example embodiments of the present invention may be implemented in hardware or special-purpose circuits, software, firmware, logic, or any combination thereof. Some aspects may be implemented in hardware, while others may be implemented in firmware or software executable by a controller, microprocessor, or other computing device. Where aspects of the embodiments of the present invention are illustrated or described as block diagrams, flowcharts, or other graphical representations, it will be understood that the blocks, apparatus, systems, techniques, or methods described herein may be implemented, as non-limiting examples, in hardware, software, firmware, special-purpose circuits or logic, general-purpose hardware or controllers or other computing devices, or some combination thereof.
It should be noted that the embodiments in this specification are described in a related manner; for parts that are the same or similar, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the electronic device and the computer-readable storage medium are substantially similar to the method embodiments, they are described only briefly, and the corresponding parts of the method embodiments may be consulted for details.
In this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to be non-exclusive, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.
The above describes only preferred embodiments of the present invention and does not limit the scope of the present invention in any way. Any changes or modifications made by those of ordinary skill in the art on the basis of the above disclosure fall within the scope of protection of the claims.

Claims (20)

  1. A text recognition method, comprising:
    recognizing the text lines in the text to be recognized in a text image, and marking each text line with a general text line box;
    recognizing the characters in each text line using a character recognition model to obtain a preliminary recognition result of the text to be recognized;
    performing language recognition on the preliminary recognition result using a language classification model to obtain the language types involved in the preliminary recognition result, and dividing the preliminary recognition result into a plurality of different character parts according to the language types;
    calling the language recognition model corresponding to each language type to recognize the corresponding character part, to obtain a target recognition result of the text to be recognized.
  2. The text recognition method according to claim 1, further comprising: recognizing the direction of the text to be recognized in the text image, and if the direction does not meet a preset condition, correcting the direction of the text to be recognized;
    wherein recognizing the direction of the text to be recognized in the text image comprises:
    recognizing the direction of the text to be recognized in the text image using a direction recognition model, the direction recognition model being a CNN-based neural network model.
  3. The text recognition method according to claim 1, wherein the character recognition model is a neural network model based on CTC (Connectionist Temporal Classification) and an attention mechanism.
  4. The text recognition method according to claim 1, wherein the character recognition model is obtained by training on a training sample set that includes the CJK character set and the ISO 8859 (parts 1-16) character sets.
  5. The text recognition method according to claim 1, wherein the language classification model is a fasttext<N-Gram> language classification model based on the wiki dataset.
  6. An image recognition and classification method, comprising:
    recognizing images to be classified using an image recognition model, to identify text images or non-text images;
    recognizing the text in the text image or the non-text image using the text recognition method according to any one of claims 1-5, to obtain a text recognition result of the text image or the non-text image;
    determining a keyword according to the text recognition result, determining a first subdivision type of the content of the text image or a second subdivision type of the content of the non-text image according to the keyword, classifying the text image into the folder corresponding to the first subdivision type, and classifying the non-text image into the folder corresponding to the second subdivision type.
  7. The image recognition and classification method according to claim 6, further comprising, after the keyword is determined:
    automatically naming the text image or the non-text image using the keyword.
  8. The image recognition and classification method according to claim 6, further comprising, after the text images or non-text images are identified:
    classifying the text images into a text image folder, and classifying the non-text images into a non-text image folder;
    wherein classifying the text image into the folder corresponding to the first subdivision type and classifying the non-text image into the folder corresponding to the second subdivision type comprises:
    classifying the text images in the text image folder into the folder corresponding to the first subdivision type, and classifying the non-text images in the non-text image folder into the folder corresponding to the second subdivision type.
  9. The image recognition and classification method according to claim 6, wherein the first subdivision type comprises one or more of: notes, identity documents, receipts, screenshots, documents, and certificates.
  10. The image recognition and classification method according to claim 6, wherein, for each identified non-text image, the image recognition model recognizes the content of the non-text image;
    the image recognition and classification method further comprising:
    determining the second subdivision type according to the content of the non-text image, and classifying the non-text image into the folder corresponding to the second subdivision type.
  11. The image recognition and classification method according to claim 10, further comprising, after the content of the non-text image is recognized:
    automatically naming the non-text image according to the content of the non-text image.
  12. The image recognition and classification method according to claim 8, further comprising, after the text images in the text image folder are classified into the folder corresponding to the first subdivision type:
    in response to a user entering a search term, searching for a keyword that matches the search term and, if one exists, outputting the text image corresponding to the matching keyword.
  13. The image recognition and classification method according to claim 8, further comprising, after the text images in the text image folder are classified into the folder corresponding to the first subdivision type:
    in response to a print operation by the user, importing all text images in the folder corresponding to the first subdivision type for printing, according to a pre-configured one-click import function.
  14. The image recognition and classification method according to claim 13, further comprising, before printing is performed:
    if any of the imported text images requires a signature, signing in the preset signature area of that text image;
    and/or, if any of the imported text images is defective, applying filter processing to the defective text image.
  15. A document recognition processing method, comprising:
    acquiring an input image, the input image containing an original document to be recognized;
    recognizing the original document in the input image using the text recognition method according to any one of claims 1 to 5, to obtain a character recognition result for the original document;
    arranging the character recognition result of the original document according to position information of each character of the original document in the input image, to obtain a recognized document.
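The arranging step of claim 15 can be sketched as follows. The claim does not say how position information is represented or how characters are grouped, so this sketch assumes that each character carries the left and top coordinates and the height of its bounding box, and that two characters belong to the same text line when their vertical centers differ by at most half a character height. `CharBox` and `arrange_characters` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CharBox:
    """Hypothetical recognition result: one character and its bounding box."""
    text: str
    x: float       # left edge in the input image
    y: float       # top edge in the input image
    height: float  # box height


def arrange_characters(boxes):
    """Group characters into text lines by vertical position, then order by x.

    Assumption: a character joins an existing line when its vertical center
    lies within half a character height of that line's first character.
    """
    lines = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        center = box.y + box.height / 2
        for line in lines:
            ref = line[0]
            if abs((ref.y + ref.height / 2) - center) <= 0.5 * ref.height:
                line.append(box)
                break
        else:
            # No existing line is close enough, so start a new one.
            lines.append([box])
    return "\n".join(
        "".join(b.text for b in sorted(line, key=lambda b: b.x))
        for line in lines
    )


if __name__ == "__main__":
    sample = [
        CharBox("i", 10, 1, 10),
        CharBox("N", 0, 21, 10),
        CharBox("H", 0, 0, 10),
        CharBox("o", 10, 20, 10),
    ]
    print(arrange_characters(sample))
```

A real layout engine would also need to handle columns, skewed lines, and spacing; the sketch only shows how position information can drive the ordering of recognized characters.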
  16. The document recognition processing method according to claim 15, wherein arranging the character recognition result of the original document according to the position information of each character of the original document in the input image, to obtain a recognized document, comprises:
    replacing, according to the position information of each character of the original document in the input image, the original text in the original document with the character recognition result of the original document, to obtain the recognized document.
  17. The document recognition processing method according to claim 15, further comprising, after the recognized document is obtained:
    comparing the original document with the recognized document to determine whether the recognized document differs from the original document and, if so, correcting the points of difference in the recognized document.
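The comparison in claim 17 can be illustrated with Python's standard `difflib` module. The claim does not state in what form the original document is compared, so this sketch assumes that a trusted reference text of the original is available as a string; `find_differences` and `correct_recognized` are hypothetical helper names.

```python
import difflib


def find_differences(original, recognized):
    """List the points where the recognized text differs from the original.

    Each entry is (operation, recognized_fragment, original_fragment).
    Assumption: `original` is a trusted reference transcription.
    """
    matcher = difflib.SequenceMatcher(None, recognized, original, autojunk=False)
    return [
        (tag, recognized[i1:i2], original[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def correct_recognized(original, recognized):
    """Rebuild the recognized text, replacing each difference with the original."""
    matcher = difflib.SequenceMatcher(None, recognized, original, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep matching spans from the recognized text; take the rest from the original.
        parts.append(recognized[i1:i2] if tag == "equal" else original[j1:j2])
    return "".join(parts)


if __name__ == "__main__":
    print(find_differences("hello world", "hello w0rld"))
    print(correct_recognized("hello world", "hello w0rld"))
```

In practice the list of differences could be shown to a user for confirmation before any correction is applied.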
  18. The document recognition processing method according to claim 15, further comprising, before the original document in the input image is recognized:
    identifying, using a correction model, the curvature of the original document in the input image and, if the curvature satisfies a preset correction condition, performing correction processing on the original document in the input image to remove the curvature of the original document.
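The preset correction condition of claim 18 can be pictured as a threshold test. The claim relies on a correction model whose details are not given here, so the sketch below substitutes a simple geometric stand-in: it assumes that points along one text baseline are already known and measures how far the baseline sags from the straight chord between its endpoints. The function names and the 0.02 threshold are illustrative assumptions, not values from the source.

```python
import math


def curve_sag_ratio(baseline):
    """Return the maximum deviation of the baseline from its chord, divided by chord length.

    `baseline` is a list of (x, y) points along one text line (an assumption;
    the claimed correction model is not specified).
    """
    if len(baseline) < 2:
        return 0.0
    (x0, y0), (x1, y1) = baseline[0], baseline[-1]
    chord = math.hypot(x1 - x0, y1 - y0)
    if chord == 0:
        return 0.0
    # Perpendicular distance from each point to the line through the endpoints.
    max_deviation = max(
        abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / chord
        for x, y in baseline
    )
    return max_deviation / chord


def needs_correction(baseline, threshold=0.02):
    """Hypothetical correction condition: sag ratio above the threshold."""
    return curve_sag_ratio(baseline) > threshold


if __name__ == "__main__":
    print(needs_correction([(0, 0), (50, 0), (100, 0)]))
    print(needs_correction([(0, 0), (50, 10), (100, 0)]))
```

Only documents that satisfy the condition would then be passed to the dewarping step, which the sketch leaves out.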
  19. The document recognition processing method according to claim 15, further comprising, after the recognized document is obtained:
    recognizing the input image using an annotation recognition model to identify annotated content in the original document;
    in the recognized document, typesetting the character recognition result corresponding to the annotated content in a format consistent with the original document.
  20. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another via the communication bus;
    the memory is configured to store a computer program;
    the processor is configured, when executing the program stored in the memory, to implement the steps of the text recognition method according to any one of claims 1 to 5, or the steps of the image recognition and classification method according to any one of claims 6 to 14, or the steps of the document recognition processing method according to any one of claims 15 to 19.
PCT/CN2021/117222 2020-09-15 2021-09-08 Text recognition method, image recognition classification method, and document recognition processing method WO2022057707A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010968750.3 2020-09-15
CN202010968750.3A CN112101367A (en) 2020-09-15 2020-09-15 Text recognition method, image recognition and classification method and document recognition processing method

Publications (1)

Publication Number Publication Date
WO2022057707A1 (en) 2022-03-24

Family

ID=73759143

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/117222 WO2022057707A1 (en) 2020-09-15 2021-09-08 Text recognition method, image recognition classification method, and document recognition processing method

Country Status (2)

Country Link
CN (1) CN112101367A (en)
WO (1) WO2022057707A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112101367A (en) * 2020-09-15 2020-12-18 杭州睿琪软件有限公司 Text recognition method, image recognition and classification method and document recognition processing method
CN113420622A (en) * 2021-06-09 2021-09-21 四川百川四维信息技术有限公司 Intelligent scanning, recognizing and filing system based on machine deep learning
CN113254595B (en) * 2021-06-22 2021-10-22 北京沃丰时代数据科技有限公司 Chatting recognition method and device, electronic equipment and storage medium
CN113792659B (en) * 2021-09-15 2024-04-05 上海金仕达软件科技股份有限公司 Document identification method and device and electronic equipment
CN114173019B (en) * 2021-12-23 2023-12-01 青岛黄海学院 Multifunctional archive scanning device and working method thereof
CN114267046A (en) * 2021-12-31 2022-04-01 上海合合信息科技股份有限公司 Method and device for correcting direction of document image
CN114419636A (en) * 2022-01-10 2022-04-29 北京百度网讯科技有限公司 Text recognition method, device, equipment and storage medium
CN114596566B (en) * 2022-04-18 2022-08-02 腾讯科技(深圳)有限公司 Text recognition method and related device
CN115205868B (en) * 2022-06-24 2023-05-05 荣耀终端有限公司 Image verification method
CN117593752B (en) * 2024-01-18 2024-04-09 星云海数字科技股份有限公司 PDF document input method, PDF document input system, storage medium and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150278658A1 (en) * 2014-03-31 2015-10-01 Kyocera Document Solutions Inc. Image Forming Apparatus Capable of Changing Image Data into Document Data, an Image Forming System, and an Image Forming Method
US20160034559A1 (en) * 2014-07-31 2016-02-04 Samsung Electronics Co., Ltd. Method and device for classifying content
US20160092730A1 (en) * 2014-09-30 2016-03-31 Abbyy Development Llc Content-based document image classification
US20160189008A1 (en) * 2014-12-31 2016-06-30 Xiaomi Inc. Methods and deivces for classifying pictures
WO2019012570A1 (en) * 2017-07-08 2019-01-17 ファーストアカウンティング株式会社 Document classification system and method, and accounting system and method
CN110569830A (en) * 2019-08-01 2019-12-13 平安科技(深圳)有限公司 Multi-language text recognition method and device, computer equipment and storage medium
CN110766020A (en) * 2019-10-30 2020-02-07 哈尔滨工业大学 System and method for detecting and identifying multi-language natural scene text
CN112101367A (en) * 2020-09-15 2020-12-18 杭州睿琪软件有限公司 Text recognition method, image recognition and classification method and document recognition processing method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7106905B2 (en) * 2002-08-23 2006-09-12 Hewlett-Packard Development Company, L.P. Systems and methods for processing text-based electronic documents
US8588528B2 (en) * 2009-06-23 2013-11-19 K-Nfb Reading Technology, Inc. Systems and methods for displaying scanned images with overlaid text

Also Published As

Publication number Publication date
CN112101367A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
WO2022057707A1 (en) Text recognition method, image recognition classification method, and document recognition processing method
US11645826B2 (en) Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks
US9626555B2 (en) Content-based document image classification
US8340425B2 (en) Optical character recognition with two-pass zoning
CN109858036B (en) Method and device for dividing documents
AU2015203150A1 (en) System and method for data extraction and searching
CN112800848A (en) Structured extraction method, device and equipment of information after bill identification
WO2014086277A1 (en) Professional notebook convenient for electronization and method for automatically identifying page number thereof
CN111914597B (en) Document comparison identification method and device, electronic equipment and readable storage medium
CN112508011A (en) OCR (optical character recognition) method and device based on neural network
US11379690B2 (en) System to extract information from documents
US8953228B1 (en) Automatic assignment of note attributes using partial image recognition results
CN110909123B (en) Data extraction method and device, terminal equipment and storage medium
Isheawy et al. Optical character recognition (ocr) system
WO2022161293A1 (en) Image processing method and apparatus, and electronic device and storage medium
US10460192B2 (en) Method and system for optical character recognition (OCR) of multi-language content
US20220269898A1 (en) Information processing device, information processing system, information processing method, and non-transitory computer readable medium
CN114021543B (en) Document comparison analysis method and system based on table structure analysis
CN110889341A (en) Form image recognition method and device based on AI (Artificial Intelligence), computer equipment and storage medium
CN111357015B (en) Text conversion method, apparatus, computer device, and computer-readable storage medium
CN112036330A (en) Text recognition method, text recognition device and readable storage medium
US10579653B2 (en) Apparatus, method, and computer-readable medium for recognition of a digital document
JP2020047031A (en) Document retrieval device, document retrieval system and program
US20230205910A1 (en) Information processing device, confidentiality level determination program, and method
US20230061725A1 (en) Automated categorization and processing of document images of varying degrees of quality

Legal Events

Date Code Title Description
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21868532; Country of ref document: EP; Kind code of ref document: A1)