CN114170594A - Optical character recognition method, device, electronic equipment and storage medium - Google Patents

Optical character recognition method, device, electronic equipment and storage medium

Info

Publication number: CN114170594A
Application number: CN202111489294.5A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Pending
Prior art keywords: character recognition, optical character, language type, picture, processed
Inventors: 马勇 (Ma Yong), 王佳华 (Wang Jiahua), 顾永翔 (Gu Yongxiang)
Assignees: Qianxin Technology Group Co Ltd; Secworld Information Technology Beijing Co Ltd
Application filed by Qianxin Technology Group Co Ltd and Secworld Information Technology Beijing Co Ltd

Abstract

The application provides an optical character recognition method, an optical character recognition apparatus, an electronic device, and a storage medium. The method comprises the following steps: detecting the language type of the characters in a picture to be processed; and performing optical character recognition on the picture to be processed with the character recognition model corresponding to that language type. Because the picture to be processed does not need to be run through a character recognition model for every language type, a large number of repeated recognition operations are avoided, the efficiency of optical character recognition is improved, and the performance overhead is reduced.

Description

Optical character recognition method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of image recognition technologies, and in particular, to an optical character recognition method, an optical character recognition apparatus, an electronic device, and a storage medium.
Background
Existing OCR (Optical Character Recognition) technology can convert the characters in a picture into text format and therefore has high practical value.
Current OCR technology is implemented mainly with artificial intelligence. Specifically, optical character recognition requires two models, one for each stage of OCR: a character detection model and a character recognition model. Both models are trained on a large number of labeled samples of a given language, and the training is repeated with sample sets of other languages, so that every language type has its own pair of models. A picture to be recognized is then processed by the pair of models of each language type to obtain an output text per language type. Finally, the output texts are compared, for example by checking which output text contains the most word information, and the output text of the language type with the most word information is returned.
However, in the above scheme the picture must be processed by the models of every language type, which requires a large number of repeated recognition operations and results in low optical character recognition efficiency and high performance overhead.
Disclosure of Invention
An object of the embodiments of the present application is to provide an optical character recognition method, an optical character recognition apparatus, an electronic device, and a storage medium, so as to improve the optical character recognition efficiency and reduce the performance overhead.
The embodiment of the application provides an optical character recognition method, which comprises the following steps: detecting the language type of the characters in a picture to be processed; and performing optical character recognition on the picture to be processed with the character recognition model corresponding to that language type.
In this implementation, the language type of the characters in the picture to be processed is detected first, and optical character recognition is then performed with the character recognition model corresponding to that language type. Because a character recognition model for every language type is no longer needed, a large number of repeated recognition operations are avoided, the efficiency of optical character recognition is improved, and the performance overhead is reduced.
Further, detecting the language type of the characters in the picture to be processed comprises: detecting the character areas of the picture to be processed; and detecting the language type of the characters in each character area. Performing optical character recognition on the picture to be processed with the character recognition model corresponding to the language type then comprises: performing optical character recognition on each character area with the character recognition model corresponding to that area's language type.
In this implementation, a language type is detected for each character area, and each area is recognized with the model for its own language type. Even when multiple language types are present in the picture to be processed, a good recognition effect is achieved, which improves the accuracy of the finally output recognition result.
Further, detecting the language type of the characters in the current picture to be processed, including: detecting a character area of the picture to be processed; detecting the language type of characters in each character area; determining a target language type corresponding to the picture to be processed according to the language type corresponding to each character area; adopting the character recognition model corresponding to the language type to perform optical character recognition on the picture to be processed, comprising the following steps: and adopting a character recognition model corresponding to the target language type to perform optical character recognition on the picture to be processed.
In the implementation process, the language type of the characters in each character area is detected, and then the target language type corresponding to the picture to be processed is determined according to the language type corresponding to each character area. Therefore, the target language type corresponding to the picture to be processed is comprehensively determined through the language types corresponding to the plurality of character areas, the detection reliability of the target language type corresponding to the picture to be processed can be improved, the risk of detection errors of the target language type is reduced, and the identification accuracy of the optical characters in the picture to be processed is improved.
Further, determining the target language type corresponding to the picture to be processed according to the language type corresponding to each character area comprises: counting the number of character areas corresponding to each language type; and taking the language type with the largest number of corresponding character areas as the target language type of the picture to be processed.
In this implementation, the number of character areas corresponding to each language type is counted, and the language type with the largest number of character areas is taken as the target language type. The target language type therefore fits most character areas of the picture to be processed, which further ensures the reliability of the detection.
Further, before performing optical character recognition on the picture to be processed with the character recognition model corresponding to the language type, the method further comprises: determining the computing power level of the local machine; and determining, from a plurality of character recognition models corresponding to the language type, a target character recognition model matching the computing power level of the local machine, wherein the plurality of character recognition models corresponding to the language type have different computing power requirements. Performing optical character recognition on the picture to be processed with the character recognition model corresponding to the language type then comprises: performing optical character recognition on the picture to be processed with the target character recognition model corresponding to the language type.
It should be understood that in practical applications some devices have abundant computing resources and a high computing power level, while others have limited computing resources and a low computing power level. Likewise, the same device has a high computing power level at moments when many computing resources are idle and a low one at moments when few are. In the prior art, the same set of character recognition models is provided for all devices. As a result, on some devices or at some moments, considerable resources remain unused while the character recognition model runs, so the computing resources of the device are not fully utilized; on other devices or at other moments, the computing resources are insufficient, causing the run to stall or even crash.
In this implementation, a plurality of character recognition models with different computing power requirements are trained in advance, and a target character recognition model matching the computing power level of the local machine is then selected to perform optical character recognition on the picture to be processed. The selected model matches the computing power level of the local machine, so the computing resources of the device are utilized to the greatest extent and the above problems are avoided.
Further, determining the computing power level of the local machine comprises: acquiring the hardware configuration of the local machine; determining the computing environment for optical character recognition according to that hardware configuration; and invoking a preset computing-power detection program in the computing environment to obtain the computing power level of the local machine.
In this implementation, the computing environment for optical character recognition is determined from the hardware configuration of the local machine, and a preset computing-power detection program is then invoked in that environment to obtain the computing power level of the local machine. The computing capability of the local hardware is thereby fully used to determine the current actual computing power level and to select the most suitable character recognition model. This approach saves hardware investment and improves competitiveness.
Further, determining the computing environment for optical character recognition according to the hardware configuration of the local machine comprises: if the local machine has no GPU (Graphics Processing Unit, also called a graphics card), determining the computing environment for optical character recognition to be the CPU (Central Processing Unit); if the local machine has an independent GPU and the GPU supports CUDA (Compute Unified Device Architecture), determining the computing environment to be the GPU using a CUDA module; if the local machine has an independent GPU that does not support CUDA but supports DML (DirectML, Direct Machine Learning), or the local machine has a GPU integrated into the CPU and the operating system is Windows 10 or later, determining the computing environment to be the GPU using a DML module; and if none of the above applies, determining the computing environment to be the Vulkan module.
In this implementation, different computing modules are used to build the computing environment for optical character recognition according to whether the local machine has a GPU and which kind of GPU it has, so the hardware resources of devices with different hardware configurations are fully utilized, hardware investment is saved, and competitiveness is improved.
The embodiment of the application also provides an optical character recognition method, which comprises the following steps: determining the computing power level of the machine; determining a target character recognition model matched with the calculation power level of the local machine; different character recognition models have different computational demands; and adopting the target character recognition model to perform optical character recognition on the picture to be processed.
As mentioned above, in practical applications some devices have abundant computing resources and a high computing power level, while others have limited computing resources and a low computing power level. Likewise, the same device has a high computing power level at moments when many computing resources are idle and a low one at moments when few are. In the prior art, the same set of character recognition models is provided for all devices, so on some devices or at some moments considerable resources remain unused while the character recognition model runs, and on other devices or at other moments the computing resources are insufficient, causing the run to stall or even crash. With this implementation, a target character recognition model matching the computing power level of the local machine can be selected to perform optical character recognition on the picture to be processed. The selected model matches the computing power level of the local machine, so the computing resources of the device are utilized to the greatest extent and the above problems are avoided.
Further, determining the computing power level of the local machine comprises: acquiring the hardware configuration of the local machine; determining the computing environment for optical character recognition according to that hardware configuration; and invoking a preset computing-power detection program in the computing environment to obtain the computing power level of the local machine.
Further, determining the computing environment for optical character recognition according to the hardware configuration of the local machine comprises: if the local machine has no GPU, determining the computing environment for optical character recognition to be the CPU; if the local machine has an independent GPU and the GPU supports CUDA, determining the computing environment to be the GPU using a CUDA module; if the local machine has an independent GPU that does not support CUDA but supports DML, or the local machine has a GPU integrated into the CPU and the operating system is Windows 10 or later, determining the computing environment to be the GPU using a DML module; and if none of the above applies, determining the computing environment to be the Vulkan module.
The embodiment of the present application further provides an optical character recognition apparatus, including: the detection module is used for detecting the language type of characters in the picture to be processed; and the first processing module is used for carrying out optical character recognition on the picture to be processed by adopting the character recognition model corresponding to the language type.
The embodiment of the present application further provides an optical character recognition apparatus, including: the determining module is used for determining the computing power level of the computer; the determining module is also used for determining a target character recognition model matched with the calculation power level of the local computer; different character recognition models have different computational demands; and the second processing module is used for carrying out optical character recognition on the picture to be processed by adopting the target character recognition model.
The embodiment of the application also provides electronic equipment, which comprises a processor, a memory and a communication bus; the communication bus is used for realizing connection communication between the processor and the memory; the processor is configured to execute one or more programs stored in the memory to implement any of the above-described optical character recognition methods.
Also provided in an embodiment of the present application is a computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement any of the above-described optical character recognition methods.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered to limit its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a schematic flow chart illustrating a first optical character recognition method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart illustrating a second method for optical character recognition according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a specific optical character recognition process according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a first optical character recognition apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a second optical character recognition apparatus according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The first embodiment is as follows:
in order to improve the optical character recognition efficiency and reduce the performance overhead, the embodiment of the application provides an optical character recognition method. Referring to fig. 1, fig. 1 is a schematic flow chart of an optical character recognition method provided in an embodiment of the present application, including:
S101: detecting the language type of characters in the picture to be processed.
S102: and adopting a character recognition model corresponding to the language type to perform optical character recognition on the picture to be processed.
In this way, a character recognition model for every language type is no longer needed, a large number of repeated recognition operations are avoided, optical character recognition efficiency is improved, and performance overhead is reduced.
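As an illustrative sketch only (not code from the application; the detector and the recognizer table below are stand-ins for trained models), steps S101 and S102 can be expressed as:

```python
# Minimal sketch of the two-step method: detect the language type first
# (S101), then run only the recognition model for that language (S102).
# detect_language and RECOGNIZERS are hypothetical stand-ins for trained models.
def detect_language(picture):
    # A real implementation would run a language type detection model here.
    return picture["language"]

RECOGNIZERS = {
    "zh": lambda picture: "recognized Chinese text",
    "en": lambda picture: "recognized English text",
}

def ocr(picture):
    lang = detect_language(picture)      # S101
    return RECOGNIZERS[lang](picture)    # S102: one model, not one per language
```

Only one recognition model runs per picture, instead of one pass per supported language type.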
It should be noted that, in the embodiment of the present application, in order to detect the language type of the characters in the picture to be processed, a language type detection model may be trained in advance, so as to perform language type identification based on the language type detection model.
For example, the language type detection model may be implemented with a conventional CNN-based (Convolutional Neural Network) classification model or a more complex CNN model, which is not limited in the embodiments of the present application.
To ensure the reliability of detection, in one possible implementation of the embodiment of the present application, the character areas of the picture to be processed may be detected first, the language type of the characters in each area detected, and each area then recognized with the character recognition model corresponding to its own language type.
Therefore, when optical characters of various language types exist in the picture to be processed, a good recognition effect can be achieved, and the accuracy of the recognition result of the picture to be processed which is finally output is improved.
In another possible implementation of the embodiment of the present application, it is not necessary to recognize each character area with the character recognition model of its own language type. Instead, after the language type of the characters in each character area is detected, a target language type for the picture to be processed is determined from the per-area language types, and the picture to be processed is then recognized with the character recognition model corresponding to that target language type.
Therefore, the target language type corresponding to the picture to be processed is suitable for most character areas, and the detection reliability of the picture to be processed can be further ensured.
For example, in this implementation, the number of character areas corresponding to each language type may be counted, the language type with the largest number of corresponding character areas determined, and that language type taken as the target language type of the picture to be processed.
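The counting step can be sketched as follows (a hypothetical helper, assuming the per-area language types are already available as a list):

```python
from collections import Counter

def target_language_type(area_languages):
    """Return the language type with the largest number of character areas."""
    return Counter(area_languages).most_common(1)[0][0]
```

For example, `target_language_type(["zh", "zh", "en"])` returns `"zh"`.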
It should be understood that, in practical applications, the above two possible embodiments may also be used in combination.
For example, after detecting the text region of the picture to be processed and detecting the language type of the text in each text region, the number of the text regions corresponding to each language type may be counted.
Then the target language type with the largest number of corresponding character areas is determined, and the proportion of character areas of the target language type to the total number of character areas is computed. Considering that the language type detection model has a certain misrecognition probability, checking whether this proportion exceeds a preset value (e.g., 90%) improves the recognition accuracy as much as possible.
For example, if the ratio is higher than the preset value, it may be determined that the to-be-processed picture has only the characters of the target language type with a high probability, so that the to-be-processed picture may be subjected to optical character recognition by using the character recognition model corresponding to the target language type according to the second feasible implementation manner.
If the ratio is lower than the preset value, it may be determined that the to-be-processed picture has characters of multiple language types, so that, according to the first feasible embodiment, the optical character recognition may be performed on each character region by using the character recognition model of the language type corresponding to each character region.
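The combined strategy of the two implementations can be sketched as follows (the 90% threshold mirrors the example above; the function name and return convention are illustrative assumptions):

```python
from collections import Counter

def choose_strategy(area_languages, threshold=0.9):
    """Decide between one model for the whole picture and one model per area.

    If the dominant language type covers at least `threshold` of the
    character areas, the picture most likely contains only that language,
    so return ("single", lang); otherwise the picture likely mixes
    languages, so return ("per-area", None).
    """
    lang, count = Counter(area_languages).most_common(1)[0]
    if count / len(area_languages) >= threshold:
        return ("single", lang)
    return ("per-area", None)
```

In the "per-area" case, each character area is then recognized with the model of its own detected language type, as in the first implementation.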
It should be noted that, in order to implement the detection of the text region of the picture to be processed, in the embodiment of the present application, the text detection model may be trained through a large number of training sample pictures labeled with the text region in advance, so that the trained model is adopted to detect the text region of the picture to be processed.
In this embodiment of the application, the character detection model may be implemented with a neural network such as a PSENet (Progressive Scale Expansion Network) model, a PAN (Pixel Aggregation Network) model, or a DBNet (Differentiable Binarization Network) model, but is not limited thereto.
In the embodiment of the application, the training sample pictures can adopt pictures of different language types, so that the pictures to be processed of various language types can have good character region identification capability through one character detection model.
It should be noted that in the embodiment of the present application each language type has a corresponding character recognition model. The character recognition model may be implemented with a network structure such as CRNN (Convolutional Recurrent Neural Network), but is not limited thereto. The character recognition model of each language type can be trained on sample pictures containing optical characters of that language type.
In the embodiment of the present application, only one character recognition model may be trained for each language type, so that the optical character recognition is performed on all devices by using the character recognition model corresponding to the required language type.
However, in the practical application process, some devices have abundant computing resources and high computing power level, but some devices have insufficient computing resources and low computing power level. Meanwhile, for the same device, at some time, more idle computing resources result in a high computing power level of the device at that time, and at some time, less idle computing resources result in a low computing power level of the device at that time.
Providing the same set of character recognition models for all devices means that, on some devices or at some moments, considerable resources remain unused while the character recognition model runs, so the computing resources of the device are not fully utilized; on other devices or at other moments, the computing resources are insufficient, causing the run to stall or even crash.
Therefore, in order to utilize the computing resources of the device more reasonably, another optical character recognition method is provided in the embodiment of the present application, which can be seen from fig. 2 and includes:
S201: determining the computing power level of the local machine.
In this optical character recognition method, to determine the computing power level of the local machine, the hardware configuration of the local machine is first acquired, and the computing environment for optical character recognition is then determined according to that configuration.
Such as:
If the local machine does not have a GPU (Graphics Processing Unit), the computing environment for optical character recognition can be determined to be the CPU (Central Processing Unit).
If the native machine has an independent GPU and the GPU supports CUDA, the computing environment for performing optical character recognition can be determined to be the GPU, and a CUDA module is used in the GPU.
If the local machine has an independent GPU that does not support CUDA but supports DML, or the local machine has a GPU integrated into the CPU and the operating system is Windows 10 or later, it may be determined that the computing environment for optical character recognition is the GPU and that a DML module is used on the GPU.
If none of the above applies, the computing environment for optical character recognition is determined to be the Vulkan module (it should be understood that Vulkan is a cross-platform graphics and compute API, comparable to Microsoft's DirectX, for directly accessing acceleration hardware; it is typically used in game development or to accelerate artificial intelligence applications).
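The decision chain above can be sketched as a pure function; the boolean flags are assumptions that would come from real hardware probes, which are not shown:

```python
def select_compute_environment(has_gpu, gpu_is_independent=False,
                               supports_cuda=False, supports_dml=False,
                               windows10_or_later=False):
    """Mirror the environment-selection rules from the description."""
    if not has_gpu:
        return "CPU"                       # no GPU at all
    if gpu_is_independent and supports_cuda:
        return "GPU-CUDA"                  # independent GPU with CUDA
    if (gpu_is_independent and supports_dml) or \
       (not gpu_is_independent and windows10_or_later):
        return "GPU-DML"                   # DML-capable GPU, or integrated GPU on Windows 10+
    return "Vulkan"                        # fallback for everything else
```

Note that an integrated GPU on an operating system older than Windows 10 falls through to the Vulkan branch.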
After a computing environment for optical character recognition is determined, a preset computing power detection program is called in the computing environment, and the computing power level of the computer can be obtained.
It should be understood that the computing-power detection program can be written by an engineer. For example, it may compute the value of pi one thousand times, record the elapsed time, and derive the computing power level of the local machine from the recorded time.
It should also be understood that "the local machine" refers to the device that performs the optical character recognition method provided by the embodiments of the present application.
S202: and determining a target character recognition model matched with the calculation power level of the local computer.
It should be noted that in the optical character recognition method, a plurality of different character recognition models can be trained in advance for different language types. The different character recognition models have different computational demands, can adapt to different computing resources and meet different precision requirements.
It should be understood that the plurality of character recognition models corresponding to each language type may all be implemented with a network structure such as CRNN (Convolutional Recurrent Neural Network), and all are trained on sample pictures containing optical characters of that language type. They may differ in network structure; for example, a character recognition model with high computing power requirements may have more neural network layers, while one with low requirements has fewer layers.
In this embodiment of the present application, the computation power level required by each character recognition model may be configured in advance, and then the target character recognition model matching the computation power level of the local computer is determined.
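A minimal sketch of this matching step, assuming a hypothetical registry in which each model is pre-configured with a required computing power level (all model names and level values here are invented for illustration):

```python
# Hypothetical registry: each language type has several models, each with a
# pre-configured minimum compute level (a higher number means more demanding).
MODEL_REGISTRY = {
    "chinese": [
        {"name": "crnn_zh_light", "required_level": 1},
        {"name": "crnn_zh_heavy", "required_level": 3},
    ],
    "english": [
        {"name": "crnn_en_light", "required_level": 1},
        {"name": "crnn_en_heavy", "required_level": 3},
    ],
}

def select_model(language: str, machine_level: int) -> str:
    """Pick the most demanding model the local machine can still afford."""
    candidates = [m for m in MODEL_REGISTRY[language]
                  if m["required_level"] <= machine_level]
    best = max(candidates, key=lambda m: m["required_level"])
    return best["name"]

print(select_model("chinese", machine_level=2))  # falls back to the light model
```

Picking the most demanding affordable model is what lets a high-compute device get the finer recognition results while a low-compute device still runs quickly.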
S203: and carrying out optical character recognition on the picture to be processed by adopting the target character recognition model.
It should be understood that, in this optical character recognition method, before the target character recognition model is used to perform optical character recognition on the picture to be processed, a character detection model may first be used to detect the character regions in the picture, and the target character recognition model may then be applied to those character regions, so as to avoid performing invalid recognition on regions without optical characters. The description of the character detection model can be found in the foregoing and is not repeated here.
This optical character recognition method can be used independently of the first optical character recognition method described above. For example, in step S203, optical character recognition may be performed on the picture to be processed using the target character recognition model corresponding to each language type, as in the prior art, so as to recognize the output text corresponding to each language type and determine the final output text. Alternatively, when step S203 is executed, the user may specify a language type, and optical character recognition is performed on the picture to be processed using the target character recognition model corresponding to the language type specified by the user.
However, the present optical character recognition method may also be used in combination with the first optical character recognition method described above. That is, the language type of the text in the picture to be processed may be detected according to the first optical character recognition method, and then the target text recognition model corresponding to the language type is adopted to perform optical character recognition on the picture to be processed.
It should be noted that, in both optical character recognition methods, in order to facilitate character region detection and the subsequent optical character recognition, the picture to be processed may first be preprocessed: for example, its size may be scaled proportionally to meet the size requirements of each subsequent model, and its data may be decoded into bitmap data to facilitate subsequent processing.
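The proportional-scaling step can be sketched as a small helper; the maximum side length used here is an illustrative assumption, not a value from the application:

```python
def fit_within(width: int, height: int, max_side: int = 960) -> tuple[int, int]:
    """Proportionally scale (width, height) so the longer side is at most
    max_side, preserving the aspect ratio. 960 is an illustrative default."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; leave untouched
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))

print(fit_within(1920, 1080))  # -> (960, 540)
```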
In the two optical character recognition methods, for the convenience of performing optical character recognition by the character recognition model, independent character direction correction may be performed on each character region.
For example, each character region may be extracted as a character picture, the character direction in each character picture may be detected (for example, through a classification model), each character picture may then be converted into a picture that is upright from a human viewing angle through a conventional graphics algorithm library (e.g., OpenCV), and finally each character picture may be recognized through the character recognition model.
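A minimal sketch of the direction-correction step, using a pure-Python quarter-turn rotation as a stand-in for library calls such as OpenCV's rotation routines:

```python
def rotate_bitmap(bitmap, clockwise_deg):
    """Rotate a row-major bitmap (a list of rows) by a multiple of 90 degrees
    so the text reads upright. A stand-in for a graphics-library rotation."""
    if clockwise_deg % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported here")
    for _ in range((clockwise_deg // 90) % 4):
        # One clockwise quarter turn: reverse the rows, then transpose.
        bitmap = [list(row) for row in zip(*bitmap[::-1])]
    return bitmap

tile = [[1, 2],
        [3, 4]]
print(rotate_bitmap(tile, 90))  # -> [[3, 1], [4, 2]]
```

The angle itself would come from the direction-classification model mentioned above; this helper only applies the correction.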
It should be noted that, in the two optical character recognition methods, after the optical character recognition is performed, in order to ensure readability of the output text, paragraph rearrangement may be performed on the characters recognized by the character recognition model according to coordinates of a character area to which each character belongs, so as to recover a correct paragraph position of the character in the picture as much as possible.
And then, outputting the text after the paragraphs are rearranged.
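A minimal sketch of the paragraph rearrangement, assuming each recognized fragment carries the coordinates of its character region (the row-grouping tolerance is an illustrative assumption):

```python
def rearrange(fragments, line_tolerance=10):
    """Reorder recognized fragments into reading order using their region
    coordinates: group fragments into approximate rows top-to-bottom, then
    sort each row left-to-right. `fragments` is a list of (text, x, y)."""
    rows = []
    for text, x, y in sorted(fragments, key=lambda f: f[2]):
        if rows and abs(rows[-1][0] - y) <= line_tolerance:
            rows[-1][1].append((x, text))   # same visual row
        else:
            rows.append((y, [(x, text)]))   # start a new row
    lines = [" ".join(t for _, t in sorted(items)) for _, items in rows]
    return "\n".join(lines)

frags = [("world", 120, 52), ("Hello", 10, 50), ("Second line", 12, 110)]
print(rearrange(frags))  # "Hello world" then "Second line"
```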
According to the first optical character recognition method provided by the embodiment of the application, the language type of characters in the picture to be processed is detected firstly, and then optical character recognition is carried out on the picture to be processed by adopting the character recognition model corresponding to the language type. Therefore, optical character recognition is carried out on the picture to be processed without using character recognition models corresponding to each language type, a large amount of repeated recognition operation can be effectively avoided, the optical character recognition efficiency is improved, and the performance overhead is reduced.
According to the second optical character recognition method provided by the embodiment of the application, the target character recognition model matched with the calculation power level of the local computer can be determined according to the calculation power level of the local computer so as to perform optical character recognition on the picture to be processed. Therefore, the selected character recognition model can be matched with the calculation power level of the local machine, and calculation power resources of the local machine are utilized to the maximum extent.
In addition, according to the second optical character recognition method provided by the embodiment of the application, the computing environment for performing optical character recognition can be determined according to the hardware condition of the local computer, and then the preset computing power detection program is called in the computing environment to obtain the computing power level of the local computer, so that the computing power of the hardware in the local computer can be fully used, the current actual computing power level of the local computer can be determined, and the most appropriate character recognition model can be selected. The mode can save hardware investment and improve competitiveness.
In addition, the two optical character recognition methods provided by the embodiments of the present application can be used in combination, so that the computing resources of the local device are utilized to the maximum extent while the optical character recognition efficiency is improved and the performance overhead is reduced.
Example two:
On the basis of the first embodiment, this embodiment further illustrates the present application by taking as an example a process that uses the two optical character recognition methods of the first embodiment simultaneously.
Before executing the optical character recognition process, pictures of different language types are first taken as training pictures to train a language type detection model and a character detection model, and a heavyweight character recognition model and a lightweight character recognition model are trained for each language type. The heavyweight character recognition model produces finer recognition results than the lightweight character recognition model but requires a higher computational cost.
The difference between the heavyweight character recognition model and the lightweight character recognition model is that the heavyweight model has more neural network layers, while the lightweight model has fewer neural network layers.
Referring to fig. 3, the entire optical character recognition process includes:
step 1, a processing device acquires a picture to be processed.
And 2, acquiring the hardware condition of the processing equipment, and determining a computing environment for performing optical character recognition according to the hardware condition of the processing equipment.
For example: if the local machine does not have a graphics processing unit (GPU), the computing environment for performing optical character recognition can be determined to be the central processing unit (CPU). If the local machine has a discrete GPU and the GPU supports CUDA, the computing environment can be determined to be the GPU, with a CUDA module used on the GPU. If the local machine has a discrete GPU that does not support CUDA but supports DML, or the local machine has a GPU integrated in the CPU and the operating system is Windows 10 or a version higher than Windows 10, the computing environment can be determined to be the GPU, with a DML module used on the GPU. If none of the above conditions is met, the computing environment is determined to be the GPU using the VULKAN module.
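The decision ladder in step 2 can be sketched as a single function; the boolean probes are assumed to come from platform-specific hardware queries that are not shown here:

```python
def select_compute_environment(has_gpu, gpu_is_discrete, supports_cuda,
                               supports_dml, windows10_or_later):
    """Mirror the decision ladder in step 2. Returns (device, module)."""
    if not has_gpu:
        return ("CPU", None)
    if gpu_is_discrete and supports_cuda:
        return ("GPU", "CUDA")
    if (gpu_is_discrete and supports_dml) or (
            not gpu_is_discrete and windows10_or_later):
        return ("GPU", "DML")
    return ("GPU", "VULKAN")  # cross-platform fallback

print(select_compute_environment(True, True, True, False, True))
```

Ordering matters here: CUDA is preferred when available, DML covers the remaining Windows cases, and VULKAN is the cross-platform fallback.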
And 3, calling a preset computing power detection program in the computing environment to obtain the computing power level of the local machine.
And 4, preprocessing the picture to be processed by using the CPU.
For example, the size of the picture to be processed is scaled and data decoding is performed to obtain bitmap data.
And 5, marking character areas from the picture to be processed through the character detection model, acquiring the character picture of each character area, and outputting the coordinate value of each character picture corresponding to the picture to be processed.
And 6, independently detecting the language type of each character picture through a language type detection model, and determining the target language type corresponding to each character picture.
And 7, detecting the character direction of each character picture, and converting each character picture into a picture that is upright from a human viewing angle through OpenCV or other graphics processing libraries.
It should be understood that there is no required execution order between steps 4 to 7 and steps 2 to 3. It should also be understood that there is no required execution order between step 6 and step 7.
And 8, aiming at each character picture, respectively carrying out optical character recognition on each forward character picture by adopting a target character recognition model corresponding to the target language type of each character picture.
It should be understood that the target character recognition model is a character recognition model corresponding to the target language type and matched with the computational power level of the computer, and is one of a heavyweight character recognition model and a lightweight character recognition model corresponding to the target language type.
And 9, for the characters identified by each target character identification model, carrying out paragraph rearrangement according to the coordinates of the character picture to which each character belongs, and outputting the text after paragraph rearrangement.
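Steps 5 through 9 can be sketched as one flow, with every callable below a hypothetical stand-in for the detection, classification, and recognition models described in this embodiment:

```python
def run_ocr_pipeline(picture, detect_regions, detect_language,
                     pick_model, recognize, rearrange):
    """Steps 5-9 as one flow: detect character regions, detect each region's
    language, pick a model per language, recognize, then rearrange by coords."""
    results = []
    for region_bitmap, coords in detect_regions(picture):
        language = detect_language(region_bitmap)
        model = pick_model(language)
        results.append((recognize(model, region_bitmap), coords))
    return rearrange(results)

# Toy stand-ins so the flow can be exercised end to end.
demo = run_ocr_pipeline(
    "picture",
    detect_regions=lambda p: [("bitmap-a", (0, 0)), ("bitmap-b", (0, 40))],
    detect_language=lambda b: "english",
    pick_model=lambda lang: f"crnn_{lang}_light",
    recognize=lambda model, b: f"text-from-{b}",
    rearrange=lambda rs: "\n".join(t for t, _ in rs),
)
print(demo)
```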
By this scheme, rapid recognition can be achieved on devices with low computing power, high-precision recognition can be achieved on devices with high computing power, and the computing resources of the device can be fully utilized.
In addition, the technical scheme can also fully use the computing power of hardware in the equipment, achieve better computing power adaptability, fully utilize all hardware, save hardware investment and improve competitiveness.
In addition, the scheme can also be used for efficiently identifying the multi-language type, so that the identification efficiency is improved, and the product competitiveness is improved.
Example three:
based on the same inventive concept, the embodiments of the present application further provide two optical character recognition apparatuses 400 and 500. Referring to fig. 4 and 5, fig. 4 illustrates an optical character recognition apparatus using the method shown in fig. 1, and fig. 5 illustrates an optical character recognition apparatus using the method shown in fig. 2. It should be understood that the specific functions of the apparatus 400 and the apparatus 500 can be referred to in the above description, and the detailed description is omitted here as appropriate to avoid redundancy. The apparatus 400 and the apparatus 500 each include at least one software functional module that can be stored in a memory in the form of software or firmware, or built into the operating system of the apparatus. Specifically:
referring to fig. 4, the apparatus 400 includes: a detection module 401 and a first processing module 402. Wherein:
the detection module 401 is configured to detect a language type of a character in a picture to be processed;
the first processing module 402 is configured to perform optical character recognition on the to-be-processed picture by using a character recognition model corresponding to the language type.
In a feasible implementation manner of the embodiment of the present application, the detection module 401 is specifically configured to detect a text region of the picture to be processed, and detect a language type of a text in each text region;
the first processing module 402 is specifically configured to perform optical character recognition on each text region respectively by using a text recognition model of a language type corresponding to each text region.
In another possible implementation manner of the embodiment of the present application, the detection module 401 is specifically configured to detect a text region of the picture to be processed, detect a language type of a text in each text region, and determine a target language type corresponding to the picture to be processed according to the language type corresponding to each text region;
the first processing module 402 is specifically configured to perform optical character recognition on the to-be-processed picture by using a character recognition model corresponding to the target language type.
In the second possible implementation manner, the detection module 401 is specifically configured to count the number of text regions corresponding to each language type; determining the language type with the largest number of corresponding character areas; and the language type with the largest number of the corresponding character areas is the target language type corresponding to the picture to be processed.
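The counting rule described above can be sketched in a few lines (tie-breaking falls to the first language encountered, an assumption the application leaves open):

```python
from collections import Counter

def target_language(region_languages):
    """Pick the language type with the largest number of character regions."""
    counts = Counter(region_languages)
    return counts.most_common(1)[0][0]

print(target_language(["chinese", "english", "chinese"]))  # -> chinese
```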
In this embodiment of the present application, the detection module 401 is further configured to determine a computation power level of the local computer, and determine, from a plurality of character recognition models corresponding to the language type, a target character recognition model matched with the computation power level of the local computer; the plurality of character recognition models corresponding to the language type have different computational demands;
the first processing module 402 is specifically configured to perform optical character recognition on the to-be-processed picture by using a target character recognition model corresponding to the language type.
In this embodiment of the application, the detection module 401 is specifically configured to obtain a hardware condition of the local computer, determine a computing environment for performing optical character recognition according to the hardware condition of the local computer, and call a preset computation power detection program in the computing environment to obtain a computation power level of the local computer.
In this embodiment of the present application, the detection module 401 is specifically configured to: determine that the computing environment for performing optical character recognition is the CPU if the local machine does not have a GPU; if the local machine has a discrete GPU and the GPU supports CUDA, determine that the computing environment is the GPU using a CUDA module; if the local machine has a discrete GPU that does not support CUDA but supports DML, or has a GPU integrated in the CPU and the operating system is Windows 10 or a version higher than Windows 10, determine that the computing environment is the GPU using a DML module; and if none of the above conditions is met, determine that the computing environment is the GPU using the VULKAN module.
Referring to fig. 5, the apparatus 500 includes: a determination module 501 and a second processing module 502. Wherein:
the determining module 501 is configured to determine a computation power level of the computer;
the determining module 501 is further configured to determine a target character recognition model matched with the computation power level of the local computer; different character recognition models have different computational demands;
the second processing module 502 is configured to perform optical character recognition on the picture to be processed by using the target character recognition model.
In this embodiment of the present application, the determining module 501 is specifically configured to obtain a hardware condition of the local computer, determine a computing environment for performing optical character recognition according to the hardware condition of the local computer, and call a preset computation power detecting program in the computing environment to obtain a computation power level of the local computer.
In this embodiment of the present application, the determining module 501 is specifically configured to: determine that the computing environment for performing optical character recognition is the CPU if the local machine does not have a GPU; if the local machine has a discrete GPU and the GPU supports CUDA, determine that the computing environment is the GPU using a CUDA module; if the local machine has a discrete GPU that does not support CUDA but supports DML, or has a GPU integrated in the CPU and the operating system is Windows 10 or a version higher than Windows 10, determine that the computing environment is the GPU using a DML module; and if none of the above conditions is met, determine that the computing environment is the GPU using the VULKAN module.
It should be understood that, for the sake of brevity, the contents described in some embodiments are not repeated in this embodiment.
Example four:
the present embodiment provides an electronic device, which is shown in fig. 6 and includes a processor 601, a memory 602, and a communication bus 603. Wherein:
the communication bus 603 is used for connection communication between the processor 601 and the memory 602.
The processor 601 is configured to execute one or more programs stored in the memory 602 to implement the optical character recognition method in the first embodiment and/or the second embodiment.
It will be appreciated that the configuration shown in fig. 6 is merely illustrative and that the electronic device may include more or fewer components than shown in fig. 6 or have a different configuration than shown in fig. 6. For example, the electronic device further includes components such as a CPU and a GPU.
Illustratively, the electronic device may be a computer, a mobile phone, a tablet, a server, or the like.
The present embodiment also provides a computer-readable storage medium, such as a floppy disk, an optical disk, a hard disk, a flash memory, an SD card (Secure Digital Memory Card), an MMC (Multimedia Card), etc., in which one or more programs implementing the above steps are stored; the one or more programs can be executed by one or more processors to implement the optical character recognition method in the first embodiment and/or the second embodiment. This will not be described in detail herein.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
In this context, a plurality means two or more.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. An optical character recognition method, comprising:
detecting the language type of characters in the picture to be processed;
and adopting a character recognition model corresponding to the language type to perform optical character recognition on the picture to be processed.
2. The optical character recognition method of claim 1, wherein detecting the language type of characters in the picture to be processed comprises:
detecting a character area of the picture to be processed;
detecting the language type of characters in each character area;
adopting the character recognition model corresponding to the language type to perform optical character recognition on the picture to be processed, comprising the following steps:
and respectively carrying out optical character recognition on each character area by adopting a character recognition model corresponding to the language type corresponding to each character area.
3. The optical character recognition method of claim 1, wherein detecting the language type of characters in the picture to be processed comprises:
detecting a character area of the picture to be processed;
detecting the language type of characters in each character area;
determining a target language type corresponding to the picture to be processed according to the language type corresponding to each character area;
adopting the character recognition model corresponding to the language type to perform optical character recognition on the picture to be processed, comprising the following steps:
and adopting a character recognition model corresponding to the target language type to perform optical character recognition on the picture to be processed.
4. The optical character recognition method of claim 3, wherein determining the target language type corresponding to the picture to be processed according to the language type corresponding to each text region comprises:
counting the number of character areas corresponding to each language type;
determining the language type with the largest number of corresponding character areas; and the language type with the largest number of the corresponding character areas is the target language type corresponding to the picture to be processed.
5. The optical character recognition method of any one of claims 1-4, wherein before performing optical character recognition on the picture to be processed by using the character recognition model corresponding to the language type, the method further comprises:
determining the computing power level of the machine;
determining a target character recognition model matched with the calculation power level of the local computer from a plurality of character recognition models corresponding to the language types; the plurality of character recognition models corresponding to the language type have different computational demands;
adopting the character recognition model corresponding to the language type to perform optical character recognition on the picture to be processed, comprising the following steps:
and adopting a target character recognition model corresponding to the language type to perform optical character recognition on the picture to be processed.
6. The optical character recognition method as recited in claim 5, wherein said determining a computational power level of the native machine comprises:
acquiring the hardware condition of a local computer;
determining a computing environment for optical character recognition according to the hardware condition of a local computer;
and calling a preset computing power detection program in the computing environment to obtain the computing power level of the local machine.
7. The optical character recognition method of claim 6 wherein determining a computing environment for performing optical character recognition based on native hardware conditions comprises:
if the local machine does not have a graphics processing unit (GPU), determining the computing environment for carrying out optical character recognition to be a central processing unit (CPU);
if the local computer has an independent GPU and the GPU supports a unified computing device architecture CUDA, determining that a computing environment for performing optical character recognition is the GPU using a CUDA module;
if the local machine is provided with a discrete GPU that does not support CUDA but supports the direct machine learning (DML) technology, or the local machine is provided with a GPU integrated in the CPU and the operating system is Windows 10 or a version higher than Windows 10, determining the computing environment for performing optical character recognition to be the GPU using a DML module;
if none of the above situations is met, determining the computing environment for performing optical character recognition to be the VULKAN module.
8. An optical character recognition method, comprising:
determining the computing power level of the machine;
determining a target character recognition model matched with the calculation power level of the local machine; different character recognition models have different computational demands;
and adopting the target character recognition model to perform optical character recognition on the picture to be processed.
9. The optical character recognition method as recited in claim 8, wherein said determining a computational power level of the native machine comprises:
acquiring the hardware condition of a local computer;
determining a computing environment for optical character recognition according to the hardware condition of a local computer;
and calling a preset computing power detection program in the computing environment to obtain the computing power level of the local machine.
10. The optical character recognition method of claim 9 wherein determining a computing environment for performing optical character recognition based on native hardware conditions comprises:
if the local machine does not have a graphics processing unit (GPU), determining the computing environment for carrying out optical character recognition to be a central processing unit (CPU);
if the local computer has an independent GPU and the GPU supports a unified computing device architecture CUDA, determining that a computing environment for performing optical character recognition is the GPU using a CUDA module;
if the local machine is provided with a discrete GPU that does not support CUDA but supports the direct machine learning (DML) technology, or the local machine is provided with a GPU integrated in the CPU and the operating system is Windows 10 or a version higher than Windows 10, determining the computing environment for performing optical character recognition to be the GPU using a DML module;
if none of the above situations is met, determining the computing environment for performing optical character recognition to be the VULKAN module.
11. An optical character recognition apparatus, comprising:
the detection module is used for detecting the language type of characters in the picture to be processed;
and the first processing module is used for carrying out optical character recognition on the picture to be processed by adopting the character recognition model corresponding to the language type.
12. An optical character recognition apparatus, comprising:
the determining module is used for determining the computing power level of the computer;
the determining module is also used for determining a target character recognition model matched with the calculation power level of the local computer; different character recognition models have different computational demands;
and the second processing module is used for carrying out optical character recognition on the picture to be processed by adopting the target character recognition model.
13. An electronic device, comprising: a processor, a memory, and a communication bus;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is configured to execute a program stored in the memory to implement the optical character recognition method according to any one of claims 1 to 10.
14. A computer-readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the optical character recognition method according to any one of claims 1 to 10.
CN202111489294.5A 2021-12-07 2021-12-07 Optical character recognition method, device, electronic equipment and storage medium Pending CN114170594A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111489294.5A CN114170594A (en) 2021-12-07 2021-12-07 Optical character recognition method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111489294.5A CN114170594A (en) 2021-12-07 2021-12-07 Optical character recognition method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114170594A true CN114170594A (en) 2022-03-11

Family

ID=80484240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111489294.5A Pending CN114170594A (en) 2021-12-07 2021-12-07 Optical character recognition method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114170594A (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100332912A1 (en) * 2009-06-26 2010-12-30 International Business Machines Corporation Visual feedback system for users using multiple partitions on a server
JP2011180687A (en) * 2010-02-26 2011-09-15 Mitsubishi Electric Corp Multilingual document analysis device
US20140181039A1 (en) * 2012-12-21 2014-06-26 Zetta, Inc. Systems and methods for on-demand data storage
US20140180915A1 (en) * 2012-12-21 2014-06-26 Zetta, Inc. Systems and methods for real-time billing and metrics reporting
CN109685055A (en) * 2018-12-26 2019-04-26 北京金山数字娱乐科技有限公司 Text filed detection method and device in a kind of image
CN109948696A (en) * 2019-03-19 2019-06-28 上海七牛信息技术有限公司 A kind of multilingual scene character recognition method and system
CN109948615A (en) * 2019-03-26 2019-06-28 中国科学技术大学 Multi-language text detects identifying system
CN110210469A (en) * 2019-05-31 2019-09-06 中科软科技股份有限公司 A kind of method and system identifying picture character languages
CN110569830A (en) * 2019-08-01 2019-12-13 平安科技(深圳)有限公司 Multi-language text recognition method and device, computer equipment and storage medium
CN110941984A (en) * 2019-09-25 2020-03-31 西南科技大学 Study room seat state detection method and seat management system based on deep learning
CN111986101A (en) * 2020-07-09 2020-11-24 浙江工业大学 Cerebrovascular map construction method
CN113221632A (en) * 2021-03-23 2021-08-06 奇安信科技集团股份有限公司 Document picture identification method and device and computer equipment
CN113240670A (en) * 2021-06-16 2021-08-10 亿嘉和科技股份有限公司 Image segmentation method for object to be operated in live-wire operation scene
CN113297975A (en) * 2021-05-25 2021-08-24 新东方教育科技集团有限公司 Method and device for identifying table structure, storage medium and electronic equipment
CN113485549A (en) * 2021-06-29 2021-10-08 中国航空规划设计研究总院有限公司 Aviation production line manual operation guiding system and method based on mixed reality technology

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI YIHONG et al.: "A Survey of Deep Learning Methods for Scene Text Detection", Computer Engineering and Applications, vol. 57, no. 6, pages 42-48 *

Similar Documents

Publication Publication Date Title
US9354701B2 (en) Information processing apparatus and information processing method
CN112699775A (en) Certificate identification method, device and equipment based on deep learning and storage medium
CN112036292A (en) Character recognition method and device based on neural network and readable storage medium
CN111832449A (en) Engineering drawing display method and related device
CN113033543B (en) Curve text recognition method, device, equipment and medium
CN111291882A (en) Model conversion method, device, equipment and computer storage medium
CN111931729B (en) Pedestrian detection method, device, equipment and medium based on artificial intelligence
CN109657127B (en) Answer obtaining method, device, server and storage medium
CN111008624A (en) Optical character recognition method and method for generating training sample for optical character recognition
CN114373460A (en) Instruction determination method, device, equipment and medium for vehicle-mounted voice assistant
CN113762455A (en) Detection model training method, single character detection method, device, equipment and medium
CN111985491A (en) Similar information merging method, device, equipment and medium based on deep learning
CN115409041B (en) Unstructured data extraction method, device, equipment and storage medium
CN116774973A (en) Data rendering method, device, computer equipment and storage medium
CN114170594A (en) Optical character recognition method, device, electronic equipment and storage medium
US20220392243A1 (en) Method for training text classification model, electronic device and storage medium
WO2023109086A1 (en) Character recognition method, apparatus and device, and storage medium
CN115620315A (en) Handwritten text detection method, device, server and storage medium
CN113128496B (en) Method, device and equipment for extracting structured data from image
CN114495146A (en) Image text detection method and device, computer equipment and storage medium
CN113205092A (en) Text detection method, device, equipment and storage medium
CN112528984A (en) Image information extraction method, device, electronic equipment and storage medium
CN111401366A (en) Character recognition method, character recognition device, computer equipment and storage medium
CN110909688B (en) Face detection small model optimization training method, face detection method and computer system
CN113158844B (en) Ship supervision method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100032 NO.332, 3rd floor, Building 102, 28 xinjiekouwai street, Xicheng District, Beijing

Applicant after: Qianxin Technology Group Co.,Ltd.

Applicant after: Qianxin Wangshen information technology (Beijing) Co.,Ltd.

Address before: 100032 NO.332, 3rd floor, Building 102, 28 xinjiekouwai street, Xicheng District, Beijing

Applicant before: Qianxin Technology Group Co.,Ltd.

Applicant before: LEGENDSEC INFORMATION TECHNOLOGY (BEIJING) Inc.
