CN112861842A - Case text recognition method based on OCR and electronic equipment - Google Patents

Case text recognition method based on OCR and electronic equipment

Info

Publication number
CN112861842A
CN112861842A (application CN202110304175.1A)
Authority
CN
China
Prior art keywords
sample set
labeled
training
target detection
detection model
Prior art date
Legal status
Pending
Application number
CN202110304175.1A
Other languages
Chinese (zh)
Inventor
朵思惟
余梓飞
张艳丽
王斐
Current Assignee
Tianjin Huizhi Xingyuan Information Technology Co ltd
Original Assignee
Tianjin Huizhi Xingyuan Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Tianjin Huizhi Xingyuan Information Technology Co ltd
Priority to CN202110304175.1A
Publication of CN112861842A
Status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an OCR-based case text recognition method and an electronic device. The text recognition method comprises the following steps: pre-training a target detection model by using an initial training set formed by the initially labeled samples in a case picture sample set; actively learning a plurality of unlabeled samples by using the pre-trained target detection model to select a layout-representative core sample set for manual labeling; merging the labeled core sample set and the initial training set into a labeled sample set; and iteratively training the target detection model by using a semi-supervised learning method based on the labeled sample set until the target detection model is determined to meet a preset requirement. The method diversifies the labeled samples while selecting a suitable target detection model for iterative training, continuously updates and expands the labeled data set, and directly reduces the manpower and time spent on manually labeling samples.

Description

Case text recognition method based on OCR and electronic equipment
Technical Field
The present disclosure relates to the technical field of deep learning, and in particular to an OCR-based case text recognition method and an electronic device.
Background
Conventionally, the information in paper case files is structured and stored through manual entry so that the electronic files can be managed and queried later. However, this approach is too costly in labor for large-scale file data, and manual entry is prone to errors. For cases with structured, simple formats, existing automatic recognition techniques can extract the case information by exploiting the fixed geometric positions or special positioning symbols of the file to be recognized, through the positioning symbols or a simple conversion, and then detect and recognize the characters with optical character recognition.
In practice, however, the layouts of case files are complex and varied, with stamps, fingerprint interference, deformed characters and the like appearing in the files. When deep-learning-based optical character recognition is used to recognize characters in such files, a large number of high-quality labeled samples is needed, yet large numbers of labeled samples are difficult to obtain in industrial application scenarios. Obtaining a high-precision pre-trained model would require a large amount of manual work to label the unlabeled samples. With only a limited number of labeled case samples, a general method or device is needed to recognize the text information in case files accurately and effectively.
Disclosure of Invention
In view of the above, an object of the present disclosure is to provide an OCR-based case text recognition method and an electronic device.
Based on the above purpose, the present disclosure provides an OCR-based case text recognition method, including:
pre-training a target detection model by using an initial training set formed by initial labeled samples in a case picture sample set;
actively learning a plurality of unlabeled samples in the case picture sample set by using the pre-trained target detection model so as to select a layout-representative core sample set from the unlabeled samples for manual labeling;
in response to receiving the labeled core sample set, merging the labeled core sample set with the initial training set into a labeled sample set;
and performing iterative training on the pre-trained target detection model by using a semi-supervised learning method based on the labeled sample set until the target detection model is determined to meet the preset requirement.
The present disclosure also provides a case text recognition method, including:
detecting a text box from an acquired case picture by utilizing a target detection model trained in advance by the above OCR-based case text recognition method;
and recognizing the text in the text box by using a preset text recognition model.
The present disclosure also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, wherein the processor implements the case text recognition method as described above when executing the computer program.
From the above description it can be seen that, with the target detection model training method, the case text recognition method, and the electronic device provided by the present disclosure, active learning makes the labeled samples cover the possible layouts as fully as possible, with enough samples for each layout, which reduces the cost of manual labeling. Meanwhile, a suitable text detection model is selected and iteratively trained in a semi-supervised manner, the labeled data set is continuously updated and expanded, the performance of the model is improved, case file pictures are labeled more accurately, and the manpower and time spent on manually labeling samples are directly reduced.
Drawings
In order to more clearly illustrate the technical solutions in the present disclosure or related technologies, the drawings needed to be used in the description of the embodiments or related technologies are briefly introduced below, and it is obvious that the drawings in the following description are only embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a case text recognition method based on OCR according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of active learning according to an embodiment of the present disclosure;
FIG. 3 is a schematic flow chart of semi-supervised learning in an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a case text recognition method according to an embodiment of the disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present disclosure should have a general meaning as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in the embodiments of the disclosure is not intended to indicate any order, quantity, or importance, but rather to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
As described in the Background, the existing labeled samples are few in both type and quantity and cannot fully cover the range of layouts. It is therefore necessary to provide a method for efficiently selecting samples across layout types so as to diversify the layouts. The existing target detection model YOLOv4 can output the position information of a target detection box together with a confidence, which represents the probability of the detected category inside that box. Case files with different layouts can be selected according to this confidence, achieving the goal of diversifying the case file samples. The YOLOv4 model is widely used in industry because its speed is 3 to 4 times that of comparable algorithms; it is mainly applied to general target detection tasks such as pedestrian detection and object detection.
The anchor-box mechanism of YOLOv4 performs well for multi-scale detection. However, the anchor-box sizes shipped with the model are obtained by a k-means clustering algorithm on two specific data sets, VOC (Visual Object Classes) and COCO (Common Objects in Context). In practical applications, case text contains detected objects of special shapes, horizontally elongated and vertically elongated, so the generic anchor-box sizes affect the accuracy of the final trained model; the anchor-box sizes therefore need to be generated from the user's own samples to replace the default values. Compared with general target detection, text detection in case files has the following characteristics: 1) the aspect ratio of text lines varies over a wide range: a single Chinese character is close to 1:1, while horizontally and vertically arranged text lines differ greatly in aspect ratio; 2) text lines have directionality and may be laid out horizontally, vertically, or at other angles; 3) the font types are very rich.
With these characteristics of case text detection in mind, the present disclosure adjusts the YOLOv4 model accordingly to detect text targets. The text detection task only needs to decide whether the target in a detection box is text; unlike a general detection task, it does not need to finely classify the box by the number of target categories, so each detection box only needs to keep its confidence score, and the classification branch of the general detection task can be disregarded.
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, an embodiment of the present disclosure provides an OCR-based case text recognition method, including the following steps:
step S101, pre-training a target detection model by using an initial training set formed by initial labeled samples in a case picture sample set.
Existing paper case files are scanned by a scanner into picture samples; the picture samples are stored and sorted on a computer terminal and then uploaded to form the sample set for training the target detection model. Because the layouts of the case files are complex and their types are varied, the sample set contains case file pictures of different styles. A case file picture contains structured information, and may also contain stamps, fingerprints, deformed characters, and the like. The paper file is thus converted into picture form, the picture is processed by optical character recognition, and an electronic case file is finally formed.
Some samples in the existing case picture sample set are already labeled and serve as the initial training set; the target detection model is pre-trained on this initial training set, and the pre-trained target detection model is used as the pre-training model for the next step.
Step S102, a plurality of unlabeled samples in the case picture sample set are actively learned by utilizing the pre-trained target detection model, so that a core sample set with pattern representativeness is selected from the unlabeled samples for manual labeling.
In this step, the unlabeled samples in the case picture sample set are actively learned based on the pre-trained model obtained in the previous step, and representative samples covering multiple layouts are selected as core samples. This satisfies the diversity requirement for the labeled sample set without having to select samples manually, which reduces labor cost. The core samples obtained by active learning are labeled manually, marking the text information in each sample, which improves the accuracy of the final character recognition.
Step S103, in response to receiving the labeled core sample set, combining the labeled core sample set and the initial training set into a labeled sample set.
In this step, the labeled core sample set and the initial training set are combined into a labeled sample set, so as to perform iterative training on the pre-trained target detection model.
Step S104, performing iterative training on the pre-trained target detection model by using a semi-supervised learning method based on the labeled sample set until the target detection model is determined to meet the preset requirement.
The target detection model is iteratively trained with a semi-supervised learning method: the labeled sample set is divided into a training set and a verification set, and the target detection model is trained and verified until the verification value meets the preset requirement, at which point training of the target detection model stops.
In some embodiments, the initial training set is subjected to data enhancement processing prior to pre-training the target detection model.
Specifically, data enhancement, also called data augmentation, allows limited data to generate value equivalent to more data without substantially increasing the amount of data, which is useful when labeled data is insufficient. The data enhancement modes used in the present disclosure include: color-based enhancement (changes of image brightness, saturation, and contrast), partial occlusion of objects, random scaling, random cropping, horizontal/vertical flipping, translation, rotation/affine transformation, Gaussian noise, blurring, and the like. Applying data enhancement techniques can improve the accuracy of the model while helping to mitigate overfitting. In addition, data enhancement increases the data volume and reduces the large number of manually labeled samples that deep learning requires; when real training samples cannot be added, a data enhancement strategy can overcome the limitation of a small data set.
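As a concrete illustration only, the following is a minimal sketch of a few of the transforms listed above (brightness/contrast change, Gaussian noise, small rotation), written in Python with OpenCV and NumPy. The function names, probabilities, and parameter ranges are illustrative assumptions and are not specified by the present disclosure; a real pipeline would also transform the box annotations along with the image.

```python
import cv2
import numpy as np

def adjust_brightness_contrast(img, alpha=1.2, beta=15):
    # alpha scales contrast, beta shifts brightness
    return cv2.convertScaleAbs(img, alpha=alpha, beta=beta)

def add_gaussian_noise(img, sigma=10):
    noise = np.random.normal(0, sigma, img.shape).astype(np.float32)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def random_rotate(img, max_angle=5):
    # small rotations approximate skewed scans; white border fill for documents
    h, w = img.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, m, (w, h), borderValue=(255, 255, 255))

def augment(img):
    # apply each transform with an illustrative probability
    if np.random.rand() < 0.5:
        img = adjust_brightness_contrast(img,
                                         alpha=np.random.uniform(0.8, 1.2),
                                         beta=np.random.uniform(-20, 20))
    if np.random.rand() < 0.3:
        img = add_gaussian_noise(img)
    if np.random.rand() < 0.5:
        img = random_rotate(img)
    return img
```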
In some embodiments, the method further comprises performing the data enhancement processing on the labeled core sample set before merging the labeled core sample set with the initial training set into the labeled sample set, wherein merging the labeled core sample set with the initial training set into the labeled sample set comprises: merging the labeled core sample set subjected to the data enhancement processing and the initial training set subjected to the data enhancement processing into the labeled sample set.
Specifically, the data enhancement processing is carried out on the initial training set and the labeled core sample set, the data volume of the labeled sample set is enriched, and the accuracy of the model is improved.
In some embodiments, the category label information for each target in the labeled sample set only indicates whether the target is text. Specifically, since case recognition is aimed at recognizing text information, the present disclosure only considers whether each target in the labeled sample set is labeled as text, and labels of other categories are not considered.
In some embodiments, the target detection model comprises the YOLOv4 model, into which a horizontally elongated first type of anchor box and a vertically elongated second type of anchor box are introduced.
Specifically, the target detection model selected in the present disclosure is the YOLOv4 model. The anchor-box sizes it provides are based on specific visual classification data sets and are not suited to text recognition. Because case file text contains detected objects of two special shapes, horizontally elongated and vertically elongated, anchor boxes corresponding to these two shapes are introduced into the YOLOv4 model, namely the horizontally elongated first type of anchor box and the vertically elongated second type of anchor box. In the present disclosure, the gen_anchors.py file from the darknet source code is used within the YOLOv4 framework to generate anchor-box sizes that fit the text data set; a sketch of the underlying clustering idea is given below.
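By way of illustration only, the following Python sketch shows the general idea behind such anchor generation: k-means clustering of the labeled boxes' widths and heights using an IoU-based distance. It is a re-implementation of the technique under stated assumptions, not the gen_anchors.py code itself; the box data, the number of clusters k, and the normalization are placeholders.

```python
import numpy as np

def iou_wh(boxes, anchors):
    # boxes: (N, 2) widths/heights, anchors: (K, 2); IoU computed as if all
    # boxes share the same top-left corner (standard anchor-clustering trick)
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_a = (anchors[:, 0] * anchors[:, 1])[None, :]
    return inter / (area_b + area_a - inter)

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)]
    for _ in range(iters):
        # assign each box to the anchor with the highest IoU (distance = 1 - IoU)
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else anchors[i] for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    # return anchors sorted by area, as is conventional
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```

Anchors produced this way from labeled case-file boxes would naturally include the horizontally and vertically elongated shapes discussed above, replacing the VOC/COCO defaults.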
In some embodiments, selecting the core sample set from the plurality of unlabeled samples comprises: for each of a plurality of detection boxes detected in the plurality of unlabeled samples by active learning, calculating a classification uncertainty for the detection box based on the confidence of the detection box; sorting the plurality of detection boxes in descending order of classification uncertainty; and selecting the first N detection boxes among the sorted detection boxes and taking the unlabeled samples corresponding to those first N detection boxes as the core sample set, wherein N represents a preset number.
Specifically, the YOLOv4 model detects the case file picture sample through anchor boxes and finally outputs a list. Each entry in the list contains the position information of one detection box in the case file picture sample and its first confidence. The first confidence represents the probability that the content of the detection box belongs to the text category; the larger its value, the more likely the content is text.
The uncertainty reflects how accurately the target detection model labels the sample set. A larger uncertainty means the model labels the sample less accurately, and also indicates that the layout is complicated and the model's anchor boxes do not recognize the text well. A smaller uncertainty means the labeling is more accurate, i.e., the case file is simpler and dominated by structured information. Uncertainty values are calculated from the first confidence and sorted in descending order, and the first N case file pictures are labeled manually to serve as the labeled sample set for training the target detection model. For text detection, only two categories are considered when manually labeling the content of a target detection box: text and non-text.
Specifically, the uncertainty index is defined as follows:
U(B) = |log(P_max(B)) + α·log(s²)|
where P_max(B) = max{p, 1 − p} is the maximum of the predicted class probabilities in the target detection box; α is a weight parameter balancing the maximum-probability term against the dispersion of the two class probabilities, taken here as 0.5; and s² is the variance of the two class probabilities, representing how dispersed they are:
s² = (1/n) · Σ_{i=1..n} (p_i − p̄)²
where n is the number of classes, p_i is the probability that the content of the detection box belongs to the i-th class (text or non-text), and p̄ is the mean of the probability values:
p̄ = (1/n) · Σ_{i=1..n} p_i
Since only two classes are considered, text and non-text, n = 2 and p̄ = 1/2, so the variance formula simplifies to:
s² = (p − 1/2)²
where p represents the probability that the detected content of the detection box belongs to the text category.
After the classification uncertainty of all detection boxes has been determined, the uncertainties are sorted in descending numerical order, the preset first N detection boxes with the larger uncertainty values are selected, and the corresponding samples are taken as the core sample set; the value of N can be set according to the specific situation. Referring to fig. 2, part of the samples are selected from the plurality of unlabeled samples for manual labeling to form the core sample set, which is then merged into the labeled sample set.
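For illustration, the selection step can be sketched in Python as follows. The detector call itself is not shown: the sketch assumes the pre-trained model has already produced, for each unlabeled sample, its detection boxes with the text probability p of each box, and the data layout and tie-breaking details are assumptions.

```python
import math

ALPHA = 0.5  # weight of the dispersion term, as in the formula above

def classification_uncertainty(p):
    """U(B) for one detection box, given p = probability that its content is text."""
    p_max = max(p, 1.0 - p)              # P_max(B) = max{p, 1 - p}
    s2 = max((p - 0.5) ** 2, 1e-12)      # two-class variance, clamped to avoid log(0)
    return abs(math.log(p_max) + ALPHA * math.log(s2))

def select_core_samples(detections, n):
    """detections: iterable of (sample_id, box, p_text) triples produced by the
    pre-trained detector on the unlabeled pool (the detector call is assumed).
    Returns the ids of at most n samples whose boxes are most uncertain."""
    scored = sorted(((classification_uncertainty(p), sid)
                     for sid, _box, p in detections),
                    key=lambda t: t[0], reverse=True)
    core_ids = []
    for _u, sid in scored:
        if sid not in core_ids:
            core_ids.append(sid)
        if len(core_ids) == n:
            break
    return core_ids
```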
In some embodiments, the manual labeling includes labeling the position information of the target detection box and labeling the text category. The selected case file pictures with large uncertainty values are labeled manually: the position of the target detection box is marked, usually with four or eight positioning points. Since the category is either text or non-text, the text category is usually labeled with "1" and the non-text category with "0".
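Purely as an illustrative example, one labeled detection box could be recorded as follows; the concrete record format and field names are assumptions, only the content (four positioning points plus the binary text category) follows the description above.

```python
# Hypothetical annotation record for one detection box.
annotation = {
    "points": [[120, 64], [860, 64], [860, 112], [120, 112]],  # clockwise quad
    "label": 1,  # 1 = text category, 0 = non-text category
}
```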
The target detection model is trained on the labeled sample set, the trained model actively learns over the unlabeled samples, and samples are continuously selected, so that the labeled samples are diversified and the cost of manual labeling is reduced. As the existing case files are continuously updated, the core samples are also continuously updated, and active learning better expands the range of sample layouts in the labeled sample set.
In some embodiments, iteratively training the pre-trained target detection model using a semi-supervised learning method comprises:
iteratively performing the following operations until the target detection model is determined to meet the preset requirement: performing intermediate training of the target detection model with the labeled sample set; predicting, with the intermediately trained target detection model, the temporary unlabeled sample set that remains after the labeled sample set is removed from the case picture sample set, to obtain a plurality of labels and their corresponding confidences; and selecting the temporary unlabeled samples whose confidence exceeds a preset threshold and supplementing them, together with their corresponding labels, into the labeled sample set.
Specifically, the intermediate training divides the labeled sample set into a training set and a verification set: the training set is used to train the target detection model, and the verification set is used to verify its labeling precision. Once the verification precision is reached, the temporary unlabeled sample set is predicted to obtain a plurality of labels and their corresponding confidences, and the unlabeled samples whose confidence exceeds the preset threshold are selected and supplemented into the labeled sample set. The target detection model is then trained again, and the model training process iterates until a preset termination condition is reached.
The iterative training process of the model is exemplified below, with reference to fig. 3, and includes the following steps:
step S201, dividing the labeled sample set into a first training set and a first verification set, performing iterative training on the target detection model through the first training set and the first verification set, and outputting a pre-training model A in response to the fact that a first preset termination condition is reached.
Specifically, the labeling sample set is divided into a first training set and a first verification set, the target detection model is trained through the first training set, the model is verified through the first verification set after the training is finished, the labeling precision of the model is obtained, and if the labeling precision of the model does not reach a preset precision threshold value, iterative training is continuously performed on the model. And when the marking precision reaches a preset precision threshold value, triggering a first preset termination condition to obtain a pre-training model A.
Step S202, labeling the temporary unlabeled sample set with pre-trained model A to obtain a second confidence for the temporary unlabeled samples, and taking the elements of the temporary unlabeled sample set whose second confidence is greater than a preset threshold as a pseudo-label sample set.
Specifically, pre-trained model A labels the case file picture samples in the temporary unlabeled sample set to obtain the second confidence of the corresponding target detection boxes; the information labels of the target detection boxes obtained in this step are called pseudo labels. The elements of the temporary unlabeled sample set whose second confidence is greater than the preset threshold form the pseudo-label sample set.
Step S203, dividing the labeled sample set into a second training set and a second verification set, merging the pseudo-label sample set and the second training set into a third training set, iteratively training pre-trained model A with the third training set and the second verification set, and outputting pre-trained model B in response to a second preset termination condition being reached.
The labeled sample set is split again to form a second training set and a second verification set. The pseudo-label sample set obtained in the previous step is merged with the second training set, pre-trained model A is trained on the merged set and, after training, verified on the second verification set to obtain its labeling precision. If the labeling precision does not reach the preset precision threshold, iterative training continues; once it does, the second preset termination condition is triggered and pre-trained model B is obtained.
Step S204, labeling the pseudo-label sample set with pre-trained model B to obtain a newly added labeled sample set, and merging the newly added labeled sample set with the labeled sample set into a new labeled sample set.
Specifically, label information is obtained by labeling the pseudo-label sample set with pre-trained model B, and the newly added label information is merged into the original labeled sample set, expanding it and yielding more labeled data.
Steps S201 to S204 are repeated until all elements of the unlabeled sample set have been labeled. Iteratively training the target detection model on the continuously expanding labeled sample set improves the model's labeling precision, so that more unlabeled sample data are labeled and the workload of manual labeling is reduced. A sketch of this loop is given below.
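The following Python sketch outlines the loop over steps S201 to S204 under stated assumptions: train_until and predict are caller-supplied placeholders for the actual YOLOv4 training and inference routines, and the 80/20 split ratio and 0.9 confidence threshold are illustrative values, not parameters fixed by the present disclosure.

```python
import random

def semi_supervised_loop(labeled, unlabeled, train_until, predict,
                         conf_threshold=0.9, val_ratio=0.2):
    """Sketch of steps S201-S204.

    labeled:   list of (image, label) pairs
    unlabeled: list of images
    train_until(train, val, init=None) -> model   # trains until the precision threshold
    predict(model, image) -> (label, confidence)
    Both callables are placeholders for the real YOLOv4 routines.
    """
    while unlabeled:
        # S201: split the labeled set and train until the precision threshold -> model A
        random.shuffle(labeled)
        cut = int(len(labeled) * (1 - val_ratio))
        model_a = train_until(labeled[:cut], labeled[cut:])

        # S202: pseudo-label the remaining pool, keep only high-confidence samples
        pseudo = []
        for image in unlabeled:
            label, conf = predict(model_a, image)
            if conf > conf_threshold:
                pseudo.append((image, label))
        if not pseudo:
            break  # nothing confident enough to absorb this round

        # S203: re-split the labeled set, train model A on labeled + pseudo -> model B
        random.shuffle(labeled)
        cut = int(len(labeled) * (1 - val_ratio))
        model_b = train_until(labeled[:cut] + pseudo, labeled[cut:], init=model_a)

        # S204: relabel the pseudo samples with model B and merge them in
        for image, _ in pseudo:
            label, _conf = predict(model_b, image)
            labeled.append((image, label))
        absorbed = {id(image) for image, _ in pseudo}
        unlabeled = [image for image in unlabeled if id(image) not in absorbed]
    return labeled
```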
It should be noted that the method of the embodiments of the present disclosure may be executed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may only perform one or more steps of the method of the embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above describes some embodiments of the disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, the present disclosure further provides a case text recognition method corresponding to any of the above embodiments, and referring to fig. 4, an embodiment of the present disclosure provides a case text recognition method, including:
step S401, using a target detection model trained in advance by the method according to any of the embodiments described above, detects a text box from the acquired picture of the case.
Step S402, recognizing the text in the text box by using a preset text recognition model.
In some embodiments, the text recognition model comprises a convolutional recurrent neural network, CRNN, model.
Specifically, a target detection model with high labeling precision is obtained by the model training method of any of the above embodiments; after the case file picture sample is labeled, character recognition is performed with an existing text recognition technique to obtain the electronic case file. Text recognition uses a Convolutional Recurrent Neural Network (CRNN) model, which is widely applied in the text recognition field. The CRNN model is mainly used for end-to-end recognition of text sequences of indefinite length, in particular for scene text recognition. The model does not need individual characters to be cut out in advance; instead, it converts text recognition into a time-dependent sequence learning problem, i.e., image-based sequence recognition. The model treats character recognition as sequence prediction, so a Recurrent Neural Network (RNN) is adopted to predict the sequence: the image features are extracted by a Convolutional Neural Network (CNN), the sequence is predicted by the RNN, and the final result is produced by a Connectionist Temporal Classification (CTC) translation layer, which outputs the recognized characters.
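Putting the two stages together, a minimal illustrative sketch of steps S401 and S402 might look like the following Python. The detector and recognizer objects stand in for the trained YOLOv4 text detector and a CRNN+CTC recognizer; their interfaces and the confidence threshold are assumptions, not APIs defined by the present disclosure.

```python
import cv2

def recognize_case_picture(image_path, detector, recognizer, conf_threshold=0.5):
    image = cv2.imread(image_path)
    results = []
    # S401: detect candidate text boxes as (x1, y1, x2, y2, confidence)
    for (x1, y1, x2, y2, conf) in detector.detect(image):
        if conf < conf_threshold:
            continue
        crop = image[int(y1):int(y2), int(x1):int(x2)]
        # S402: the CRNN reads the cropped line; CTC decoding collapses repeats/blanks
        text = recognizer.recognize(crop)
        results.append({"box": (x1, y1, x2, y2), "text": text, "conf": conf})
    return results
```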
Based on the same inventive concept, corresponding to the method of any embodiment, the present disclosure further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the program, the case text recognition method according to any embodiment is implemented.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs, and when the technical solution provided by the embodiments of the present specification is implemented by software or firmware, the relevant program codes are stored in the memory 1020 and called to be executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
The electronic device of the above embodiment is used to implement the corresponding case text recognition method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Based on the same inventive concept, corresponding to any of the above-mentioned embodiment methods, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the case text recognition method according to any of the above-mentioned embodiments.
Computer-readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
The computer instructions stored in the storage medium of the above embodiment are used to enable the computer to execute the case text recognition method according to any embodiment, and have the beneficial effects of the corresponding method embodiment, which are not described herein again.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only, and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the present disclosure, technical features in the above embodiments or in different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures for simplicity of illustration and discussion, and so as not to obscure the embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring embodiments of the present disclosure, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the embodiments of the present disclosure are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that the embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Therefore, any omissions, modifications, equivalents, improvements, and the like that may be made within the spirit and principles of the embodiments of the disclosure are intended to be included within the scope of the disclosure.

Claims (10)

1. An OCR-based case text recognition method comprises the following steps:
pre-training a target detection model by using an initial training set formed by initial labeled samples in a case picture sample set;
actively learning a plurality of unlabeled samples in the case picture sample set by using the pre-trained target detection model so as to select a layout-representative core sample set from the unlabeled samples for manual labeling;
in response to receiving the labeled core sample set, merging the labeled core sample set with the initial training set into a labeled sample set;
and performing iterative training on the pre-trained target detection model by using a semi-supervised learning method based on the labeled sample set until the target detection model is determined to meet the preset requirement.
2. The method of claim 1, further comprising:
and performing data enhancement processing on the initial training set before pre-training the target detection model.
3. The method of claim 2, further comprising:
performing the data enhancement processing on the labeled core sample set prior to merging the labeled core sample set and the initial training set into the labeled sample set,
wherein merging the labeled core sample set and the initial training set into the labeled sample set comprises: merging the labeled core sample set subjected to the data enhancement processing and the initial training set subjected to the data enhancement processing into the labeled sample set.
4. The method of claim 1, wherein the category label information for each object in the labeled sample set only indicates whether the object is text.
5. The method of any one of claims 1 to 4,
the target detection model comprises a YOLOv4 model, into which a horizontally elongated first type of anchor box and a vertically elongated second type of anchor box are introduced.
6. The method of claim 5, wherein selecting the core sample set from the plurality of unlabeled samples comprises:
for each detection box of a plurality of detection boxes detected in the plurality of unlabeled samples by active learning, calculating a classification uncertainty for the detection box based on a confidence of the detection box;
sorting the plurality of detection boxes in descending order of classification uncertainty;
selecting the first N detection boxes among the sorted detection boxes, and taking the unlabeled samples, among the plurality of unlabeled samples, corresponding to the first N detection boxes as the core sample set, wherein N represents a preset number.
7. The method of any of claims 1 to 4, wherein iteratively training the pre-trained target detection model using a semi-supervised learning method comprises:
iteratively performing the following operations until it is determined that the target detection model meets the predetermined requirements:
performing intermediate training on the target detection model by using the labeled sample set;
predicting, by using the target detection model subjected to the intermediate training, the temporary unlabeled sample set left after the labeled sample set is removed from the case picture sample set, to obtain a plurality of labels and their corresponding confidences;
selecting a temporary unlabeled sample in the temporary unlabeled sample set whose confidence exceeds a preset threshold, and supplementing the temporary unlabeled sample and its corresponding label into the labeled sample set.
8. A case text recognition method comprises the following steps:
detecting a text box from the acquired picture of the case by utilizing a target detection model which is trained in advance by the method according to any one of claims 1 to 7;
and recognizing the text in the text box by using a preset text recognition model.
9. The method of claim 8, wherein the text recognition model comprises a Convolutional Recurrent Neural Network (CRNN) model.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable by the processor, wherein the processor implements the method according to claim 8 or 9 when executing the computer program.
CN202110304175.1A 2021-03-22 2021-03-22 Case text recognition method based on OCR and electronic equipment Pending CN112861842A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110304175.1A CN112861842A (en) 2021-03-22 2021-03-22 Case text recognition method based on OCR and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110304175.1A CN112861842A (en) 2021-03-22 2021-03-22 Case text recognition method based on OCR and electronic equipment

Publications (1)

Publication Number Publication Date
CN112861842A true CN112861842A (en) 2021-05-28

Family

ID=75992094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110304175.1A Pending CN112861842A (en) 2021-03-22 2021-03-22 Case text recognition method based on OCR and electronic equipment

Country Status (1)

Country Link
CN (1) CN112861842A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326886A (en) * 2021-06-16 2021-08-31 中山大学 Salient object detection method and system based on unsupervised learning
CN113591927A (en) * 2021-07-02 2021-11-02 北京地平线机器人技术研发有限公司 Training method and device for detection model
CN113781607A (en) * 2021-09-17 2021-12-10 平安科技(深圳)有限公司 Method, device and equipment for processing annotation data of OCR (optical character recognition) image and storage medium
CN113792203A (en) * 2021-09-18 2021-12-14 重庆紫光华山智安科技有限公司 Method and system for multiplexing label data, electronic device and readable storage medium
CN114863434A (en) * 2022-04-21 2022-08-05 北京百度网讯科技有限公司 Character segmentation model acquisition method, character segmentation method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960800A (en) * 2019-03-13 2019-07-02 安徽省泰岳祥升软件有限公司 Weakly supervised file classification method and device based on Active Learning
US20190325259A1 (en) * 2018-04-12 2019-10-24 Discovery Communications, Llc Feature extraction and machine learning for automated metadata analysis
CN110796143A (en) * 2019-10-31 2020-02-14 天津大学 Scene text recognition method based on man-machine cooperation
CN110837870A (en) * 2019-11-12 2020-02-25 东南大学 Sonar image target identification method based on active learning
CN111191732A (en) * 2020-01-03 2020-05-22 天津大学 Target detection method based on full-automatic learning
CN111723209A (en) * 2020-06-28 2020-09-29 上海携旅信息技术有限公司 Semi-supervised text classification model training method, text classification method, system, device and medium
CN112508092A (en) * 2020-12-03 2021-03-16 上海云从企业发展有限公司 Sample screening method, system, equipment and medium
CN112528030A (en) * 2021-02-09 2021-03-19 中关村科学城城市大脑股份有限公司 Semi-supervised learning method and system for text classification

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190325259A1 (en) * 2018-04-12 2019-10-24 Discovery Communications, Llc Feature extraction and machine learning for automated metadata analysis
CN109960800A (en) * 2019-03-13 2019-07-02 安徽省泰岳祥升软件有限公司 Weakly supervised file classification method and device based on Active Learning
CN110796143A (en) * 2019-10-31 2020-02-14 天津大学 Scene text recognition method based on man-machine cooperation
CN110837870A (en) * 2019-11-12 2020-02-25 东南大学 Sonar image target identification method based on active learning
CN111191732A (en) * 2020-01-03 2020-05-22 天津大学 Target detection method based on full-automatic learning
CN111723209A (en) * 2020-06-28 2020-09-29 上海携旅信息技术有限公司 Semi-supervised text classification model training method, text classification method, system, device and medium
CN112508092A (en) * 2020-12-03 2021-03-16 上海云从企业发展有限公司 Sample screening method, system, equipment and medium
CN112528030A (en) * 2021-02-09 2021-03-19 中关村科学城城市大脑股份有限公司 Semi-supervised learning method and system for text classification

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113326886A (en) * 2021-06-16 2021-08-31 中山大学 Salient object detection method and system based on unsupervised learning
CN113326886B (en) * 2021-06-16 2023-09-15 中山大学 Method and system for detecting salient object based on unsupervised learning
CN113591927A (en) * 2021-07-02 2021-11-02 北京地平线机器人技术研发有限公司 Training method and device for detection model
CN113591927B (en) * 2021-07-02 2024-04-19 北京地平线机器人技术研发有限公司 Training method and device for detection model
CN113781607A (en) * 2021-09-17 2021-12-10 平安科技(深圳)有限公司 Method, device and equipment for processing annotation data of OCR (optical character recognition) image and storage medium
CN113781607B (en) * 2021-09-17 2023-09-19 平安科技(深圳)有限公司 Processing method, device, equipment and storage medium for labeling data of OCR (optical character recognition) image
CN113792203A (en) * 2021-09-18 2021-12-14 重庆紫光华山智安科技有限公司 Method and system for multiplexing label data, electronic device and readable storage medium
CN113792203B (en) * 2021-09-18 2023-05-16 重庆紫光华山智安科技有限公司 Method and system for multiplexing annotation data, electronic equipment and readable storage medium
CN114863434A (en) * 2022-04-21 2022-08-05 北京百度网讯科技有限公司 Character segmentation model acquisition method, character segmentation method and device

Similar Documents

Publication Publication Date Title
CN109117848B (en) Text line character recognition method, device, medium and electronic equipment
CN109344793B (en) Method, apparatus, device and computer readable storage medium for recognizing handwriting in the air
CN112861842A (en) Case text recognition method based on OCR and electronic equipment
CN110135427B (en) Method, apparatus, device and medium for recognizing characters in image
CN109685055B (en) Method and device for detecting text area in image
CN107944450B (en) License plate recognition method and device
CN111027563A (en) Text detection method, device and recognition system
CN112396049A (en) Text error correction method and device, computer equipment and storage medium
CN113254654B (en) Model training method, text recognition method, device, equipment and medium
CN110210480B (en) Character recognition method and device, electronic equipment and computer readable storage medium
CN113221918B (en) Target detection method, training method and device of target detection model
CN113205047B (en) Medicine name identification method, device, computer equipment and storage medium
CN112749695A (en) Text recognition method and device
CN111291661A (en) Method and equipment for identifying text content of icons in screen
CN113657274A (en) Table generation method and device, electronic equipment, storage medium and product
CN114005126A (en) Table reconstruction method and device, computer equipment and readable storage medium
CN113255501B (en) Method, apparatus, medium and program product for generating form recognition model
CN113298188A (en) Character recognition and neural network training method and device
CN115661846A (en) Data processing method and device, electronic equipment and storage medium
CN111539424A (en) Image processing method, system, device and medium based on OCR
JP2009276937A (en) Dictionary creating apparatus, recognition apparatus, recognition method, and recognition program
CN116311276A (en) Document image correction method, device, electronic equipment and readable medium
CN115311664A (en) Method, device, medium and equipment for identifying text type in image
CN112560849B (en) Neural network algorithm-based grammar segmentation method and system
CN111291758B (en) Method and device for recognizing seal characters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210528