CN111062389A - Character recognition method and device, computer readable medium and electronic equipment


Info

Publication number: CN111062389A
Authority: CN (China)
Prior art keywords: text, model, information, region, negative sample
Legal status: Pending
Application number: CN201911260301.7A
Other languages: Chinese (zh)
Inventors: 高文龙 (Gao Wenlong), 史仪男 (Shi Yinan)
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to: CN201911260301.7A
Publication of: CN111062389A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition


Abstract

An embodiment of the application provides a character recognition method and apparatus, a computer-readable medium, and an electronic device. The character recognition method comprises the following steps: acquiring a to-be-processed picture containing text information, and detecting each text region contained in the picture; recognizing the text information contained in each text region through a pre-trained first model for character recognition, where the first model is trained on text regions containing sensitive information together with set output information corresponding to those regions; and outputting the text information recognized by the first model. In this technical scheme, because the first model for character recognition is trained on text regions containing sensitive information and their set output information, the set output information is output directly whenever a text region contains sensitive information. The risk of leaking sensitive information by first recognizing characters and only then filtering them is thus avoided at the model level, and the privacy of the information is improved.

Description

Character recognition method and device, computer readable medium and electronic equipment
Technical Field
The present application relates to the field of computer and communication technologies, and in particular, to a method and an apparatus for character recognition, a computer-readable medium, and an electronic device.
Background
As the demand for character recognition and the accuracy it requires keep growing, characters in images are generally recognized by constructing a character recognition model so as to achieve higher recognition accuracy. However, these recognition methods and character recognition models cannot perform differentiated character recognition; in particular, when the recognized content includes private information or the user has strict privacy requirements, the privacy of the information cannot be guaranteed, so the overall character recognition effect is poor.
Disclosure of Invention
The embodiments of the application provide a character recognition method and apparatus, a computer-readable medium, and an electronic device, so that the leakage of sensitive information caused by recognizing characters and only then filtering them can be avoided, at least to a certain extent, at the model level, improving the privacy of the information.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, a character recognition method is provided, comprising: obtaining a to-be-processed picture containing text information; detecting each text region contained in the to-be-processed picture; recognizing the text information contained in the respective text regions by a pre-trained first model, wherein the first model is trained on training data containing negative sample regions, the negative sample regions comprising: text regions containing sensitive information, and set output information corresponding to those text regions; and outputting the text information recognized by the first model.
According to an aspect of an embodiment of the present application, a character recognition apparatus is provided, comprising: an acquisition unit for obtaining the to-be-processed picture containing text information; a detection unit for detecting each text region contained in the to-be-processed picture; a recognition unit configured to recognize the text information contained in each of the text regions by a pre-trained first model, wherein the first model is trained on training data containing negative sample regions, the negative sample regions comprising: text regions containing sensitive information, and set output information corresponding to those text regions; and an output unit for outputting the text information recognized by the first model.
In some embodiments of the present application, based on the foregoing solution, the character recognition apparatus includes: a first selecting unit, configured to select negative sample regions from at least two text region samples based on the recognition results of the first model on those samples; and a first training unit, configured to train the first model using the negative sample regions selected from the at least two text region samples as new text region samples.
In some embodiments of the present application, based on the foregoing solution, the first selecting unit includes: a first calculation unit for calculating the loss value corresponding to each text region based on the recognition results of the first model on the at least two text region samples; a second calculation unit for determining the average loss corresponding to the at least two text region samples from the loss values of the individual text regions; and a second selecting unit, configured to select negative sample regions from the at least two text region samples if the average loss is smaller than a loss threshold.
In some embodiments of the present application, based on the foregoing solution, the second selecting unit is configured to: determine the average sliding loss from the average loss and a sliding parameter; and select the text regions whose loss values are smaller than the average sliding loss as the negative sample regions.
In some embodiments of the present application, based on the foregoing scheme, the to-be-processed picture comprises an inspection report, and the character recognition apparatus further includes: a first identification unit, configured to identify the inspection information in the inspection report through the first model; a first adjusting unit, configured to increase the loss value corresponding to the inspection report, obtaining an increased loss value, if the inspection information matches preset information; and a second training unit for training the first model based on the increased loss value and the negative sample regions.
In some embodiments of the present application, based on the foregoing scheme, each text region contained in the to-be-processed picture is detected by a pre-trained second model, and the character recognition apparatus further includes: a third selecting unit for determining a model loss value based on the positive sample pixel points corresponding to the text regions detected in a sample image and negative sample pixel points selected from the non-text regions; and a third training unit for performing back-propagation training with the model loss value to obtain the second model.
In some embodiments of the present application, based on the foregoing solution, the text recognition apparatus further includes: a region identification unit, configured to identify a text region containing a text segment and a non-text region not containing the text segment in the sample image; and the fourth selecting unit is used for selecting the negative sample pixel points from the non-text area.
In some embodiments of the present application, based on the foregoing solution, the fourth selecting unit includes: a number identification unit for identifying the number of positive sample pixel points corresponding to the text regions; a third calculating unit for determining the number of negative sample pixel points to be selected as the product of the number of positive sample pixel points and the preset positive-negative sample ratio; and a fifth selecting unit for selecting that number of negative sample pixel points from the non-text regions.
In some embodiments of the present application, based on the foregoing solution, the fifth selecting unit is configured to: determine the loss value corresponding to each pixel point in the sample image according to the text regions and the pixel label of each pixel point; and select, from the non-text regions, the pixel points with the smaller loss values as the negative sample pixel points.
In some embodiments of the present application, based on the foregoing scheme, the text information includes the inspection information in an inspection report serving as the to-be-processed picture, and the character recognition apparatus further includes: an anomaly identification unit for identifying the inspection items in the inspection information and the abnormal items whose values exceed their inspection indexes; and an anomaly display unit for displaying the information related to the inspection items and the information related to the abnormal items in a visually distinguished manner.
According to an aspect of the embodiments of the present application, a method for training a character recognition model is provided, where the character recognition model includes a first model for recognizing text, and training the first model includes: inputting training data containing negative sample regions into a recognition network to obtain recognition results, the negative sample regions comprising: text regions containing sensitive information, and set output information corresponding to those text regions; determining a loss value of the first model from the recognition results and the set output information; and training the first model based on its loss value.
According to an aspect of an embodiment of the present application, an apparatus for training the character recognition model is provided, comprising: an identification unit for inputting the training data containing negative sample regions into a recognition network to obtain recognition results, the negative sample regions comprising: text regions containing sensitive information, and set output information corresponding to those text regions; a loss unit, configured to determine the loss value of the first model based on the recognition results and the set output information; and a training unit for training the first model based on its loss value.
According to an aspect of the embodiments of the present application, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the text recognition method as described in the above embodiments.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the text recognition method as described in the above embodiments.
In the technical solutions provided by some embodiments of the present application, the negative sample regions comprise text regions containing sensitive information together with the set output information corresponding to those regions. After the first model for text recognition is trained on such negative sample regions, it outputs the set output information directly whenever a text region contains sensitive information. The risk of leaking sensitive information by first recognizing characters and only then screening them is thus avoided at the model level, and the privacy of the information is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which aspects of embodiments of the present application may be applied;
fig. 2 shows a schematic diagram of a solution to which embodiments of the present application can be applied;
FIG. 3 schematically illustrates a flow diagram of a text recognition method according to one embodiment of the present application;
FIG. 4 schematically illustrates a diagram of text regions in a picture to be processed according to an embodiment of the present application;
FIG. 5 schematically illustrates a flow diagram of training a second model for text detection according to one embodiment of the present application;
FIG. 6 schematically shows a schematic diagram of a second model of text detection according to an embodiment of the present application;
FIG. 7 schematically illustrates a schematic diagram of a detection model according to an embodiment of the present application;
FIG. 8 schematically illustrates a flow chart for selecting negative example pixel points according to an embodiment of the present application;
FIG. 9 schematically illustrates a schematic diagram of a recognition network of text according to one embodiment of the present application;
FIG. 10 schematically illustrates a flow diagram for selecting a negative sample region from at least two text region samples, according to an embodiment of the present application;
FIG. 11 is a schematic diagram of a negative sample region based training method provided by an embodiment of the present application;
fig. 12 is a flowchart illustrating detection and identification of a medical picture according to an embodiment of the present application;
FIG. 13 shows a block diagram of a text recognition apparatus according to an embodiment of the present application;
FIG. 14 shows a block diagram of a text recognition apparatus according to an embodiment of the present application;
FIG. 15 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture may include a terminal device (e.g., one or more of a smartphone 101, a tablet computer 102, and a portable computer 103 shown in fig. 1, but may also be a desktop computer, etc.), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
A user may use a terminal device to interact with the server 105 over the network 104 to receive or send messages and the like. The server 105 may be a server that provides various services. For example, a user uploads a to-be-processed picture to the server 105 using the terminal device 103 (or the terminal device 101 or 102); the server 105 may acquire the to-be-processed picture containing text information, detect each text region contained in it, recognize the text information contained in each text region by a pre-trained first model for character recognition, where the first model is trained on training data containing negative sample regions, the negative sample regions comprising text regions containing sensitive information and the set output information corresponding to those regions, and output the text information recognized by the first model. Because the first model is trained on text regions containing sensitive information and their set output information, the set output information is output directly when a text region contains sensitive information; the risk of leaking sensitive information by recognizing characters and then screening them is thus avoided at the model level, and the privacy of the information is improved.
It should be noted that the character recognition method provided in the embodiment of the present application is generally executed by the server 105, and accordingly, the character recognition apparatus is generally disposed in the server 105. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to execute the scheme of character recognition provided in the embodiments of the present application.
Fig. 2 shows a schematic diagram of a solution to which embodiments of the present application can be applied.
As shown in fig. 2, the execution subject in the embodiment of the present application may be a terminal device, such as an Artificial Intelligence (AI) device, a mobile phone 210 of a user, and the like, which is not limited herein.
For example, a user takes a to-be-processed picture containing text information with a mobile phone; the to-be-processed picture in this embodiment may be a hospital examination report 220 containing the text information corresponding to each examination item. Each text region contained in the to-be-processed picture is detected, and the text information 230 contained in each text region is recognized by a pre-trained first model for character recognition, where the first model is trained on training data containing negative sample regions, the negative sample regions comprising text regions containing sensitive information and the set output information corresponding to those regions; the text information recognized by the first model is then output. Because the first model is trained on text regions containing sensitive information and their set output information, the set output information is output directly when a text region contains sensitive information; the risk of leaking sensitive information by recognizing characters and then screening them is thus avoided at the model level, and the privacy of the information is improved.
As shown in fig. 2, in one embodiment of the present application, the terminal device outputs the text information 230 recognized by the first model on the user interface. The hospital examination report 220 includes personal information of the examiner, such as the examiner's name, sex, age, and case number. In this embodiment, once the first model detects that such information is sensitive, the set output information is output directly, for example the literal characters "sensitive information", while the remaining information is displayed normally on the user interface. By declining to recognize such regions and outputting the set output information whenever a text region contains sensitive information, the user's privacy is protected and the privacy of the information is improved.
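As a rough illustration of this output behavior, consider the following Python sketch; all names here are hypothetical, since the patent does not specify an implementation, and the point is only that the caller needs no separate filtering step because the model itself returns the placeholder:

```python
# Hypothetical sketch of surfacing first-model outputs on a report.
# A model trained as described emits the literal set output string for
# regions it learned as negative samples, so no post-hoc filter is needed.

SET_OUTPUT = "sensitive information"  # the "set output information"

def render_report(text_regions, first_model):
    """text_regions: cropped region images; first_model: trained recognizer."""
    lines = []
    for region in text_regions:
        text = first_model.recognize(region)  # SET_OUTPUT for sensitive regions
        lines.append(text)
    return "\n".join(lines)
```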
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
fig. 3 shows a flow diagram of a text recognition method according to an embodiment of the present application, which may be performed by a server, which may be the server shown in fig. 1. Referring to fig. 3, the text recognition method at least includes steps S310 to S340, which are described in detail as follows:
in step S310, a to-be-processed picture including text information is acquired.
In an embodiment of the present application, the to-be-processed picture may be obtained by taking a photo in real time with a device that has a camera function, by loading a to-be-processed picture stored locally, or by receiving a to-be-processed picture sent from another device or a storage device; this is not limited here.
In an embodiment of the present application, the to-be-processed picture is a picture including text information. For example, the to-be-processed picture may be an examination report, a score sheet, a newspaper image, and the like, which is not limited herein.
Illustratively, the terminal device may obtain a to-be-processed picture containing news information by photographing a newspaper; an image stored in the local storage space of the terminal device may be loaded as the to-be-processed picture; or a picture stored in the cloud or another location may be fetched from other storage devices, servers, and the like to serve as the to-be-processed picture.
In step S320, each text region included in the picture to be processed is detected.
In one embodiment of the application, after the picture to be processed is acquired, each text region contained in the picture to be processed is detected from the picture to be processed. The text area represents an area containing text information in the picture to be processed.
In this embodiment, text region detection may proceed along the horizontal rows of the text: when a blank region is encountered, the current pass stops, and the row of text preceding the blank region is identified as one text region.
Fig. 4 is a schematic diagram of a text region in a to-be-processed picture according to an embodiment of the present disclosure.
Illustratively, based on the hospital examination report in fig. 2, the text regions 410, each containing text information, are obtained by detecting the text information in the report. The text regions in the hospital examination report are horizontal rows of text.
Fig. 5 is a flowchart of training a second model for text detection according to an embodiment of the present disclosure.
As shown in fig. 5, in this embodiment each text region contained in the to-be-processed picture may be detected by a pre-trained second model. The process of training the second model may include steps S510 to S520, described in detail as follows:
in step S510, a model loss value is determined based on positive sample pixel points corresponding to the text regions detected in the sample image and negative sample pixel points selected from the non-text regions.
In an embodiment of the present application, in the process of training the second model, a detection network is first constructed based on a deep residual network (ResNet). The structure of the detection network of the present embodiment is described in detail as follows:
fig. 6 is a schematic diagram of a second model of text detection according to an embodiment of the present application. As shown in fig. 6, the detection network is obtained by training a Progressive Scale expansion network (PSENet) model based on image segmentation. The PSENet model consists of two parts: the front end is a Feature graph F (630) which is fused with multi-scale features and obtained through a ResNet (610) and a Feature Pyramid Network (FPN) (620); the back end is a multi-branch prediction structure (640).
In an embodiment of the application, when training the detection network on sample images, each sample image is first normalized along its red-green-blue (RGB) channels and resized to a uniform size, which improves the effect and efficiency of model training. The normalized sample images are then fed into the backbone network, ResNet (610), to extract visual features at different feature levels, where the image at each feature level carries different features. The smaller feature maps output by ResNet (610) are fed into the FPN (620) for multi-scale feature fusion: by integrating the feature maps of different feature levels, the FPN (620) extracts the low-dimensional features P2 to P5, effectively fusing features of different scales from the backbone network at essentially no extra cost and yielding feature maps that carry multi-scale features. After the fusion features of the different levels are obtained, these multi-scale feature maps are fused to obtain F (630). F (630) is input to the back-end multi-branch prediction structure (640), which projects it onto n branches to produce the segmentation results S1, ..., Sn-1, Sn. Each segmentation result is a segmentation mask of all text instances at a certain scale and is used to separate tightly packed characters to obtain text regions.
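Before turning to the back end, the front end described above can be sketched in a few lines of PyTorch; the use of torchvision's resnet50 and FeaturePyramidNetwork and the concrete channel sizes are assumptions for illustration, not details taken from the patent:

```python
import torch
import torch.nn.functional as F_
from torchvision.models import resnet50
from torchvision.ops import FeaturePyramidNetwork

backbone = resnet50(weights=None)
fpn = FeaturePyramidNetwork([256, 512, 1024, 2048], out_channels=256)

def front_end(x):
    # Run the ResNet stem and stages to get the level features C2..C5.
    x = backbone.conv1(x); x = backbone.bn1(x); x = backbone.relu(x)
    x = backbone.maxpool(x)
    c2 = backbone.layer1(x)
    c3 = backbone.layer2(c2)
    c4 = backbone.layer3(c3)
    c5 = backbone.layer4(c4)
    # The FPN fuses C2..C5 top-down into the low-dimensional maps P2..P5.
    ps = fpn({"c2": c2, "c3": c3, "c4": c4, "c5": c5})
    p2, p3, p4, p5 = ps["c2"], ps["c3"], ps["c4"], ps["c5"]
    # Upsample P3..P5 to P2's resolution and concatenate into the fused map F.
    size = p2.shape[-2:]
    feats = [p2] + [F_.interpolate(p, size=size, mode="bilinear",
                                   align_corners=False) for p in (p3, p4, p5)]
    return torch.cat(feats, dim=1)  # F: the multi-scale fused feature map
```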
Specifically, in the multi-branch prediction structure (640), the multi-scale feature map produced by the front-end FPN is first mapped onto several branches, each branch representing a prediction at one scale. A scale here corresponds to a kernel, i.e., one text-instance block in a prediction map: at full scale the kernel matches the true size of its text instance on the prediction map, while at a smaller scale the kernel shrinks each text instance by a certain proportion. The prediction map corresponding to the smallest-scale kernels can therefore easily separate characters that sit very close together. Finally, starting from the prediction map at the smallest scale, all text instances on the image are obtained first, the position of each instance being the center position of its kernel. Each kernel is then expanded upward, downward, leftward, and rightward, taking the corresponding kernel in the prediction map one scale level larger as the target size, until it reaches that size. By expanding through all prediction maps smaller than full scale in this way, the final image segmentation result is obtained, and each segmented instance is wrapped with a bounding box to give the final detection result.
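The expansion step itself is essentially a breadth-first growth from the smallest kernels, along the lines of the published PSENet algorithm; the following simplified sketch (not the patent's exact procedure) shows the idea:

```python
import numpy as np
from collections import deque
from scipy.ndimage import label as cc_label

def progressive_scale_expansion(kernels):
    """kernels: binary masks S1..Sn ordered from smallest to largest scale.
    Returns a label map in which every text instance keeps the identity it
    had in the smallest-scale kernel map."""
    labels, _ = cc_label(kernels[0])        # instances from the smallest kernels
    h, w = labels.shape
    for k in kernels[1:]:                   # grow toward the next larger scale
        queue = deque(zip(*np.nonzero(labels)))
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w \
                        and labels[ny, nx] == 0 and k[ny, nx]:
                    labels[ny, nx] = labels[y, x]   # first claim wins at borders
                    queue.append((ny, nx))
    return labels
```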
Further, when the application scenario takes an inspection report as the to-be-processed picture, the image content of the report is relatively uniform, and the backbone network does not need a strong capability for extracting semantic image features, so the backbone can be simplified to speed it up. Optionally, the ResNet structure is replaced by a shuffle-style network structure built around separable convolution and grouped convolution, so that the network model runs faster.
FIG. 7 is a schematic diagram of a detection model provided in an embodiment of the present application; wherein, the left graph is a separable convolution structure, and the right graph is a grouping convolution structure.
As shown in the left diagram of fig. 7, in the separable convolution structure, the sample image (7101) is first split along the channel dimension (7102), and one half is passed through a 1x1 convolution (7103) to produce a first convolution result. The first convolution result goes through batch normalization with a rectified linear unit (BN ReLU) and then a 3x3 depthwise convolution (3x3 DWConv, 7104) to obtain a second convolution result. The second convolution result is batch-normalized (BN), passed through another 1x1 convolution (7105), and then through BN ReLU to output a third convolution result. The third convolution result is concatenated (7106) with the untouched half from the channel split, and the final output (7108) is obtained through a channel shuffle (7107).
As shown in the right diagram of fig. 7, in the grouped convolution structure, the sample image (7201) is first divided into a first part and a second part. The first part goes through a 3x3 depthwise convolution (7202) with convolution stride 2 to obtain a fourth convolution result; the fourth result is batch-normalized and passed through a 1x1 convolution (7203) to obtain a fifth convolution result. Meanwhile, the second part goes through a 1x1 convolution (7204) to obtain a sixth convolution result; the sixth result is passed through BN ReLU and a 3x3 depthwise convolution (7205) to obtain a seventh convolution result, and a further 1x1 convolution (7206) gives the eighth convolution result. The BN ReLU outputs of the fifth and eighth convolution results are concatenated (7207), and the final output (7209) is obtained through a channel shuffle (7208).
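The two diagrams closely match the basic and downsampling units of a ShuffleNet-style network; under that assumption, a PyTorch sketch of the pair might look as follows (channel counts are illustrative):

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    n, c, h, w = x.shape
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2).reshape(n, c, h, w))

class SeparableUnit(nn.Module):
    """Left diagram: channel split, 1x1 conv, 3x3 depthwise conv, 1x1 conv,
    concatenation with the untouched half, then channel shuffle."""
    def __init__(self, channels):
        super().__init__()
        c = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False),  # 3x3 DWConv
            nn.BatchNorm2d(c),
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        a, b = x.chunk(2, dim=1)                  # channel split (7102)
        return channel_shuffle(torch.cat((a, self.branch(b)), dim=1))

class DownsampleUnit(nn.Module):
    """Right diagram: two stride-2 branches processed in parallel,
    concatenated, then channel-shuffled."""
    def __init__(self, cin, cout):
        super().__init__()
        c = cout // 2
        self.left = nn.Sequential(
            nn.Conv2d(cin, cin, 3, stride=2, padding=1, groups=cin, bias=False),
            nn.BatchNorm2d(cin),
            nn.Conv2d(cin, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
        )
        self.right = nn.Sequential(
            nn.Conv2d(cin, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
            nn.Conv2d(c, c, 3, stride=2, padding=1, groups=c, bias=False),
            nn.BatchNorm2d(c),
            nn.Conv2d(c, c, 1, bias=False), nn.BatchNorm2d(c), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return channel_shuffle(torch.cat((self.left(x), self.right(x)), dim=1))
```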
In an embodiment of the present application, as shown in fig. 8, before determining the model loss value based on the positive sample pixel point corresponding to the text region detected in the sample image and the negative sample pixel point selected from the non-text region in step S510, steps S810 to S820 are further included, which are described in detail as follows:
in step S810, a text region containing the text segment and a non-text region not containing the text segment in the sample image are identified.
In an embodiment of the application, a model does not by default treat different text instances differently during training, yet it is desirable for the model both to detect the index items with high accuracy and to filter automatically when sensitive information is detected. To this end, the text regions containing text information and the non-text regions containing none are identified in the sample image; the pixel points inside the text regions are the positive sample pixel points. In this embodiment, negative sample pixel points are selected from the non-text regions according to the positive sample pixel points, the model loss value of the detection model is calculated from the positive and negative sample pixel points, and the detection network is trained on this model loss value to obtain the second model.
In step S820, a negative sample pixel point is selected from the non-text region.
In an embodiment of the present application, in the process of selecting the negative sample pixel point from the non-text region in the above steps, the method specifically includes:
identifying the number of positive sample pixel points corresponding to the text region;
determining the number of negative sample pixel points to be selected as the product of the number of positive sample pixel points and the preset positive-negative sample ratio;
and selecting that number of negative sample pixel points from the non-text region.
In an embodiment of the present application, the pixel points in the text regions are the positive sample pixel points. When counting them, the number may be determined from the image resolution of the original sample image, or by processing the image features of the text regions and non-text regions uniformly.
In an embodiment of the application, a positive-negative sample ratio is preset, a product of the positive-negative sample ratio and the number of the positive sample pixel points is calculated to obtain the number of the negative sample pixel points to be selected, and the negative sample pixel points are selected from the non-text region based on the number of the negative sample pixel points.
For example, let the positive-negative sample ratio be r and the total area of the positive samples containing text information in one sample image be S, i.e., the number of positive sample pixel points is S. The number of negative sample pixel points to select is then r × S; for instance, when r is 5, 5S negative sample pixel points need to be selected.
It should be noted that the areas of the text regions and non-text regions in a sample image generally differ, and the non-text area may even be smaller than the text area. In that case, to reach the required number of negative sample pixel points, pixel points may be selected from the non-text region repeatedly, keeping the ratio between the numbers of negative and positive sample pixel points constant and thereby preserving the accuracy of the model loss calculation.
In an embodiment of the application, after the positive sample pixel point and the negative sample pixel point participating in calculating the model loss value are determined, the model loss value is calculated according to the positive sample pixel point and the negative sample pixel point. The specific calculation method may be a method based on a loss function, which is not described herein.
In an embodiment of the present application, in a process of selecting a negative sample pixel point from a non-text region, the method specifically includes:
determining a loss value corresponding to each pixel point in the sample image according to the text region and the pixel label of each pixel point in the sample image;
and selecting, from the non-text region, the pixel points with the smaller loss values as the negative sample pixel points.
In an embodiment of the application, after the text regions and non-text regions are detected, the loss value of each pixel point in the non-text regions is calculated from the pixel label of each pixel point in the sample image, and the pixel points with the smaller loss values are selected from the non-text regions as negative sample pixel points.
Furthermore, to reflect the effect of the negative sample pixel points in the application, their number can be determined from the number of positive sample pixel points, and that many pixel points with the smallest loss values are then selected from the non-text regions as the negative sample pixel points.
Illustratively, with a positive-negative sample ratio r and S positive sample pixel points (the total area of the positive samples containing text information in one sample image), r × S negative sample pixel points are required. During training of the detection model, the loss value of every pixel point in the picture is calculated and sorted, the r × S negative sample pixel points with the smallest loss values are selected from the non-text regions, and the model loss value of the whole detection network is calculated from these r × S negative sample pixel points together with the positive sample pixel points.
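A sketch of this pixel-level selection in PyTorch terms; pixel_loss (a per-pixel loss map) and text_mask (a binary mask of the text regions) are hypothetical names:

```python
import torch

def detection_loss(pixel_loss, text_mask, r=5):
    """Keep all positive pixels plus r*S negatives chosen from the
    non-text region by smallest loss, as in the example above."""
    pos = text_mask.bool()
    num_pos = int(pos.sum())
    neg_losses = pixel_loss[~pos]
    k = min(r * num_pos, neg_losses.numel())                 # r x S negatives
    neg_vals, _ = torch.topk(neg_losses, k, largest=False)   # smallest losses
    # Model loss over the selected positive and negative pixel points.
    return (pixel_loss[pos].sum() + neg_vals.sum()) / max(num_pos + k, 1)
```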
In step S520, the model loss value is subjected to back propagation training to obtain a second model.
In an embodiment of the present application, after the model loss value is calculated, neural network back propagation is performed based on it, continually updating the parameters of the model so that the model loss value keeps decreasing, i.e., the output becomes more ideal, and the second model is obtained.
In this embodiment, the numbers of positive sample pixel points and negative sample pixel points are determined, and a set number of negative sample pixel points with the lower loss values are selected on the basis of the per-pixel loss values; that is, negative sample pixel points that are several times more numerous than the positive sample pixel points and are poorly distinguished take part in the calculation of the model loss value, improving it. Training the second model by back propagation on this model loss value then ensures the accuracy of the second model, so that text regions are detected accurately by it.
In step S330, the text information contained in each text region is recognized by a pre-trained first model, where the first model is trained on training data containing negative sample regions, the negative sample regions comprising: text regions containing sensitive information, and the set output information corresponding to those text regions.
In one embodiment of the present application, the first model is pre-trained and is used to recognize the text information contained in the text regions. The training samples of the first model include negative sample regions, i.e., text regions containing sensitive information, together with the set output information corresponding to those regions. Because the recognition network is trained on these sensitive-information text regions and their corresponding output information, whenever the model detects during recognition that the text information in a region is sensitive, it performs no text recognition and outputs the set output information directly, achieving a desensitizing effect in text recognition and protecting the user's privacy.
Further, owing to the heterogeneity of to-be-processed images, an image may contain various information other than sensitive information, such as noise; such cases are handled in the same way as sensitive information and are not repeated here. In this way the interference of noise in the to-be-processed image is reduced, so that a more accurate recognition result is obtained.
In an embodiment of the present application, the first model is trained on a recognition network built around Connectionist Temporal Classification (CTC). Its input is the text region output by the second model, and its output is either the specific text information or the set output information.
Fig. 9 is a schematic diagram of a text recognition network according to an embodiment of the present application.
As shown in fig. 9, the recognition network processes its input from bottom to top. First, the text region is preprocessed, for example normalized in size, to obtain a text region of uniform height (910). The text region (910) is fed into a Visual Geometry Group (VGG) network to extract visual features (920); the extracted visual features (920) are then recombined to obtain the recombined features (930). The recombined features (930) are fed into a bidirectional long short-term memory network (Bi-LSTM) (940) to obtain the semantic information in the image. Finally, the probability transition matrix between the characters of the sequence is found through CTC (950), and the character information in the text region is recognized.
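A compact sketch of such a recognizer (convolutional features recombined into a sequence, a Bi-LSTM, and a per-timestep classifier trained with CTC); the layer dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Recognizer(nn.Module):
    def __init__(self, num_classes, height=32):
        super().__init__()
        self.cnn = nn.Sequential(                      # VGG-style feature extractor
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
        )
        feat_h = height // 4
        self.rnn = nn.LSTM(256 * feat_h, 256, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(512, num_classes)          # classes include the CTC blank

    def forward(self, x):                              # x: (N, 1, H, W), uniform height H
        f = self.cnn(x)                                # (N, C, H/4, W/4)
        n, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(n, w, c * h) # recombine: one vector per column
        seq, _ = self.rnn(f)                           # Bi-LSTM over the width axis
        return self.fc(seq).log_softmax(-1)            # (N, T, num_classes) for CTCLoss
```

At training time the output (transposed to (T, N, C)) is fed to nn.CTCLoss; a region labeled with the set output string is trained with exactly the same objective, which is what lets the model emit the placeholder directly.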
In an embodiment of the present application, in a process of obtaining parameters of a first model based on the above training of the recognition network, the method includes the following steps:
selecting a negative sample region from the at least two text region samples based on the recognition results of the first model on the at least two text region samples;
and training the first model with the negative sample regions selected from the at least two text region samples as new text region samples.
In one embodiment of the application, during training of the recognition network, at least two text region samples are input into the network as one batch to obtain the recognition results. Text region samples with poor recognition results are selected from the batch as negative sample regions, and targeted training based on these negative sample regions yields more accurate parameters for the first model.
In an embodiment of the present application, as shown in fig. 10, the above-mentioned process of selecting a negative sample region from at least two text region samples based on the recognition result of the first model for the at least two text region samples includes steps S1010 to S1030:
in step S1010, a loss value corresponding to each text region is calculated based on the recognition results of the first model for the at least two text region samples.
In one embodiment of the present application, at least two text region samples are input into a recognition network, and a recognition result corresponding to each text region is obtained. And calculating loss values corresponding to the text areas according to the recognition results and the text labels set in the text areas. The specific loss value calculation method may be based on a loss function, which is not described herein.
In step S1020, an average loss corresponding to at least two text region samples is determined according to the loss value corresponding to each text region.
In an embodiment of the application, after the loss value corresponding to each text region is obtained through calculation, according to the loss value corresponding to each text region, the average loss corresponding to the batch of text region samples is determined, so as to measure the recognition effect of the batch of text region samples based on the average loss.
Optionally, the average loss for a batch of text region samples may be computed directly as the mean of the per-region loss values, or as a weighted average in which each text region is weighted by its area; this is not limited here.
In step S1030, if the average loss is less than the loss threshold, a negative sample region is selected from the at least two text region samples.
In an embodiment of the present application, after the average loss is calculated, if it is smaller than the preset loss threshold, text regions that are difficult to recognize may still exist in the batch, so negative sample regions can be selected from the batch of text region samples.
Exemplarily, the loss threshold is set to 1.0. During training on a batch of text region samples, when the calculated average loss is less than 1.0, negative sample regions are selected from the batch for targeted training, further improving the model training effect.
In one embodiment of the present application, if the average loss is greater than or equal to the loss threshold, the negative sample region is not selected, and training continues according to the original training sample.
In an embodiment of the present application, the process of selecting the negative sample region from the text region in step S1030 specifically includes:
determining the average sliding loss according to the average loss and the sliding parameters;
and selecting the text regions whose loss values are smaller than the average sliding loss as the negative sample regions.
In an embodiment of the present application, after the average loss is calculated, if it is smaller than the loss threshold, the average sliding loss is determined from the average loss and the sliding parameter, and the text regions whose loss values are smaller than the average sliding loss are selected as the negative sample regions for targeted training.
Specifically, the average sliding loss is calculated as: average sliding loss = a × LOSS_Y + (1 - a) × LOSS_A, where a denotes the sliding parameter, LOSS_Y the loss threshold, and LOSS_A the average loss.
Illustratively, with the loss threshold set to 1.0 and a set to 0.8, if the average loss calculated for a batch of text region samples is 0.9, the average sliding loss is 0.8 × 1.0 + (1 - 0.8) × 0.9 = 0.98. When the loss of a text region is less than 0.98, that text region is selected as a negative sample region.
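The selection rule in code form; this is a direct transcription of the formula and example above, with hypothetical names:

```python
def select_negative_regions(region_losses, loss_threshold=1.0, a=0.8):
    """region_losses: per-text-region loss values for one batch.
    Returns the indices of the regions picked as negative samples."""
    avg_loss = sum(region_losses) / len(region_losses)
    if avg_loss >= loss_threshold:
        return []                       # keep training on the original samples
    sliding = a * loss_threshold + (1 - a) * avg_loss
    return [i for i, loss in enumerate(region_losses) if loss < sliding]

# With the numbers above: threshold 1.0, a = 0.8, average loss 0.9
# gives sliding = 0.8 * 1.0 + 0.2 * 0.9 = 0.98.
```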
Fig. 11 is a schematic diagram of a training method based on a negative sample region according to an embodiment of the present application.
As shown in fig. 11, a batch of text region training samples 1110 contains several text regions. After training on each text region in the batch 1110, the recognition result 1120 corresponding to each text region is obtained. The loss value of each text region is determined from its set label and its recognition result; according to these loss values and the method above, some text regions are selected as negative sample regions, namely 1121, 1122, and 1123, and placed into the next round of training, i.e., 1131.
Further, in this embodiment the number of text regions in a batch of training samples is constant; to keep training uniform, the batch is topped up with other text regions, i.e., 1132, finally yielding the next batch of text region samples 1130, which is then used in the next round of training.
For example, the to-be-processed picture in this embodiment may be an inspection report, and in order to highlight the inspection information in the inspection report, in this embodiment, in the process of training the first model, the following steps may be included:
inputting the inspection report into the first model to obtain inspection information in the inspection report;
if the inspection information is matched with the preset information, increasing the loss value corresponding to the inspection report to obtain an increased loss value;
training the first model based on the incremental loss value and the negative sample region to obtain parameters of the first model.
Specifically, a lexicon is built from the words and phrases of the inspection items, and the inspection report is input into the first model. After the inspection information in the report is obtained, if the recognized inspection information matches the lexicon closely, the corresponding character instances are considered to correspond to inspection items in the original image. To make the model perform better on recognizing these character instances, their loss is artificially increased, giving the increased loss value. Training on the increased loss value and the negative sample regions then yields the parameters of the first model.
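A sketch of this loss adjustment; the lexicon contents, matching rule, and boost factor are all assumptions for illustration:

```python
INSPECTION_LEXICON = {"白细胞", "红细胞", "血红蛋白"}  # hypothetical inspection items
BOOST = 2.0                                            # hypothetical scaling factor

def adjusted_loss(recognized_text, loss):
    """Artificially increase the loss for character instances that match a
    known inspection item, so the model works harder on those instances."""
    if any(item in recognized_text for item in INSPECTION_LEXICON):
        return loss * BOOST
    return loss
```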
In this embodiment, the loss value is adjusted in the detection model, and both the samples and the loss value used in training are adjusted in the recognition model. The resulting character recognition model therefore performs better on exactly the character information that is wanted, while sensitive information and image noise in the inspection report are filtered out; this relieves the back-end engine and correspondingly increases the precision of the whole system.
In step S340, the text information recognized by the first model is output.
In one embodiment of the application, the text information in the image to be processed is obtained by passing the image to be processed through the first model, and the text information is output. The output mode may be displayed on a user interface of the terminal device, or a text file may be generated based on the text information, and the text file is directly output, which is not limited herein.
Fig. 12 is a flowchart illustrating detection and identification of a medical picture according to an embodiment of the present application.
As shown in fig. 12, in one embodiment of the present application, a medical picture is input in step S1210 and preprocessed in step S1220: the picture is resized to a fixed size and converted to a single-channel image so that all pictures are processed uniformly. The preprocessed medical picture is input into the detection model in step S1230, and the coordinates of the text boxes detected by the model are obtained in step S1240. In step S1250 the picture is cut into several small pictures according to the text-box coordinates, and in step S1260 these small pictures are fed into the recognition model. The text segment corresponding to each small picture is recognized in step S1270; the text-box coordinates and text segments are assembled in step S1280; and in step S1290 the complete recognition result is returned and the recognized text information is output.
In this way, the medical picture is input into the detection model to obtain the small pictures corresponding to the text-box coordinates, and the small pictures are then input into the recognition model to obtain their text information; based on the trained detection and recognition models, the text information in the medical picture is obtained more accurately and efficiently.
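Put together, the flow of fig. 12 looks roughly like the following; the model interfaces and the preprocessing size are hypothetical:

```python
import cv2  # assumed available for image I/O and preprocessing

FIXED_SIZE = (960, 960)  # hypothetical fixed preprocessing size

def recognize_medical_picture(path, detector, recognizer):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)   # single-channel picture (S1220)
    img = cv2.resize(img, FIXED_SIZE)              # compress to a fixed size (S1220)
    boxes = detector.detect(img)                   # text-box coordinates (S1240)
    results = []
    for (x1, y1, x2, y2) in boxes:                 # cut into small pictures (S1250)
        crop = img[y1:y2, x1:x2]
        text = recognizer.recognize(crop)          # text segment (S1270)
        results.append({"box": (x1, y1, x2, y2), "text": text})
    return results                                 # assembled result (S1280/S1290)
```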
In an embodiment of the present application, after the process of outputting the text information identified by the first model in step S340, the method further includes:
identifying inspection items in the inspection information, and identifying abnormal items exceeding inspection indexes in the inspection information;
and displaying the information related to the inspection items and the information related to the abnormal items in a visually distinguished manner.
In one embodiment of the present application, the text information includes the inspection information in the inspection report, and the inspection information specifically includes information corresponding to each inspection item. After the text information identified by the first model is output, identifying the inspection items in the inspection information, acquiring information related to each inspection item from preset inspection item information, and displaying the information and the identified inspection information at the same time to remind a user of the meaning of the related information.
In an embodiment of the application, abnormal items exceeding the inspection index are determined from the inspection items according to the preset normal threshold range, namely the inspection index, corresponding to each inspection item, and the abnormal items are displayed in a preset display mode so as to alert the relevant personnel to the abnormal inspection items.
Illustratively, after the examination information in a medical examination report is acquired through a mobile phone, the electronic examination information is displayed on the mobile phone. Abnormal items in the examination information are identified and displayed in red, and a medical interpretation of each index item is provided for the user's reference, which improves the readability of the examination report.
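A simple sketch of this abnormal-item check is given below; the item names, units and reference intervals are invented for illustration and are not taken from the embodiment:

```python
# Hypothetical inspection indexes: item -> (lower bound, upper bound).
NORMAL_RANGES = {
    "WBC": (3.5, 9.5),     # white blood cells, 10^9/L (assumed interval)
    "HGB": (130.0, 175.0), # hemoglobin, g/L (assumed interval)
}

def find_abnormal_items(inspection_info):
    """inspection_info maps an inspection item to its measured value;
    items outside their preset normal threshold range are returned."""
    abnormal = {}
    for item, value in inspection_info.items():
        if item not in NORMAL_RANGES:
            continue  # no inspection index preset for this item
        low, high = NORMAL_RANGES[item]
        if not (low <= value <= high):
            abnormal[item] = value  # exceeds the inspection index
    return abnormal

# e.g. find_abnormal_items({"WBC": 12.1, "HGB": 150.0}) -> {"WBC": 12.1},
# and the flagged item can then be displayed in red.
```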
In an embodiment of the present application, the character recognition method may be applied in a medical environment, where the picture to be processed includes a medical examination report of an examiner, and the sensitive information includes identity information of the examiner in the medical examination report; the identity information may be, for example, the examiner's name, gender, telephone number, identification number, and the like, which is not limited herein.
In this embodiment, a medical examination report is obtained, each report text region contained in the medical examination report is detected, the medical examination information contained in each report text region is recognized by the first model, and the medical examination information recognized by the first model is finally output, while the set output information is output for any detected sensitive region, the sensitive region being the region corresponding to the identity information. For example, the set output information may be "sensitive information"; after it is determined that a certain report text region contains the identity information of the examiner, "sensitive information" is output for that report text region. By not recognizing the examiner's identity information and directly outputting the set output information, this embodiment protects the privacy of the user information in the medical examination report and improves the reliability of the character recognition method.
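For instance, under the pipeline sketched above, the assembled result for a report containing an identity line might look like the following, with all coordinate and item values invented for illustration:

```python
# Hypothetical assembled output: the region holding the examiner's name is
# not transcribed; the first model emits the set output information instead.
result = [
    {"box": (40, 20, 380, 60),  "text": "sensitive information"},  # name line
    {"box": (40, 80, 380, 120), "text": "WBC 6.2 10^9/L"},         # test item
]
```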
In one embodiment of the present application, there is provided a method for training a character recognition model, the character recognition model including a first model for recognizing text. The method for training the first model includes: inputting training data containing negative sample regions into a recognition network to obtain a recognition result, where a negative sample region includes a text region containing sensitive information and the set output information corresponding to that text region; determining a loss value of the first model according to the recognition result and the set output information; and training to obtain the first model based on the loss value of the first model.
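As a rough sketch of this procedure, assuming a PyTorch-style recognition network and a target encoding in which negative sample regions carry the set output information (for example the fixed string "sensitive information") as their target sequence, one training step might look as follows; all names here are hypothetical:

```python
import torch

def train_first_model_step(recognition_net, criterion, optimizer, batch):
    """One training step of the first model. `batch` mixes ordinary text
    regions with negative sample regions; for a negative sample region the
    target encodes the set output information rather than the characters
    actually printed in the region, so the trained model emits the set
    output whenever it encounters sensitive information."""
    region_images, targets = batch
    logits = recognition_net(region_images)  # recognition result
    loss = criterion(logits, targets)        # loss value of the first model
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # train the first model based on the loss value
    return loss.item()
```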
In one embodiment of the present application, the character recognition model includes a second model for detecting a text region in a picture to be processed, and the training method of the second model includes: identifying text regions containing text information and non-text regions not containing text information in the sample image; selecting negative sample pixel points from the non-text region according to the positive sample pixel points corresponding to the text region; determining a model loss value of the second model according to the positive sample pixel points, the negative sample pixel points and the setting labels corresponding to the pixel points in the text region; and carrying out back propagation training on the model loss value to obtain a second model.
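The negative pixel selection for the second model can be sketched as below; the negative-to-positive ratio of 3 is an assumption, and, following the description, the non-text pixels with the smaller loss values are kept as negative sample pixel points:

```python
import numpy as np

def select_negative_pixel_mask(loss_map, text_mask, neg_pos_ratio=3.0):
    """loss_map: per-pixel loss of the detection network (H x W array);
    text_mask: 1 for positive sample pixel points inside text regions."""
    num_pos = int(text_mask.sum())
    num_neg = int(num_pos * neg_pos_ratio)  # product of count and ratio
    # candidate negative pixels lie in the non-text region
    candidates = np.flatnonzero(text_mask.ravel() == 0)
    # keep the candidates whose loss values are smaller
    order = np.argsort(loss_map.ravel()[candidates])
    chosen = candidates[order[:num_neg]]
    neg_mask = np.zeros(text_mask.size, dtype=np.uint8)
    neg_mask[chosen] = 1
    return neg_mask.reshape(text_mask.shape)
```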
In one embodiment of the present application, the character recognition model includes a second model for detecting text regions and a first model for recognizing text. The text region output by the second model serves as the input of the first model, which recognizes character information from the text region; the training methods and data processing of the two models do not interfere with each other. For details of the model training and data processing methods, please refer to the above description, which is not repeated here.
Embodiments of the character recognition apparatus of the present application are described below; the apparatus can be used to perform the character recognition method in the above embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the embodiments of the character recognition method described above.
FIG. 13 shows a block diagram of a character recognition apparatus according to one embodiment of the present application.
Referring to fig. 13, a character recognition apparatus 1300 according to an embodiment of the present application includes: an obtaining unit 1310, configured to obtain a to-be-processed picture containing text information; a detection unit 1320, configured to detect each text region contained in the picture to be processed; an identifying unit 1330, configured to identify the text information contained in each text region through a pre-trained first model, wherein the first model is trained by training data containing negative sample regions, and a negative sample region includes a text region containing sensitive information and the set output information corresponding to that text region; and an output unit 1340, configured to output the text information identified by the first model.
In some embodiments of the present application, based on the foregoing solution, the character recognition apparatus 1300 includes: a first selecting unit, configured to select a negative sample region from the at least two text region samples based on the recognition result of the first model for the at least two text region samples; and a first training unit, configured to train the first model by taking the negative sample regions selected from the at least two text regions as new text region samples.
In some embodiments of the present application, based on the foregoing solution, the first selecting unit includes: a first calculation unit, configured to calculate a loss value corresponding to each text region based on the recognition results of the first model on the at least two text region samples; a second calculation unit, configured to determine the average loss corresponding to the at least two text region samples according to the loss value corresponding to each text region; and a second selecting unit, configured to select a negative sample region from the at least two text region samples if the average loss is less than a loss threshold.
In some embodiments of the present application, based on the foregoing solution, the second selecting unit is configured to: determine the average sliding loss according to the average loss and the sliding parameters; and select text regions whose loss values are smaller than the average sliding loss as negative sample regions.
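These selecting units can be sketched as follows; the exponential-moving-average form and the momentum value are assumptions, since the embodiment only states that the average sliding loss is determined from the average loss and the sliding parameters:

```python
def sliding_average_loss(prev_sliding_loss, average_loss, momentum=0.9):
    # Assumed exponential moving average over successive average losses.
    return momentum * prev_sliding_loss + (1.0 - momentum) * average_loss

def select_negative_regions(region_losses, loss_threshold, sliding_loss):
    average_loss = sum(region_losses) / len(region_losses)
    if average_loss >= loss_threshold:
        return []  # negatives are only selected once the average loss is low
    # text regions whose loss value is below the average sliding loss are
    # selected as negative sample regions
    return [i for i, loss in enumerate(region_losses) if loss < sliding_loss]
```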
In some embodiments of the present application, based on the foregoing scheme, the picture to be processed includes an inspection report, and the character recognition apparatus 1300 further includes: a first identification unit, configured to identify the inspection information in the inspection report through the first model; a first adjusting unit, configured to increase the loss value corresponding to the inspection report to obtain an increased loss value if the inspection information matches the preset information; and a second training unit, configured to train the first model based on the increased loss value and the negative sample region.
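The first adjusting unit can be sketched as below; the multiplicative factor is one assumed way of increasing the loss value, and the keyword match stands in for matching the inspection information against the preset information:

```python
def adjust_report_loss(loss, inspection_text, preset_keywords, factor=2.0):
    # If the recognized inspection information matches the preset
    # information, the loss value corresponding to this inspection report
    # is increased (assumed multiplicative form), so that training pays
    # more attention to the inspection content of interest.
    if any(keyword in inspection_text for keyword in preset_keywords):
        return loss * factor  # increased loss value
    return loss
```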
In some embodiments of the present application, based on the foregoing scheme, each text region included in the picture to be processed is detected by the pre-trained second model; the character recognition apparatus 1300 further includes: the third selecting unit is used for determining a model loss value based on positive sample pixel points corresponding to the text regions detected in the sample image and negative sample pixel points selected from the non-text regions; and the third training unit is used for carrying out back propagation training on the model loss value to obtain a second model.
In some embodiments of the present application, based on the foregoing solution, the character recognition apparatus 1300 further includes: a region identification unit, configured to identify text regions containing text segments and non-text regions not containing text segments in the sample image; and a fourth selecting unit, configured to select the negative sample pixel points from the non-text region.
In some embodiments of the present application, based on the foregoing solution, the fourth selecting unit includes: a number identification unit, configured to identify the number of positive sample pixel points corresponding to the text region; a third calculating unit, configured to determine the number of negative sample pixel points to be selected according to the product of the number of positive sample pixel points and a preset proportion between positive and negative samples; and a fifth selecting unit, configured to select that number of negative sample pixel points from the non-text region.
In some embodiments of the present application, based on the foregoing scheme, the fifth selecting unit is configured to: determine a loss value corresponding to each pixel point in the sample image according to the text region and the pixel label of each pixel point in the sample image; and select, from the non-text region, pixel points whose corresponding loss values are smaller as the negative sample pixel points.
In some embodiments of the present application, based on the foregoing scheme, the text information includes inspection information in an inspection report serving as the picture to be processed, and the character recognition apparatus 1300 further includes: an abnormality identification unit, configured to identify the inspection items in the inspection information and identify the abnormal items exceeding the inspection index in the inspection information; and an abnormality display unit, configured to display the related information of the inspection items and the related information of the abnormal items in a distinguishing manner.
The following describes embodiments of a training apparatus for a character recognition model of the present application, which can be used to perform the training method for the character recognition model in the above embodiments of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the training method of the character recognition model described above in the present application.
FIG. 14 shows a block diagram of a training apparatus for a character recognition model according to one embodiment of the present application.
Referring to fig. 14, an apparatus 1400 for training a character recognition model according to an embodiment of the present application includes: a recognition unit 1410, configured to input training data containing the negative sample region into a recognition network to obtain a recognition result, where the negative sample region includes a text region containing sensitive information and the set output information corresponding to that text region; a loss unit 1420, configured to determine a loss value of the first model according to the recognition result and the set output information; and a training unit 1430, configured to train to obtain the first model based on the loss value of the first model.
In an embodiment of the present application, the character recognition model includes a second model for detecting a text region in the picture to be processed, and the training apparatus 1400 further includes: the detection unit is used for identifying a text region containing text information and a non-text region not containing text information in the sample image; the selecting unit is used for selecting negative sample pixel points from the non-text region according to the positive sample pixel points corresponding to the text region; the determining unit is used for determining a model loss value of the second model according to the positive sample pixel points, the negative sample pixel points and the setting labels corresponding to the pixel points in the text region; and the reverse training unit is used for carrying out reverse propagation training on the model loss value to obtain a second model.
FIG. 15 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 1500 of the electronic device shown in fig. 15 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 15, the computer system 1500 includes a Central Processing Unit (CPU) 1501, which can perform various appropriate actions and processes, such as the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1502 or a program loaded from a storage section 1508 into a Random Access Memory (RAM) 1503. Various programs and data necessary for system operation are also stored in the RAM 1503. The CPU 1501, the ROM 1502, and the RAM 1503 are connected to each other by a bus 1504. An Input/Output (I/O) interface 1505 is also connected to the bus 1504.
The following components are connected to the I/O interface 1505: an input section 1506 including a keyboard, a mouse, and the like; an output section 1507 including a display such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1508 including a hard disk and the like; and a communication section 1509 including a network interface card such as a Local Area Network (LAN) card or a modem. The communication section 1509 performs communication processing via a network such as the Internet. A drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1510 as necessary, so that a computer program read therefrom is installed into the storage section 1508 as needed.
In particular, according to embodiments of the application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program containing program code for performing the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1509, and/or installed from the removable medium 1511. When the computer program is executed by the Central Processing Unit (CPU) 1501, various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with a computer program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (13)

1. A method for recognizing a character, comprising:
acquiring a picture to be processed containing text information;
detecting each text area contained in the picture to be processed;
identifying text information contained in the respective text regions by a pre-trained first model, wherein the first model is trained by training data containing negative sample regions, the negative sample regions comprising: a text region containing sensitive information, and set output information corresponding to the text region containing the sensitive information;
and outputting the text information identified by the first model.
2. The method of claim 1, further comprising:
selecting a negative sample region from at least two text region samples based on the recognition result of the first model on the at least two text region samples;
training the first model by taking the negative sample region selected from the at least two text regions as a new text region sample.
3. The method of claim 2, wherein selecting a negative sample region from the at least two text region samples based on the recognition results of the first model for the at least two text region samples comprises:
calculating loss values corresponding to the text regions based on the recognition results of the first model on the at least two text region samples;
determining the average loss corresponding to the at least two text region samples according to the loss value corresponding to each text region;
selecting a negative sample region from the at least two text region samples if the average loss is less than a loss threshold.
4. The method of claim 3, wherein selecting a negative sample region from the at least two text region samples comprises:
determining the average sliding loss according to the average loss and the sliding parameters;
and selecting the corresponding text region with the loss value smaller than the average sliding loss as the negative sample region.
5. The method of claim 2, wherein the picture to be processed comprises an inspection report; the method further comprises the following steps:
identifying, by the first model, inspection information in the inspection report;
if the inspection information is matched with preset information, increasing a loss value corresponding to the inspection report to obtain an increased loss value;
training the first model based on the increased loss value and the negative sample region.
6. The method according to claim 1, characterized in that each text region contained in the picture to be processed is detected by a pre-trained second model; the method further comprises the following steps:
determining a model loss value based on positive sample pixel points corresponding to the text regions detected in the sample image and negative sample pixel points selected from the non-text regions;
and carrying out back propagation training on the model loss value to obtain the second model.
7. The method of claim 6, wherein before determining the model loss value based on the positive sample pixel points corresponding to the text regions detected in the sample image and the negative sample pixel points selected from the non-text regions, the method further comprises:
identifying text regions in the sample image that contain text segments and non-text regions that do not contain text segments;
and selecting the negative sample pixel points from the non-text region.
8. The method of claim 7, wherein selecting the negative sample pixel points from the non-text region comprises:
identifying the number of the positive sample pixel points corresponding to the text region;
determining the number of negative sample pixel points to be selected according to the product of the number of the positive sample pixel points and a preset proportion between positive and negative sample pixel points;
and selecting the determined number of negative sample pixel points from the non-text region.
9. The method of claim 7, wherein selecting the negative sample pixel points from the non-text region comprises:
determining a loss value corresponding to each pixel point in the sample image according to the text region and the pixel label of each pixel point in the sample image;
and selecting, from the non-text region, pixel points whose corresponding loss values are smaller as the negative sample pixel points.
10. The method of claim 1, wherein the text information comprises inspection information in an inspection report serving as the picture to be processed; after outputting the text information recognized by the first model, the method further comprises:
identifying inspection items in the inspection information, and identifying abnormal items exceeding inspection indexes in the inspection information;
and displaying the related information of the inspection item and the related information of the abnormal item in a distinguishing way.
11. The method of claim 1, wherein the character recognition method is applied in a medical environment, the picture to be processed comprises a medical examination report of an examiner, and the sensitive information comprises identity information of the examiner in the medical examination report; the method comprises:
acquiring the medical inspection report;
detecting each report text region contained in the medical examination report;
identifying, by the first model, medical test information contained in the respective report text region;
and outputting the medical inspection information identified by the first model, and outputting the set output information for a detected sensitive region, wherein the sensitive region is a region corresponding to the identity information.
12. A computer-readable medium, on which a computer program is stored which, when executed by a processor, carries out the character recognition method according to any one of claims 1 to 11.
13. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the character recognition method of any one of claims 1 to 11.
CN201911260301.7A 2019-12-10 2019-12-10 Character recognition method and device, computer readable medium and electronic equipment Pending CN111062389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911260301.7A CN111062389A (en) 2019-12-10 2019-12-10 Character recognition method and device, computer readable medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN111062389A true CN111062389A (en) 2020-04-24

Family

ID=70300354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911260301.7A Pending CN111062389A (en) 2019-12-10 2019-12-10 Character recognition method and device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111062389A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006330832A (en) * 2005-05-23 2006-12-07 Toshiba Corp Medical information management system
CN108962346A (en) * 2017-05-22 2018-12-07 深圳大森智能科技有限公司 Medical inspection reports electronic method, storage equipment and mobile terminal
CN107239666A (en) * 2017-06-09 2017-10-10 孟群 A kind of method and system that medical imaging data are carried out with desensitization process
CN109961032A (en) * 2019-03-18 2019-07-02 北京字节跳动网络技术有限公司 Method and apparatus for generating disaggregated model
CN110163218A (en) * 2019-04-10 2019-08-23 阿里巴巴集团控股有限公司 Desensitization process method and device based on image recognition
CN110110715A (en) * 2019-04-30 2019-08-09 北京金山云网络技术有限公司 Text detection model training method, text filed, content determine method and apparatus
CN110287720A (en) * 2019-07-01 2019-09-27 国网内蒙古东部电力有限公司 A kind of access control method based on image recognition and user gradation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WENHAI WANG et al.: "Shape Robust Text Detection with Progressive Scale Expansion Network", ARXIV, 29 July 2019 (2019-07-29), pages 1-13 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625858B (en) * 2020-05-10 2023-04-07 武汉理工大学 Intelligent multi-mode data desensitization method and device in vertical field
CN111625858A (en) * 2020-05-10 2020-09-04 武汉理工大学 Intelligent multi-mode data desensitization method and device in vertical field
CN113688658A (en) * 2020-05-18 2021-11-23 华为技术有限公司 Object identification method, device, equipment and medium
CN111783572A (en) * 2020-06-17 2020-10-16 泰康保险集团股份有限公司 Text detection method and device
CN111783572B (en) * 2020-06-17 2023-11-14 泰康保险集团股份有限公司 Text detection method and device
CN111783760A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Character recognition method and device, electronic equipment and computer readable storage medium
US11775845B2 (en) 2020-06-30 2023-10-03 Beijing Baidu Netcom Science And Technology Co., Ltd. Character recognition method and apparatus, electronic device and computer readable storage medium
CN111783760B (en) * 2020-06-30 2023-08-08 北京百度网讯科技有限公司 Character recognition method, device, electronic equipment and computer readable storage medium
CN111860522A (en) * 2020-07-23 2020-10-30 中国平安人寿保险股份有限公司 Identity card picture processing method and device, terminal and storage medium
CN111860522B (en) * 2020-07-23 2024-02-02 中国平安人寿保险股份有限公司 Identity card picture processing method, device, terminal and storage medium
WO2022105521A1 (en) * 2020-11-20 2022-05-27 深圳壹账通智能科技有限公司 Character recognition method and apparatus for curved text image, and computer device
CN112582045A (en) * 2020-12-22 2021-03-30 无锡慧方科技有限公司 Electronic medical report sheet transmission system
CN113762237B (en) * 2021-04-26 2023-08-18 腾讯科技(深圳)有限公司 Text image processing method, device, equipment and storage medium
CN113762237A (en) * 2021-04-26 2021-12-07 腾讯科技(深圳)有限公司 Text image processing method, device and equipment and storage medium
CN113239660A (en) * 2021-04-29 2021-08-10 维沃移动通信(杭州)有限公司 Text display method and device and electronic equipment
CN114818708A (en) * 2022-04-20 2022-07-29 北京百度网讯科技有限公司 Key information extraction method, model training method, related device and electronic equipment
CN114724144A (en) * 2022-05-16 2022-07-08 北京百度网讯科技有限公司 Text recognition method, model training method, device, equipment and medium
CN114724144B (en) * 2022-05-16 2024-02-09 北京百度网讯科技有限公司 Text recognition method, training device, training equipment and training medium for model
CN115393868A (en) * 2022-08-18 2022-11-25 中化现代农业有限公司 Text detection method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111062389A (en) Character recognition method and device, computer readable medium and electronic equipment
US20190087686A1 (en) Method and apparatus for detecting human face
CN108038880B (en) Method and apparatus for processing image
CN110929780B (en) Video classification model construction method, video classification device, video classification equipment and medium
CN108280477B (en) Method and apparatus for clustering images
CN109543058B (en) Method, electronic device, and computer-readable medium for detecting image
EP3872652B1 (en) Method and apparatus for processing video, electronic device, medium and product
US20210042504A1 (en) Method and apparatus for outputting data
US11822568B2 (en) Data processing method, electronic equipment and storage medium
CN116824278B (en) Image content analysis method, device, equipment and medium
CN107193974B (en) Regional information determination method and device based on artificial intelligence
CN110363084A (en) A kind of class state detection method, device, storage medium and electronics
US20210200971A1 (en) Image processing method and apparatus
CN108509994B (en) Method and device for clustering character images
CN110750624A (en) Information output method and device
CN108491812B (en) Method and device for generating face recognition model
CN109214501B (en) Method and apparatus for identifying information
CN111339295A (en) Method, apparatus, electronic device and computer readable medium for presenting information
CN112188306A (en) Label generation method, device, equipment and storage medium
CN111242083A (en) Text processing method, device, equipment and medium based on artificial intelligence
CN112036295A (en) Bill image processing method, bill image processing device, storage medium and electronic device
CN108399401B (en) Method and device for detecting face image
CN115223251A (en) Training method and device for signature detection model, electronic equipment and storage medium
CN110674300B (en) Method and apparatus for generating information
CN109241930B (en) Method and apparatus for processing eyebrow image

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40022241; Country of ref document: HK)
SE01 Entry into force of request for substantive examination