CN117853440A - Training method for text defect detection model and electronic equipment - Google Patents
- Publication number
- CN117853440A (application number CN202311853302.9A)
- Authority
- CN
- China
- Prior art keywords
- text
- training
- defect
- detection model
- defect detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a training method for a text defect detection model, an electronic device, and a storage medium. The training method comprises the following steps: acquiring a training text image in which a defect area is marked with an annotation box; generating a Gaussian heatmap corresponding to the defect area; and training the text defect detection model with the training text image and the Gaussian heatmap until a first preset requirement is met, thereby completing the training of the text defect detection model. In order to identify defect areas in text images, the training text image is annotated in a specific manner, and the annotated training image is used to train the text defect detection model, so that the model meets the requirements of the scenario.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a training method for a text defect detection model and an electronic device.
Background
Images serve as carriers of information in an ever-growing number of contexts, and with them comes the need to recognize the text they contain. Currently, optical character recognition (OCR) technology is commonly employed to recognize text in images. OCR uses text recognition algorithms to convert the text in an image (including typed, printed, and/or handwritten text) into computer-readable text.
When a text image is produced, it may carry various defects; for example, dirt, wrinkles, or cracks may be present on the text image. Such defects can seriously degrade the accuracy of text recognition. Achieving accurate detection and identification of these defects with machine vision technology, while also meeting real-time requirements, is a challenging problem.
Disclosure of Invention
The embodiments of the present invention provide a training method for a text defect detection model and an electronic device, which are intended to at least solve the technical problems described above.
According to one aspect of the present invention, there is provided a training method for a text defect detection model, comprising: acquiring a training text image in which a defect area is marked with an annotation box; generating a Gaussian heatmap corresponding to the defect area; and training the text defect detection model with the training text image and the Gaussian heatmap until a first preset requirement is met, thereby completing the training of the text defect detection model.
Optionally, the generating a Gaussian heatmap corresponding to the defect area comprises: generating a Gaussian map for the annotation box corresponding to the defect area; and mapping the Gaussian map into the annotation box using a perspective transformation between the Gaussian map and each annotation box, thereby generating the Gaussian heatmap.
Optionally, in a case where the defect area and a character area in the training text image overlap, generating the Gaussian heatmap corresponding to the defect area comprises:
dividing the defect area into a plurality of defect sub-areas according to the characters in the overlapping area; and
generating a corresponding Gaussian heatmap for each defect sub-area.
Optionally, the training method further comprises training the text defect detection model with the training text image, the corresponding character Gaussian heatmap, and the inter-character Gaussian heatmap generated from adjacent characters, until a second preset requirement is met, thereby completing the training of the text defect detection model.
Optionally, the text defect detection model is a model improved on the basis of the CRAFT (Character Region Awareness for Text Detection) model.
According to an aspect of the present invention, there is provided a text defect detection method, comprising:
inputting a text image to be processed into the above text defect detection model, and outputting a text detection result, wherein the text detection result comprises the defect areas present in the text image and their corresponding defect types.
Optionally, the text detection result further includes a character box of a character in the text image.
According to one aspect of the present invention, there is provided a training apparatus for a text defect detection model, comprising: one or more processors; a memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the foregoing methods.
According to one aspect of the present invention, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform the aforementioned method.
At least one of the technical solutions adopted in the embodiments of the present invention can achieve the following beneficial effects:
in order to identify defect areas in text images, the training text image is annotated in a specific manner, and the annotated training image is used to train the text defect detection model, so that the model meets the requirements of the scenario.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
Fig. 1 is a schematic view illustrating text images and their corresponding defect types according to an exemplary embodiment of the present invention;
Fig. 2 is a diagram illustrating a text defect detection model according to an exemplary embodiment of the present invention;
Fig. 3 is a diagram illustrating the labeling process of a defect area according to an exemplary embodiment of the present invention;
Fig. 4 is a flowchart illustrating a training method of a text defect detection model according to an exemplary embodiment of the present invention;
Fig. 5 is a diagram illustrating the training of a text defect detection model according to an exemplary embodiment of the present invention;
Fig. 6 is a flowchart illustrating a text defect detection method according to an exemplary embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a terminal device according to an embodiment of the present invention;
Fig. 8 is a software architecture block diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to specific embodiments of the present invention and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In embodiments of the present invention, "/" may indicate an "or" relationship between the associated objects; for example, A/B may represent A or B. "And/or" describes three possible relationships between associated objects; for example, "A and/or B" may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. To facilitate description of the technical solutions of the embodiments of the present invention, the words "first", "second", etc. may be used to distinguish technical features that are the same or similar in function; these words do not limit quantity or execution order, nor do they require that the features so designated necessarily differ. In embodiments of the invention, the words "exemplary" or "such as" are used to mean serving as an example, illustration, or description; any embodiment or design described as "exemplary" or "such as" should not be construed as preferred or advantageous over other embodiments or designs. These words are used to present the relevant concepts in a concrete fashion to facilitate understanding.
In addition, specific details are set forth in the following description in order to provide a better understanding of the present invention. It will be understood by those skilled in the art that the present invention may be practiced without some of these specific details. In some instances, well known methods, procedures, components, and circuits have not been described in detail so as not to obscure the present invention.
Before explaining the embodiments of the present invention, their application scenario is explained. The terminal devices of the present invention may include cell phones, tablet computers, wearable devices (e.g., watches, bracelets, smart helmets), in-vehicle devices, smart home devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), etc.
The terminal device comprises a screen, which may be a touch screen. The screen may include, but is not limited to, a display device such as a monitor, and may also include other devices that present images, such as mirroring devices and extended screens. In the present invention, the screen may display the text image to be processed, and may also display the text/characters recognized from it.
According to the exemplary embodiment of the invention, in order to detect the defects in the text image, a training text image set is marked by a specific marking mode, a text defect detection model is trained by the marked text image, and the text image is detected by the trained text defect detection model so as to detect the defect areas and the corresponding defect types in the text image. In order to more intuitively understand the text defect in the text image to be detected by the present invention, the following description will be given by way of example with reference to fig. 1.
Fig. 1 is a schematic view illustrating text images and their corresponding defect types according to an exemplary embodiment of the present invention. A text image according to the present invention may be any raster image, including but not limited to screenshots, images captured by a camera, or images acquired from outside, for example, images downloaded or received by the user over a network on a terminal device.
A text image as referred to in the present invention is an image containing characters and/or character strings. That is, if an acquired image contains text and/or character strings, it may be called a text image. Optionally, a text image may include one or more character strings, a character string being a sequence of adjacent characters. Generally, a character string corresponds to a specific meaning; for example, the string "apple" may denote an apple.
For convenience of description, a product label is taken as the detection object, and the image corresponding to the product label is the text image processed by the present invention. As shown in fig. 1, the text images of the present invention may include text image 110, text image 120, text image 130, text image 140, text image 150, and text image 160, each of which contains a defect area, and the defect types differ among them. These defects arise during the production or use of the product labels in fig. 1. Specifically, the defect area in text image 110 is defect area 101, which is caused by missing text in the product label. The defect area in text image 120 is defect area 102, caused by wrinkling of the product label. The defect area in text image 130 is defect area 103, caused by blurred text in the product label. The defect area in text image 140 is defect area 104, caused by damage to the product label. The defect area in text image 150 is defect area 105, caused by bubbles formed while the product label was being attached. The defect area in text image 160 is defect area 106, caused by water drops on the product label.
The text defect detection method according to the exemplary embodiments of the present invention can not only detect the defect areas in a text image as shown in fig. 1 and determine the defect type corresponding to each defect area, but can further recognize each character.
To achieve the above effects, exemplary embodiments of the present invention may employ a text defect detection model as shown in fig. 2. The text defect detection model may include a feature extraction network, a feature fusion network, and an output layer. As shown in fig. 2, the feature extraction network performs feature extraction with a VGG16 (Visual Geometry Group) backbone; the feature fusion network fuses features of different scales using 4 upsampling modules; and the output layer uses 4 conv (convolution) layers as the final output, which produces the defect detection result, including the position of each defect area and its defect type. Additionally, according to embodiments of the present invention, the output layer may simultaneously output a character Gaussian heatmap and an inter-character Gaussian heatmap. The character Gaussian heatmap characterizes, for each pixel, the score of that pixel being the center of a character; the inter-character Gaussian heatmap characterizes the score of that pixel being the center of the gap between adjacent characters (also referred to as a link). For the text defect detection model in fig. 2 to achieve a good processing effect, it must be trained, and before training, the training text image set must be annotated appropriately.
Fig. 3 is a diagram illustrating the labeling process of defect areas according to an exemplary embodiment of the present invention. The training text image set mentioned in the present invention is a set composed of a large number of text images; the present invention does not limit their source. The following description takes the training text image 310 in fig. 3 as an example. The text image 310 shown in fig. 3 includes defect areas 301 and 302 and a character area 303, where the defect areas 301 and 302 are areas in which defects exist and can be marked with the annotation boxes shown in fig. 3 (for example, annotation boxes 30 and 34). It can be seen that different defect types produce defect areas of different shapes, and the shapes of the annotation boxes change accordingly; that is, the shape of an annotation box corresponds to the shape of its defect area. Further, the defect area 301 and the character area 303 overlap. The character area 303 is the area of the text image 310 in which characters exist, and as shown in fig. 3, each character in the character area 303 may be marked with a character box.
It should be noted that if a defect area spans a plurality of characters, that is, if the defect area and the character area overlap, the annotation box 30 is further divided by the character boxes into defect sub-areas 31 to 33. In other words, when the defect area overlaps the character area, the sub-areas into which the defect area is divided by the character boxes are treated as defect areas in the subsequent processing.
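The sub-region division described above can be sketched in Python. This is an illustrative simplification, not part of the disclosure: it assumes all boxes are axis-aligned rectangles `(x0, y0, x1, y1)` and splits only along the horizontal axis, whereas the patent's annotation boxes may be arbitrary quadrilaterals. The function name `split_defect_box` is hypothetical.

```python
def split_defect_box(defect_box, char_boxes):
    """Split an axis-aligned defect box into sub-regions at the vertical
    edges of the character boxes it overlaps (a simplified sketch of the
    division of annotation box 30 into sub-areas 31-33 above)."""
    x0, y0, x1, y1 = defect_box
    # Cut positions: the defect box's own edges, plus every character-box
    # edge that falls strictly inside an overlapping character box.
    cuts = {x0, x1}
    for cx0, cy0, cx1, cy1 in char_boxes:
        overlaps = cx0 < x1 and cx1 > x0 and cy0 < y1 and cy1 > y0
        if overlaps:
            for x in (cx0, cx1):
                if x0 < x < x1:
                    cuts.add(x)
    xs = sorted(cuts)
    # Consecutive x-intervals become defect sub-areas sharing the box height.
    return [(a, y0, b, y1) for a, b in zip(xs, xs[1:])]


# A defect box spanning two characters is split into five sub-areas:
# before, on, between, on, and after the two character boxes.
subs = split_defect_box((0, 0, 100, 20), [(20, 0, 40, 20), (60, 0, 80, 20)])
```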
Then, a corresponding Gaussian heatmap is generated for each defect area. Specifically, for each of the defect areas 31, 32, 33, and 34, the probability of each pixel being the center of the defect area may be computed, and the computed probabilities encoded as a Gaussian heatmap. In practice, this computation is time-consuming; to avoid it, embodiments of the present invention may instead obtain the Gaussian heatmap corresponding to each defect area as follows: 1) prepare a two-dimensional isotropic Gaussian map; 2) compute the perspective transformation between the Gaussian map region and each annotation box; 3) map the Gaussian map into the annotation box region. Thereby, the text image 320 in fig. 3 can be obtained. In addition, as can be seen from fig. 3, the annotation boxes of defect area 301 and defect area 302 differ in shape and correspond to different defect types. Therefore, during labeling, each defect area may additionally be labeled with its defect type; for example, defect area 301 corresponds to missing text, and defect area 302 to a water drop.
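Steps 1)–3) above can be sketched in plain NumPy. This is a minimal sketch under stated assumptions: the function names are hypothetical, the direct linear transform (DLT) solve stands in for whatever perspective-transform routine an implementation would actually use, and nearest-neighbor sampling is chosen only for brevity.

```python
import numpy as np

def isotropic_gaussian(size=64, sigma_ratio=0.25):
    """Step 1: canonical two-dimensional isotropic Gaussian map."""
    c = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    return np.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * (size * sigma_ratio) ** 2))

def homography(src, dst):
    """Step 2: solve the 3x3 perspective transform mapping 4 src points
    to 4 dst points via the direct linear transform (8x8 linear system)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_gaussian_into_box(canvas, quad, gauss):
    """Step 3: map the canonical Gaussian into the annotation quad by
    inverse-mapping each canvas pixel in the quad's bounding rectangle."""
    s = gauss.shape[0] - 1
    src = [(0, 0), (s, 0), (s, s), (0, s)]
    H_inv = np.linalg.inv(homography(src, quad))
    xs = [p[0] for p in quad]; ys = [p[1] for p in quad]
    for v in range(int(min(ys)), int(max(ys)) + 1):
        for u in range(int(min(xs)), int(max(xs)) + 1):
            p = H_inv @ np.array([u, v, 1.0])
            x, y = p[0] / p[2], p[1] / p[2]
            xi, yi = int(round(x)), int(round(y))
            if 0 <= xi <= s and 0 <= yi <= s:
                # Nearest-neighbor sample; keep the max where boxes overlap.
                canvas[v, u] = max(canvas[v, u], gauss[yi, xi])
    return canvas
```

For an axis-aligned annotation box the warped peak lands at the box center, mimicking the per-defect-area heatmaps of text image 320.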
After the training text image set has been labeled in the manner shown in fig. 3, the text defect detection model of fig. 2 may be trained with the labeled set. The training method is described in detail below with reference to fig. 4. Fig. 4 is a flowchart illustrating a training method of a text defect detection model according to an exemplary embodiment of the present invention.
In step S410, a training text image in which a defect area is marked with an annotation box is acquired. As an exemplary embodiment of the present invention, the annotation box matches the shape of the defect area.
In step S420, a Gaussian heatmap corresponding to the defect area is generated.
As an exemplary embodiment of the present invention, a Gaussian map is generated for the annotation box corresponding to the defect area, and the Gaussian map is mapped into the annotation box using a perspective transformation between the Gaussian map and each annotation box, thereby generating the Gaussian heatmap.
In step S430, the text defect detection model is trained with the training text image and the Gaussian heatmap until a first preset requirement is met, completing the training of the text defect detection model. In other words, a loss function may be used to compute a loss value from the training text image and its labeling information, and the network parameters are adjusted with this loss value until the text defect detection model meets the preset requirement. In a specific embodiment, the loss function may compute the loss between the defect score of the defect area predicted by the text defect detection model (a score represented as a Gaussian heatmap) and the Gaussian heatmap obtained in step S420. The first preset requirement may be set in advance by the technician as needed, for example, the loss value falling below a certain threshold.
As another exemplary embodiment, when the text defect detection model simultaneously detects the character regions in the text image, the loss function may compute loss values for the character score and inter-character score corresponding to the character regions as well as for the defect-area score, and training proceeds until a second preset requirement is met, completing the training of the text defect detection model. The second preset requirement may likewise be set in advance by the technician, e.g., the loss value falling below a certain threshold. Note also that the three loss values may each carry their own weight; the loss function may be L1, L2, etc.; and when adjusting the parameters with these loss functions, gradient-based optimization algorithms such as SGD, ADAM, and HAPM may be used.
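One concrete reading of the combined, weighted loss described above can be sketched as follows. The function name, the dict-of-score-maps interface, and the default weights are all hypothetical; the patent only states that the three losses are weighted and may be L1, L2, etc. (L2/MSE is used here).

```python
import numpy as np

def detection_loss(pred, target, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of per-map L2 (MSE) losses over the character score,
    inter-character score, and defect score maps.

    `pred` and `target` are dicts mapping the three map names to HxW
    arrays; `weights` are the per-map weights mentioned above."""
    keys = ("character", "inter_character", "defect")
    total = 0.0
    for w, k in zip(weights, keys):
        total += w * float(np.mean((pred[k] - target[k]) ** 2))
    return total
```

With identical predicted and target maps the loss is zero, and raising a map's weight scales that map's contribution, which is how the per-score weighting would steer training toward defect detection versus character detection.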
As an exemplary embodiment of the present invention, the text defect detection model is a model improved on the basis of the CRAFT model. A preferred embodiment of the present invention is described below with reference to fig. 5, which illustrates the training of the above text defect detection model according to an exemplary embodiment of the present invention. It should be noted that the input image in fig. 5 may be a real image or a synthetic image; that is, automatically generated images may also be used for training.
As shown in fig. 5, the text image may be preprocessed during training. The preprocessing includes noise removal, edge enhancement, and the like. The preprocessed image may then be used for further processing, such as contour extraction and segmentation. As shown in fig. 5, the character region, i.e., the region in which characters exist, may be extracted from the text image with the aid of this preprocessing; the character region may then be segmented into individual characters using the above text defect detection method, the region of each character identified, and each character marked with a character box. As shown in fig. 5, after the text image is input into the text defect detection model, a detection result is obtained; during training, the detection result and the labeling information of the text image are fed into the loss function, and the parameters of the text defect detection model are adjusted according to the loss value until the desired detection effect is achieved.
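The noise-removal and edge-enhancement steps could, for instance, be realized as a mean blur followed by an unsharp mask. This is one illustrative choice in plain NumPy, not the patent's prescribed preprocessing; the function names and kernel size are assumptions.

```python
import numpy as np

def box_blur(img, k=3):
    """Noise removal sketch: k x k mean filter with edge-replicate padding."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def unsharp_mask(img, amount=1.0):
    """Edge enhancement sketch: add back the high-frequency residual
    (image minus its blur), clipped to the valid 8-bit range."""
    blurred = box_blur(img)
    return np.clip(img.astype(float) + amount * (img.astype(float) - blurred), 0, 255)
```

A flat image passes through unchanged, while contrast across edges (such as character strokes) is amplified, which is the intended effect before contour extraction.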
In the embodiments of the present invention, in order to identify defect areas in text images, the training text image is annotated in a specific manner, and the annotated training image is used to train the text defect detection model, so that the model meets the requirements of the scenario.
Fig. 6 is a flowchart illustrating a text defect detection method according to an exemplary embodiment of the present invention.
In step S610, a text image to be processed is acquired.
In step S620, the text image is input into the text defect detection model, and a text detection result is output, where the text detection result comprises the defect areas present in the text image and their corresponding defect types.
As an embodiment of the present invention, the text detection result may further include a character box for each character in the text image. That is, the text defect detection method can detect not only the defect areas in the text image and their corresponding defect types, but also each character in the text.
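Turning the model's predicted defect score map into the defect areas of the text detection result requires a decoding step that the patent does not fix. A minimal post-processing sketch, assuming thresholding followed by 4-connected component grouping (the function name and threshold are hypothetical):

```python
import numpy as np
from collections import deque

def defect_boxes(score_map, thresh=0.5):
    """Decode a predicted defect score map into defect-area bounding boxes:
    threshold the map, then group 4-connected pixels with a BFS flood fill
    and return each group's bounding box as (x0, y0, x1, y1)."""
    mask = score_map >= thresh
    seen = np.zeros(mask.shape, dtype=bool)
    boxes = []
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                ys, xs = [sy], [sx]
                while q:
                    y, x = q.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            ys.append(ny); xs.append(nx)
                            q.append((ny, nx))
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```

Two disjoint high-score blobs decode into two separate defect areas; the defect type for each box would come from the model's type output, which this sketch does not model.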
According to still another aspect of the present invention, there is also provided a training apparatus for a text defect detection model. Fig. 7 shows a schematic block diagram of a training apparatus 700 according to an embodiment of the present invention. As shown in fig. 7, the training apparatus 700 may include an acquisition unit 710, a generation unit 720, and a training unit 730.
The acquisition unit 710 is configured to acquire a training text image in which the defect area is marked with an annotation box.
The generation unit 720 is configured to generate a Gaussian heatmap corresponding to the defect area.
The training unit 730 is configured to train the text defect detection model with the training text image and the Gaussian heatmap until a first preset requirement is met, thereby completing the training of the text defect detection model.
Those skilled in the art can understand the implementation manner and technical effects of the training device by reading the above description about the training method of the text defect detection model, and for brevity, the description is omitted here.
According to still another aspect of the present invention, an electronic device is also provided. Fig. 8 shows a schematic block diagram of an electronic device according to an embodiment of the invention. As shown in fig. 8, the electronic device 800 includes a processor 810 and a memory 820, the memory 820 storing a computer program that, when executed by the processor 810, is configured to perform the training method described above.
According to another aspect of the present application, there is also provided a storage medium. The storage medium stores program instructions which, when run, execute the training method and/or the text defect detection method described above. The storage medium may include, for example, erasable programmable read-only memory (EPROM), portable compact-disc read-only memory (CD-ROM), USB memory, or any combination of the foregoing. The storage medium may be any combination of one or more computer-readable storage media.
Those skilled in the art can understand the specific implementations and beneficial effects of the above electronic device and storage medium by reading the above description of the training method and the text defect detection method; for brevity, they are not repeated here.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the above illustrative embodiments are merely illustrative and are not intended to limit the scope of the present application thereto. Various changes and modifications may be made therein by one of ordinary skill in the art without departing from the scope and spirit of the present application. All such changes and modifications are intended to be included within the scope of the present application as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, e.g., the division of the elements is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple elements or components may be combined or integrated into another device, or some features may be omitted, or not performed.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the present application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in order to streamline the application and aid in understanding one or more of the various inventive aspects, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof in the description of exemplary embodiments of the application. However, the method of this application should not be construed to reflect the following intent: i.e., the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be combined in any combination, except combinations where the features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the present application and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some of the modules in a text defect detection system according to embodiments of the present application may be implemented in practice using a microprocessor or a digital signal processor (DSP). The present application may also be embodied as device programs (e.g., computer programs and computer program products) for performing part or all of the methods described herein. Such a program embodying the present application may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names.
The foregoing is merely illustrative of specific embodiments of the present application, and the scope of the present application is not limited thereto. Any changes or substitutions readily conceivable by a person skilled in the art within the technical scope disclosed herein are intended to be covered by the scope of the present application. The protection scope of the present application shall be subject to the protection scope of the claims.
Claims (9)
1. A training method for a text defect detection model, comprising:
acquiring a training text image in which a defect region is marked with an annotation box;
generating a Gaussian heat map corresponding to the defect region; and
training the text defect detection model by using the training text image and the Gaussian heat map until a first preset requirement is met, thereby completing the training of the text defect detection model.
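The heat-map target described in claim 1 can be sketched as follows. This is only an illustrative reading of the claim, not the patent's actual implementation: the function name, the axis-aligned (x0, y0, x1, y1) box format, and the choice of tying the Gaussian spread to the box size via `sigma_ratio` are all assumptions.

```python
import numpy as np

def gaussian_heatmap_for_box(image_shape, box, sigma_ratio=0.25):
    """Render an isotropic 2D Gaussian inside an axis-aligned annotation box.

    image_shape: (H, W); box: (x0, y0, x1, y1) in pixel coordinates.
    The Gaussian peaks at the box centre and is zeroed outside the box.
    """
    h, w = image_shape
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    sx = max((x1 - x0) * sigma_ratio, 1e-6)  # spread tied to box width
    sy = max((y1 - y0) * sigma_ratio, 1e-6)  # spread tied to box height
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.exp(-((xs - cx) ** 2 / (2 * sx ** 2) + (ys - cy) ** 2 / (2 * sy ** 2)))
    inside = (xs >= x0) & (xs < x1) & (ys >= y0) & (ys < y1)
    return heat * inside
```

A model trained against such targets regresses a per-pixel defect-likelihood score rather than a hard box, which is the usual motivation for heat-map supervision.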
2. The training method of claim 1, wherein the generating a Gaussian heat map corresponding to the defect region comprises:
generating a Gaussian map by using the annotation box corresponding to the defect region; and
mapping the Gaussian map into each annotation box by using a perspective transformation between the Gaussian map and that annotation box, thereby generating the Gaussian heat map.
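In CRAFT-style pipelines, the transformation in claim 2 is typically a perspective (homography) warp of a canonical square Gaussian template into each annotation quadrilateral. A NumPy-only sketch under that assumption follows; the function names, the template size, and the nearest-neighbour sampling are illustrative choices, not taken from the patent.

```python
import numpy as np

def homography(src, dst):
    """Solve the 3x3 perspective transform mapping 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_gaussian_into_quad(canvas_shape, quad, template_size=64):
    """Warp a square Gaussian template into an arbitrary 4-point annotation box."""
    t = template_size
    ax = np.arange(t) - (t - 1) / 2.0
    # isotropic Gaussian template, peak at the template centre
    g = np.exp(-(ax[None, :] ** 2 + ax[:, None] ** 2) / (2 * (t / 4.0) ** 2))
    src = [(0, 0), (t - 1, 0), (t - 1, t - 1), (0, t - 1)]
    Hinv = np.linalg.inv(homography(src, quad))
    Hc, Wc = canvas_shape
    ys, xs = np.mgrid[0:Hc, 0:Wc]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    sp = Hinv @ pts                       # map each canvas pixel back into the template
    sx, sy = sp[0] / sp[2], sp[1] / sp[2]
    inside = (sx >= 0) & (sx <= t - 1) & (sy >= 0) & (sy <= t - 1)
    out = np.zeros(Hc * Wc)
    ix = np.clip(np.round(sx).astype(int), 0, t - 1)
    iy = np.clip(np.round(sy).astype(int), 0, t - 1)
    out[inside] = g[iy[inside], ix[inside]]  # nearest-neighbour sample
    return out.reshape(Hc, Wc)
```

In production code one would normally use `cv2.getPerspectiveTransform` and `cv2.warpPerspective` instead of hand-rolling the warp; the explicit version above only makes the mapping in claim 2 concrete.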
3. The training method of claim 1, wherein, in a case where the defect region overlaps a character region in the training text image, generating the Gaussian heat map corresponding to the defect region comprises:
dividing the defect region into a plurality of defect sub-regions according to the characters in the overlapping region; and
generating a corresponding Gaussian heat map for each defect sub-region.
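One plausible reading of the splitting step in claim 3, assuming axis-aligned boxes and vertical cuts at the horizontal extents of each overlapping character, is sketched below. The patent does not specify the split rule, so the cut strategy and function name here are assumptions.

```python
def split_defect_by_characters(defect_box, char_boxes):
    """Split an axis-aligned defect box at the x-extents of overlapping character boxes.

    Boxes are (x0, y0, x1, y1). Characters that do not intersect the
    defect region are ignored; the result is a left-to-right list of
    defect sub-regions, one Gaussian heat map per sub-region.
    """
    x0, y0, x1, y1 = defect_box
    cuts = {x0, x1}
    for cx0, cy0, cx1, cy1 in char_boxes:
        # keep only characters that actually overlap the defect region
        if cx0 < x1 and cx1 > x0 and cy0 < y1 and cy1 > y0:
            cuts.add(max(cx0, x0))
            cuts.add(min(cx1, x1))
    xs = sorted(cuts)
    return [(a, y0, b, y1) for a, b in zip(xs, xs[1:]) if b > a]
```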
4. The training method of claim 1, further comprising: training the text defect detection model by using the training text image together with the corresponding character-level Gaussian heat maps and the inter-character Gaussian heat maps generated from adjacent characters, until a second preset requirement is met, thereby completing the training of the text defect detection model.
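In CRAFT, the inter-character ("affinity") heat map peaks midway between adjacent character centres; claim 4's inter-character Gaussian heat map likely follows the same idea. A minimal sketch of where those peaks would sit, with the function name and the simple centre-midpoint rule as assumptions:

```python
def affinity_centers(char_boxes):
    """Peak locations for the inter-character ('affinity') Gaussian heat map:
    the midpoint between the centres of each pair of adjacent character boxes,
    where boxes are (x0, y0, x1, y1) in reading order."""
    centers = [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in char_boxes]
    return [((ax + bx) / 2.0, (ay + by) / 2.0)
            for (ax, ay), (bx, by) in zip(centers, centers[1:])]
```

A Gaussian rendered at each returned point (e.g. with the heat-map helper from claim 1) would then serve as the inter-character supervision target alongside the character-level maps.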
5. The training method of claim 1, wherein the text defect detection model is a model that is improved based on a CRAFT model.
6. A text defect detection method, comprising:
inputting a text image to be processed into a text defect detection model trained by the training method according to any one of claims 1 to 5, and outputting a text detection result, wherein the text detection result comprises a defect region having a defect in the text image and a corresponding defect type.
7. The method of claim 6, wherein the text detection result further comprises a character box of a character in the text image.
8. A text processing apparatus, comprising:
one or more processors;
a memory; and
one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing any of the methods of claims 1-7.
9. A computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a computing device, cause the computing device to perform any of the methods of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311853302.9A CN117853440A (en) | 2023-12-29 | 2023-12-29 | Training method for text defect detection model and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117853440A true CN117853440A (en) | 2024-04-09 |
Family
ID=90547630
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311853302.9A Pending CN117853440A (en) | 2023-12-29 | 2023-12-29 | Training method for text defect detection model and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117853440A (en) |
2023-12-29: Application CN202311853302.9A filed in CN; publication CN117853440A; status: Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||