CN112949766A - Target area detection model training method, system, device and medium - Google Patents

Target area detection model training method, system, device and medium Download PDF

Info

Publication number
CN112949766A
Authority
CN
China
Prior art keywords
model
training
target
data
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110370329.7A
Other languages
Chinese (zh)
Inventor
Inventor not announced (不公告发明人)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Shuzhilian Technology Co Ltd
Original Assignee
Chengdu Shuzhilian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Shuzhilian Technology Co Ltd filed Critical Chengdu Shuzhilian Technology Co Ltd
Priority to CN202110370329.7A priority Critical patent/CN112949766A/en
Publication of CN112949766A publication Critical patent/CN112949766A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention discloses a method, a system, a device and a medium for training a target area detection model, relating to the field of image processing. The method comprises the following steps: collecting pictures containing a preset target, and labeling the area where the preset target is located in each picture to obtain a data set; building a teacher model and a student model for identifying the area of the preset target in a picture, training the teacher model with the data set to obtain a first model, and training the student model with the data set to obtain a second model; and distilling the second model with the knowledge of the first model to obtain the target area detection model. The number of parameters to be learned in the teacher model is larger than that in the student model. The method solves the problems that existing deep-learning-based optical character region detection models are complex and struggle to meet real-time requirements, and improves the inference speed of the text detection model while preserving its recognition accuracy.

Description

Target area detection model training method, system, device and medium
Technical Field
The invention relates to the field of image processing, and in particular to a target area detection model training method, system, device and medium.
Background
A picture contains a large amount of information, and the corresponding target information can be obtained through suitable image processing techniques; specifically, a computer can identify the region where a target is located in a picture and thereby extract the relevant information in that target region.
However, current deep-learning-based target detection models are generally complex and time-consuming in target identification, and have difficulty meeting the real-time requirements of practical application scenarios.
Disclosure of Invention
The invention provides a target area detection model training method, system, device and medium, and a target area detection method, which solve the problems that existing deep-learning-based optical character region detection models are complex and struggle to run in real time, and which improve the inference speed of the text detection model while preserving its recognition accuracy.
In order to achieve the above object, the present invention provides a target region detection model training method, including:
collecting a picture containing a preset target, and labeling an area where the preset target is located in the picture to obtain a data set;
building a teacher model and a student model for identifying the area of the preset target in the picture, training the teacher model by using the data set to obtain a first model, and training the student model by using the data set to obtain a second model;
distilling the second model by utilizing the knowledge of the first model to obtain a target area detection model;
the number of the parameters needing to be learned in the teacher model is larger than that of the parameters needing to be learned in the student model.
The target area detection model is obtained by distilling the second model with the knowledge of the first model. Although its precision is lower than that of the teacher model, it is higher than that of the undistilled student model, so its precision is guaranteed; the purpose of knowledge distillation is to improve the precision of the student model without changing its number of parameters. Because the target area detection model has few parameters, its inference speed is naturally greater than that of the teacher model with many parameters. The parameter quantity of a model is a technical term referring to the number of parameters that need to be learned in the neural network model; it is a unitless scalar. The target area detection model obtained by this training method therefore has a small parameter count and high precision.
The invention can maximally improve the inference speed of the original text detection model and accurately obtain the coordinate information of the region where the target is located in the picture. According to the technical scheme, corresponding artificial neural network models are trained: a large-parameter-count model and a small-parameter-count model serve as the teacher model and the student model respectively, and then, following the knowledge distillation principle, the second model is distilled with the knowledge of the first model to obtain the distilled target area detection model. The parameter count of the target area detection model is smaller than that of the teacher model, so a model lightweighting effect is achieved, while its precision is higher than that of the student model, so its precision is guaranteed. The method first acquires a large number of relevant pictures containing targets from an internet platform or other platforms or systems, for example by crawler technology, and stores the acquired picture data in a corresponding picture database. Pictures are then selected from the database and labeled with a picture labeling tool: the labeling tool first reads the picture to be labeled, then the four vertex coordinates of the rectangular area where the target is located in each picture are labeled in clockwise order with the rectangular labeling tool, and the labeled coordinate information is saved as text in txt format.
Finally, a text detection model EAST (An Efficient and Accurate Scene Text detector) based on an artificial neural network is constructed; the teacher model and the student model are trained respectively with the labeled pictures as the training data set, and the teacher model is then used to perform knowledge distillation on the student model, finally obtaining the distilled student model.
Preferably, in the method, the data set pictures are labeled with a corresponding image labeling tool: specifically, the four vertex coordinates of the area where the target is located are labeled in clockwise order with the rectangular labeling tool and stored. During labeling, every labeling frame keeps the same labeling order; alternatively, all labeling frames may be labeled counterclockwise.
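As a minimal sketch of reading and writing such clockwise four-vertex labels (the exact txt layout is not specified in the text; the comma-separated format and function names below are assumptions):

```python
# Hypothetical helpers for the clockwise four-vertex txt labels described
# above. Each line holds one box as eight comma-separated values:
# x1,y1,x2,y2,x3,y3,x4,y4 (clockwise from the top-left vertex).

def save_boxes(path, boxes):
    """boxes: list of boxes, each a list of four (x, y) tuples, clockwise."""
    with open(path, "w") as f:
        for box in boxes:
            flat = [str(v) for xy in box for v in xy]
            f.write(",".join(flat) + "\n")

def load_boxes(path):
    """Read the txt file back into the same list-of-vertex-tuples form."""
    boxes = []
    with open(path) as f:
        for line in f:
            vals = [float(v) for v in line.strip().split(",")]
            boxes.append([(vals[i], vals[i + 1]) for i in range(0, 8, 2)])
    return boxes
```

A round-trip through these two helpers preserves the vertex order, which is what the "same labeling order for every frame" requirement relies on.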
Preferably, the method further includes performing data enhancement processing on the data set to obtain a data enhancement set, training the teacher model with the data enhancement set to obtain the first model, and training the student model with the data enhancement set to obtain the second model. Data enhancement yields a richer data enhancement set; training the models with it increases the amount of sample data and can further improve the precision of the trained models. In other words, training the teacher model with the data enhancement set yields a first model with higher precision.
The data enhancement mode in the invention includes: randomly rotating the pictures in the data set and cropping the rotated pictures into standard pictures. For example, any picture in the data set may be rotated by an arbitrary angle to obtain pictures with multiple rotation angles, thereby increasing the number of pictures in the data set. The input of the model is a matrix, and the parts of a randomly rotated picture not covered by the original picture are zero-filled by the computer, so the aspect ratio of such a picture differs from that of the original; cropping is therefore required to ensure that the model's input pictures are all the same size. A standard picture is a picture with uniform length and width, that is, an aspect ratio of 1. Specifically, the random rotation angle is any angle between 0 and 360 degrees, the aspect ratio of the cropped picture is 1, and the actual picture resolution is a fixed value.
After the data enhancement processing is performed on the data set to obtain the data enhancement set, the method further includes the following processing: updating the coordinate information of the labeling frames in the labeled pictures. After a picture is rotated, the labeled information may no longer be accurate, so the coordinate information of the labeling frames must be updated to ensure the accuracy of the labels.
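A hedged sketch of the coordinate update described above: when the picture is rotated about its centre, the labelled vertices must be rotated by the same angle (the function name is illustrative; the image rotation itself would be done with an image library such as OpenCV or PIL):

```python
import math

def rotate_box(box, angle_deg, center):
    """Rotate the labelled vertices by the same angle applied to the picture,
    about the image centre, so the annotation stays aligned with the content.
    box: list of (x, y) tuples; returns the rotated list of (x, y) tuples.
    Note: in image coordinates the y axis points down, so the visual
    direction of rotation is mirrored relative to standard math convention."""
    theta = math.radians(angle_deg)
    cx, cy = center
    out = []
    for x, y in box:
        dx, dy = x - cx, y - cy
        out.append((cx + dx * math.cos(theta) - dy * math.sin(theta),
                    cy + dx * math.sin(theta) + dy * math.cos(theta)))
    return out
```

After cropping, the rotated coordinates would additionally be shifted by the crop offset; that step is omitted here.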
The data enhancement mode in the invention also includes: applying random blurring, contrast and brightness enhancement to the pictures in the data enhancement set each time the EAST model is trained. The coordinates of the four vertices of the rectangular text box correspond to the position information of the preset target in the picture.
When the data labeling tool is used to label the picture data, the coordinates of the top-left vertex of the rectangular region where the target is located are labeled first, the coordinates of the remaining three vertices are then labeled in clockwise order, and finally the labeled coordinates of the four vertices of the target region are stored in txt data format.
Preferably, the method further comprises: dividing the data set into a training set, a verification set and a test set, training the teacher model and the student model with the training set, testing the trained first model and second model with the test set, and verifying the first model and the second model with the verification set.
Preferably, in the method, the region where the preset target is located is a rectangular region, and the method further includes: and acquiring the pictures in the test set, inputting the pictures in the test set into the target area detection model, and outputting the coordinates of four vertexes of a rectangular area where a preset target is located in the pictures in the test set.
Preferably, the method further comprises: performing non-maximum suppression processing on the output data of the target area detection model, and storing the vertex coordinates of the area where the target is located from the processed data. Non-maximum suppression removes redundant target frames predicted by the target area detection model.
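A simplified sketch of the non-maximum suppression step, using axis-aligned boxes and a plain greedy loop (EAST itself pairs this with locality-aware merging of rotated boxes, which is omitted here):

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop lower-scoring boxes
    that overlap a kept box by more than `thresh`; returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep
```

This is the mechanism by which the redundant target frames mentioned above are discarded before the final vertex coordinates are stored.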
The training process of the EAST neural network model comprises the following steps:
cleaning and data enhancement of training data;
building a structure of an EAST network model;
the area where the preset target is located adopts a rotating rectangular frame (RBOX). The EAST model has two input modes (rectangle and quadrangle) of the labeled square frame, and the minimum circumscribed rectangle frame of the labeled quadrangle frame is used as the input labeled frame.
Preferably, the teacher model and the student model in the method are EAST neural network models, and the corresponding loss function is: L_E = L_s + λ_g1·L_g, where L_s is the error of the score feature map, used to obtain the mask feature map of the text region; L_g is the error of the geometric feature map, used to obtain the distance from each pixel in the mask feature map to the vertices of the rectangular labeling frame and finally the predicted vertex coordinates of the rectangular labeling frame; and λ_g1 is the weight between the two. The purpose of this design is to obtain more accurate target frame coordinates and to prevent the large errors caused by target frame divergence or by regressing with a single geometric feature map.
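An illustrative sketch of the combined loss L_E = L_s + λ_g1·L_g: a dice-style score loss is shown as one common choice for L_s, while the EAST geometry loss (an IoU-plus-angle term) is abstracted into a plain number here; helper names are hypothetical:

```python
def dice_loss(pred, gt, eps=1e-6):
    """A dice-style score-map loss L_s over flattened pixel values:
    0 for a perfect match, approaching 1 for no overlap."""
    inter = sum(p * g for p, g in zip(pred, gt))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(gt) + eps)

def east_loss(l_s, l_g, lambda_g1=1.0):
    """Combined loss L_E = L_s + lambda_g1 * L_g from the formula above."""
    return l_s + lambda_g1 * l_g
```

The weight λ_g1 trades off the score-map term against the geometry term; the embodiment below sets it to 1.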
Preferably, the method adopts the KL divergence together with the loss function of the original EAST model as the knowledge distillation loss, and trains the student model again, using the KL divergence to learn the distribution among the teacher model's predicted classes, thereby improving the prediction accuracy of the student model.
Preferably, the knowledge distillation loss function in the method is: L_D = λ_g2·L_KL + (1 − λ_g2)·L_E, where L_KL is the KL divergence, L_E is the loss function of the EAST neural network model, and λ_g2 is the weight between the two. Combining the KL divergence with the EAST loss prevents the model from failing to converge due to divergence of the student model's predictions during distillation.
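A hedged numeric sketch of the distillation loss L_D = λ_g2·L_KL + (1 − λ_g2)·L_E over discrete probability vectors (in practice these would be per-pixel feature-map distributions from the teacher and student models):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete probability distributions;
    eps guards against log(0) for zero-probability entries."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distill_loss(teacher_probs, student_probs, l_east, lambda_g2=0.9):
    """L_D = lambda_g2 * L_KL + (1 - lambda_g2) * L_E, per the formula
    above; l_east is the student's ordinary EAST loss L_E."""
    l_kl = kl_divergence(teacher_probs, student_probs)
    return lambda_g2 * l_kl + (1 - lambda_g2) * l_east
```

When the student's distribution matches the teacher's exactly, the KL term vanishes and only the down-weighted EAST loss remains, which is what anchors the student to the ground-truth labels during distillation.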
The invention also provides a training system of the target area detection model, which comprises:
the image acquisition and labeling unit is used for acquiring an image containing a preset target, and labeling an area where the preset target is located in the image to obtain a data set;
the model building and training unit is used for building a teacher model and a student model for identifying the area of the preset target in the picture, training the teacher model by using the data set to obtain a first model, and training the student model by using the data set to obtain a second model;
the knowledge distillation unit is used for distilling the second model by utilizing the knowledge of the first model to obtain a target area detection model;
the number of the parameters needing to be learned in the teacher model is larger than that of the parameters needing to be learned in the student model.
The invention also provides a target area detection method, which comprises the following steps:
obtaining a picture to be processed containing a preset target;
and inputting the picture to be processed into the target area detection model obtained by adopting the target area detection model training method, and outputting four vertex coordinates of a rectangular area in which a preset target is located in the picture to be processed.
The invention also provides a target area detection model training device, which comprises a memory, a processor and a computer program which is stored in the memory and can run on the processor, wherein the processor realizes the steps of the target area detection model training method when executing the computer program.
The invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the target area detection model training method.
One or more technical schemes provided by the invention at least have the following technical effects or advantages:
the parameter quantity of the target area detection model obtained by training is smaller than that of the teacher model, so that the target area detection model is light, the real-time requirement can be met, and the reasoning speed of the target area detection model is increased. The target area detection model obtained by training is obtained by distilling the knowledge of the trained teacher model, the precision of the model is superior to that of the student model, and the precision of the target area detection model is guaranteed.
The invention adopts the target area detection model to identify the coordinate information of the rectangular area where the preset target is located, and can greatly simplify the construction and training process of the related target detection algorithm to a certain extent.
The knowledge distillation method for the lightweight design of the target area detection model provided by the invention can obviously improve the reasoning speed of the model under the condition of ensuring the precision of the model, and greatly expands the application scene of the text detection method.
The target area detection model obtained by model distillation in the invention achieves higher level in recognition accuracy and speed, and can well meet the requirements on recognition accuracy and recognition speed in actual deployment.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention;
FIG. 1 is a schematic diagram of a network architecture of a teacher model and a student model based on an EAST model;
FIG. 2 is a schematic diagram of the principle of distillation of knowledge;
FIG. 3 is a schematic flow chart of a target region detection model training method;
fig. 4 is a schematic composition diagram of a target region detection model training system.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments of the present invention and features of the embodiments may be combined with each other without conflicting with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described and thus the scope of the present invention is not limited by the specific embodiments disclosed below.
It should be understood that "system", "device", "unit" and/or "module" as used herein is a method for distinguishing different components, elements, parts, portions or assemblies at different levels. However, other words may be substituted by other expressions if they accomplish the same purpose.
As used in this specification and the appended claims, the terms "a," "an," and/or "the" are not intended to refer only to the singular and may also include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; these steps and elements do not form an exclusive list, and a method or apparatus may include other steps or elements.
The present description uses flowcharts to illustrate operations performed by a system according to embodiments of the present description. It should be understood that the preceding or following operations are not necessarily performed in the exact order in which they are performed. Rather, the various steps may be processed in reverse order or simultaneously. Meanwhile, other operations may be added to the processes, or a certain step or several steps of operations may be removed from the processes.
Example one
The preset target in the picture in this embodiment may be various types of target, such as numbers, text, symbols, and the like. The embodiment of the invention takes number detection in pictures as an example; other types of target detection are also within the protection scope of the invention and are not enumerated here.
The picture in the embodiment of the invention can be a certificate picture containing certificate number information; certificate pictures such as an enterprise business license, a food business license, an identity card and a bank card all contain specific certificate numbers, and the area where the certificate number is located can be identified by a computer, so that the related text information in the certificate picture can be obtained.
The area where the license number is located is a rectangular area generally, and can also be an area with other shapes in practical application.
The embodiment of the invention provides a knowledge distillation method of a text detection model based on an artificial neural network, which realizes the lightweight design of the text detection model and improves the reasoning performance of an original model. Referring to fig. 3, the whole process of knowledge distillation of the text detection model is as follows:
step 1: and (5) acquiring and storing the license picture. Crawling the license picture data on a corresponding internet platform by adopting a crawler technology, and dividing the acquired license picture data into a training set, a verification set and a test set according to the ratio of 8:1: 1; in practical application, the image obtaining mode may be any one of a plurality of modes, for example, a crawler technology or other technical modes are used to obtain an image from an internet platform or other platforms or systems, and the image obtaining mode is not limited in the present invention; the data set can be divided into other proportions in the actual data set dividing process, and the dividing proportion of the data set is not limited in the embodiment of the invention; marking a data set picture by using a picture marking tool (labelme), specifically marking four vertex coordinates of a text area where a license number is located clockwise by using a rectangular marking tool in the labelme, and storing the coordinates in a txt format; in the actual application process, other marking tools can be adopted to mark the picture, and the invention does not specifically limit the marking mode or tool of the picture;
step 2: data cleansing and enhancement of training. And performing data enhancement processing on the marked training data set, randomly rotating and cutting collected picture data with different resolutions into a standard picture with an aspect ratio of 1, fixing the resolution of the cut picture as a fixed value, updating coordinate information of a marked frame in the cut picture, and performing data enhancement processing on the cut picture, such as random blurring, contrast, brightness and the like. The length-width ratio of the cutting can be flexibly adjusted in the actual cutting process, and the length-width ratio of the cutting is not limited by the invention.
Step 3: a teacher model and a student model based on the EAST model are constructed respectively, and both are trained with the training set data; a schematic diagram of the models' network structure is shown in FIG. 1.
The text area adopts a rotated rectangular box (RBOX);
the corresponding loss function is: L_E = L_s + λ_g1·L_g, where L_s is the error of the score feature map, L_g is the error of the geometric feature map, and λ_g1 is the weight between the two; here λ_g1 = 1. In practical application λ_g1 can be flexibly adjusted according to actual needs, and the invention does not limit its value.
Locality-Aware NMS (non-maximum suppression) processing is carried out on the predicted coordinate frames output by the EAST model;
and 4, step 4: and (3) performing knowledge distillation treatment on the student models by using a knowledge distillation principle and using the teacher model to finally obtain the distilled student models, wherein the knowledge distillation principle is shown in figure 2. The Knowledge distillation is a model compression method, and is a training method based on a teacher-student network thought, wherein the Knowledge distillation is to extract Knowledge ("Knowledge") contained in a trained model into another model. The precision of the student model after knowledge distillation is generally smaller than that of a teacher model but larger than that of the student model before the knowledge distillation, and the purpose of the knowledge distillation is to improve the precision of the student model under the condition of not changing parameters of the student model. The method adopts KL divergence and the loss function of an original EAST model as the loss function of knowledge distillation, trains the student model again by using the KL divergence to learn the distribution among the teacher model prediction value classes, thereby improving the prediction precision of the student model, and adopts the combination of the KL divergence and the loss function of the EAST model as the loss function of the model distillation method to prevent the model from not converging due to the divergence of the prediction values of the student model during knowledge distillation. The score feature map is shown as L in FIG. 1sConv layer to the right of the arrow, geometric feature map L in FIG. 1gConv layer to the right of the arrow.
The KL divergence (Kullback-Leibler divergence) together with the loss function of the original EAST model is taken as the loss function in knowledge distillation; the corresponding knowledge distillation loss function is: L_D = λ_g2·L_KL + (1 − λ_g2)·L_E, where L_KL is the KL divergence, L_E is the loss function of the EAST model, and λ_g2 is the weight between the two, set here to 0.9. In practical application λ_g2 can be flexibly adjusted according to actual needs, and the invention does not limit its value.
Step 5: the student models before and after distillation are tested. First, the data in the test data set are input into the trained model to obtain the predicted coordinate information of the text boxes; then Locality-Aware NMS (non-maximum suppression) processing is performed on the predicted coordinate frames; finally, the final predicted text box coordinates are obtained and stored. The predicted coordinate frames of the student model before and after distillation on the test set are counted, and the comparison of the model distillation results is shown in Table 1:
TABLE 1 comparison of model distillation results
model     backbone      Parameters (M)   Recall   Precision   F1-score
teacher   Vgg16         57.62            0.7275   0.8046      0.7641
student   mobilenetV3   14.83            0.6595   0.7603      0.7085
distill   mobilenetV3   14.83            0.7135   0.7987      0.7571
From Table 1 it can be seen that the model size of the distilled student model (distill) is only about 1/4 of that of the teacher model (teacher), yet its evaluation indices (Recall, Precision and F1-score) are all very close to those of the teacher model, and its performance is clearly superior to that of the student model before distillation (student), which demonstrates the effectiveness of the knowledge distillation method.
The picture in the method can be a license picture or another picture containing a number composed of Roman numerals and English letters; the picture object to be processed is not limited, and the method is suitable for any picture containing such numbers. The pictures may be obtained from an internet platform through a crawler or from other sources.
The knowledge distillation method for the text detection model provided by the method or the system has important reference significance for improving the reasoning speed of the model and expanding the deployment application scene of the model.
Example two
Referring to fig. 4, a second embodiment of the present invention provides a target area detection model training system, including:
the image acquisition and labeling unit is used for acquiring an image containing a preset target, and labeling an area where the preset target is located in the image to obtain a data set;
the model building and training unit is used for building a teacher model and a student model for identifying the area of the preset target in the picture, training the teacher model by using the data set to obtain a first model, and training the student model by using the data set to obtain a second model;
the knowledge distillation unit is used for distilling the second model by utilizing the knowledge of the first model to obtain a target area detection model;
the number of the parameters needing to be learned in the teacher model is larger than that of the parameters needing to be learned in the student model.
The invention also provides a target area detection method, which comprises the following steps:
obtaining a picture to be processed containing a preset target;
and inputting the picture to be processed into a target area detection model obtained by adopting the target area detection model training method, and outputting four vertex coordinates of a rectangular area where a preset target is located in the picture to be processed.
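To illustrate the claimed output (the four vertex coordinates of the rectangular area where the preset target is located), the following sketch decodes an axis-aligned rectangle from an EAST-style per-pixel geometry prediction, i.e. the distances from a pixel to the four edges of the box. The function name and the example numbers are assumptions for illustration, not the patented implementation.

```python
def decode_aabb(x, y, d_top, d_right, d_bottom, d_left):
    """Recover the four vertex coordinates (clockwise from top-left)
    of the axis-aligned box predicted at pixel (x, y), given the
    distances from that pixel to the four box edges."""
    x0, y0 = x - d_left, y - d_top        # top-left corner
    x1, y1 = x + d_right, y + d_bottom    # bottom-right corner
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]

# A pixel at (50, 40) predicting distances 10/30/15/20 yields a
# 50 x 25 rectangle around it.
quad = decode_aabb(50, 40, 10, 30, 15, 20)
print(quad)  # [(30, 30), (80, 30), (80, 55), (30, 55)]
```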
EXAMPLE III
The third embodiment of the present invention provides a target area detection model training apparatus, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the target area detection model training method when executing the computer program.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may be used to store the computer program and/or module, and the processor implements the various functions of the target area detection model training apparatus by running or executing the computer program and/or module stored in the memory and invoking the data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a smart memory card, a secure digital card, a flash memory card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
Example four
The fourth embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps of the target area detection model training method are implemented.
The target area detection model training apparatus, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments of the present invention may also be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a computer-readable storage medium, and when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in the jurisdiction.
Having described the basic concepts, it will be apparent to those skilled in the art that the foregoing detailed disclosure is intended only as an example and does not limit the invention. Various modifications, improvements and adaptations to the present description may occur to those skilled in the art, although they are not explicitly described herein. Such modifications, improvements and adaptations are suggested by this specification and thus fall within the spirit and scope of its exemplary embodiments.
Also, this specification uses specific words to describe its embodiments. Reference to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the specification. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, particular features, structures, or characteristics of one or more embodiments of the specification may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present description may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufactures, or materials, or any new and useful improvement thereof. Accordingly, aspects of this description may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present description may take the form of a computer product, embodied in one or more computer-readable media, including computer-readable program code.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of this specification may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional procedural programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), in a cloud computing environment, or as a service, such as software as a service (SaaS).
Additionally, the order in which the elements and sequences of the process are recited in the specification, the use of alphanumeric characters, or other designations, is not intended to limit the order in which the processes and methods of the specification occur, unless otherwise specified in the claims. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the present specification, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the embodiments. This method of disclosure, however, is not to be interpreted as implying that the claimed subject matter requires more features than are expressly recited in each claim. Indeed, claimed subject matter may lie in less than all features of a single embodiment disclosed above.
For each patent, patent application publication, and other material, such as articles, books, specifications, publications, documents, etc., cited in this specification, the entire contents of each are hereby incorporated by reference into this specification, except for any application history document that is inconsistent with or conflicts with the contents of this specification, and any document that limits the broadest scope of the claims of this specification (whether currently appended or later appended to this specification). It is to be understood that if the descriptions, definitions, and/or uses of terms in the accompanying materials of this specification are inconsistent with or contrary to those in this specification, the descriptions, definitions, and/or uses of terms in this specification shall prevail.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present disclosure. Other variations are also possible within the scope of the present description. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the specification can be considered consistent with the teachings of the specification. Accordingly, the embodiments of the present description are not limited to only those embodiments explicitly described and depicted herein.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (13)

1. The target area detection model training method is characterized by comprising the following steps:
collecting a picture containing a preset target, and labeling an area where the preset target is located in the picture to obtain a data set;
building a teacher model and a student model for identifying the area of the preset target in the picture, training the teacher model by using the data set to obtain a first model, and training the student model by using the data set to obtain a second model;
distilling the second model by utilizing the knowledge of the first model to obtain a target area detection model;
the number of the parameters needing to be learned in the teacher model is larger than that of the parameters needing to be learned in the student model.
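The claimed flow (train a larger teacher, then obtain the final model by distilling its knowledge into a smaller student) can be illustrated with a deliberately tiny numerical sketch. The toy linear models, learning rate, and data below are illustrative assumptions unrelated to the EAST networks of the later claims; the student's separate supervised training (the claimed second model) is omitted for brevity. The sketch only shows the teacher-then-distill flow and the teacher having more learnable parameters than the student.

```python
import random

random.seed(0)

# Toy "data set": inputs with a noisy linear target (a stand-in for
# labeled pictures; purely illustrative).
data = [(x, 2.0 * x + random.uniform(-0.1, 0.1)) for x in range(10)]

# "Teacher": two learnable parameters; "student": one learnable
# parameter, i.e. fewer parameters to learn, as the claim requires.
teacher = {"w": 0.0, "b": 0.0}
student = {"w": 0.0}

def run(step, epochs=200):
    # Plain SGD loop over the toy data.
    for _ in range(epochs):
        for x, y in data:
            step(x, y)

def teacher_step(x, y, lr=0.01):
    err = teacher["w"] * x + teacher["b"] - y
    teacher["w"] -= lr * err * x
    teacher["b"] -= lr * err

def distill_step(x, _y, lr=0.01):
    # Distillation: the student fits the trained teacher's prediction
    # ("knowledge of the first model") rather than the raw label.
    err = student["w"] * x - (teacher["w"] * x + teacher["b"])
    student["w"] -= lr * err * x

run(teacher_step)   # obtain the first model (trained teacher)
run(distill_step)   # distill its knowledge into the student
print(round(teacher["w"], 2), round(student["w"], 2))  # both close to 2.0
```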
2. The method of claim 1, further comprising performing a data enhancement process on the data set to obtain a data enhancement set, training the teacher model with the data enhancement set to obtain the first model, and training the student model with the data enhancement set to obtain the second model.
3. The method for training the target area detection model according to claim 2, wherein the data enhancement in the method comprises: randomly rotating the pictures in the data set, and cutting the rotated pictures into standard pictures.
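Claim 3's enhancement (random rotation followed by cropping to a standard size) might look like the following pure-Python sketch on a toy "image" stored as a nested list. Restricting rotation to multiples of 90 degrees and using a center crop are assumptions made to keep the example dependency-free; the claim does not fix the rotation angle or crop position.

```python
import random

def rotate90(img, times):
    """Rotate a 2-D pixel grid clockwise by `times` * 90 degrees."""
    for _ in range(times % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img

def center_crop(img, h, w):
    """Cut the rotated picture down to a standard h x w size."""
    top = (len(img) - h) // 2
    left = (len(img[0]) - w) // 2
    return [row[left:left + w] for row in img[top:top + h]]

def augment(img, std_h, std_w, rng):
    # Random rotation, then crop to the standard picture size.
    return center_crop(rotate90(img, rng.randrange(4)), std_h, std_w)

rng = random.Random(42)
img = [[r * 4 + c for c in range(4)] for r in range(4)]  # 4x4 toy image
out = augment(img, 2, 2, rng)
print(len(out), len(out[0]))  # 2 2
```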
4. The method for training the target area detection model according to claim 2, further comprising: randomly applying blur, contrast, and brightness enhancement to the pictures in the data enhancement set each time the teacher model and the student model are trained.
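Claim 4's per-training random blur, contrast, and brightness enhancement could be sketched as follows on one row of grayscale pixels. The jitter ranges, the 50% blur probability, and the 3-tap box blur are illustrative assumptions, not values taken from the patent.

```python
import random

def jitter_row(pixels, rng):
    """Randomly perturb contrast and brightness, then optionally blur,
    one row of grayscale pixels (values in 0..255)."""
    contrast = rng.uniform(0.8, 1.2)    # assumed jitter range
    brightness = rng.uniform(-20, 20)   # assumed jitter range
    out = [min(255, max(0, contrast * p + brightness)) for p in pixels]
    if rng.random() < 0.5:              # blur applied at random
        out = [(out[max(i - 1, 0)] + out[i] + out[min(i + 1, len(out) - 1)]) / 3
               for i in range(len(out))]
    return out

rng = random.Random(7)
row = [0, 64, 128, 192, 255]
print([round(p) for p in jitter_row(row, rng)])
```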
5. The method for training the target area detection model according to claim 1, further comprising: dividing the data set into a training set, a validation set, and a test set, training the teacher model and the student model using the training set, testing the first model and the second model using the test set, and validating the first model and the second model using the validation set.
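Claim 5's split of the data set into training, validation, and test sets might be implemented as below. The 8:1:1 ratio and the fixed seed are assumptions, since the claim does not fix the proportions.

```python
import random

def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle once, then cut into train / validation / test parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 80 10 10
```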
6. The method for training the target area detection model according to claim 5, wherein the area where the preset target is located is a rectangular area, and the method further comprises: acquiring the pictures in the test set, inputting the pictures in the test set into the target area detection model, and outputting the coordinates of the four vertexes of the rectangular area where the preset target is located in each picture of the test set.
7. The method for training the target area detection model according to claim 6, further comprising: performing non-maximum suppression processing on the output data of the target area detection model, and storing the vertex coordinates of the area where the target is located from the processed data.
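Claim 7's non-maximum suppression step could be sketched with standard IoU-based NMS over axis-aligned boxes. This is a simplification: EAST-style detectors typically apply locality-aware NMS over quadrilaterals, and the 0.5 threshold is an assumed value.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter) if inter else 0.0

def nms(boxes, scores, thr=0.5):
    """Keep the highest-scoring box of each overlapping group."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < thr for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with IoU 0.81 and is suppressed; the third box is disjoint and kept.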
8. The target area detection model training method of claim 1, wherein the teacher model and the student model are EAST neural network models, and the corresponding loss function is: L_E = L_s + λ_g1 · L_g, wherein L_s is the error of the score feature map, L_g is the error of the geometry feature map, and λ_g1 is the weight between the two.
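The combined loss L_E = L_s + λ_g1 · L_g of claim 8 can be illustrated with a toy computation. The dice-style score-map loss and the simple L1 geometry error below are stand-ins chosen for illustration (the patent does not specify the concrete terms here), and λ_g1 = 1 is an assumed weight.

```python
def dice_loss(pred, gt):
    """A common choice for the score-map error Ls (an assumption here):
    1 - 2 * overlap / (sum(pred) + sum(gt)) over flattened score maps."""
    inter = sum(p * g for p, g in zip(pred, gt))
    return 1.0 - 2.0 * inter / (sum(pred) + sum(gt))

def l1_geo_loss(pred, gt):
    """Mean absolute error as a stand-in geometry-map error Lg."""
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)

def east_loss(score_pred, score_gt, geo_pred, geo_gt, lambda_g1=1.0):
    """L_E = L_s + lambda_g1 * L_g, as in claim 8."""
    return (dice_loss(score_pred, score_gt)
            + lambda_g1 * l1_geo_loss(geo_pred, geo_gt))

le = east_loss([0.9, 0.1, 0.8], [1.0, 0.0, 1.0],
               [4.0, 6.0], [5.0, 5.0])
print(round(le, 3))  # 1.105
```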
9. The target area detection model training method of claim 8, wherein the method uses the KL divergence together with the loss function of the original EAST model as the loss function for knowledge distillation.
10. The target area detection model training method of claim 9, wherein the knowledge distillation loss function is: L_D = λ_g2 · L_KL + (1 − λ_g2) · L_E, wherein L_KL is the KL divergence, L_E is the loss function of the EAST neural network model, and λ_g2 is the weight between the two.
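The distillation loss L_D = λ_g2 · L_KL + (1 − λ_g2) · L_E of claim 10 combines a KL-divergence term between teacher and student score outputs with the ordinary EAST loss. The per-pixel Bernoulli KL below and λ_g2 = 0.5 are illustrative assumptions; the patent does not fix how the divergence is aggregated.

```python
import math

def bernoulli_kl(p_teacher, p_student, eps=1e-7):
    """Mean per-pixel KL(teacher || student) for score-map probabilities."""
    total = 0.0
    for t, s in zip(p_teacher, p_student):
        t = min(max(t, eps), 1 - eps)
        s = min(max(s, eps), 1 - eps)
        total += t * math.log(t / s) + (1 - t) * math.log((1 - t) / (1 - s))
    return total / len(p_teacher)

def distill_loss(l_kl, l_east, lambda_g2=0.5):
    """L_D = lambda_g2 * L_KL + (1 - lambda_g2) * L_E, as in claim 10."""
    return lambda_g2 * l_kl + (1 - lambda_g2) * l_east

l_kl = bernoulli_kl([0.9, 0.2], [0.8, 0.3])
print(round(distill_loss(l_kl, 1.0), 4))  # 0.5156
```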
11. A target area detection model training system, the system comprising:
the picture acquisition and labeling unit is used for acquiring a picture containing a preset target and labeling the area where the preset target is located in the picture to obtain a data set;
the model building and training unit is used for building a teacher model and a student model for identifying the area of the preset target in the picture, training the teacher model by using the data set to obtain a first model, and training the student model by using the data set to obtain a second model;
the knowledge distillation unit is used for distilling the second model by utilizing the knowledge of the first model to obtain a target area detection model;
the number of the parameters needing to be learned in the teacher model is larger than that of the parameters needing to be learned in the student model.
12. An object region detection model training apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the object region detection model training method according to any one of claims 1 to 10 when executing the computer program.
13. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method for training a target area detection model according to any one of claims 1 to 10.
CN202110370329.7A 2021-04-07 2021-04-07 Target area detection model training method, system, device and medium Pending CN112949766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110370329.7A CN112949766A (en) 2021-04-07 2021-04-07 Target area detection model training method, system, device and medium

Publications (1)

Publication Number Publication Date
CN112949766A true CN112949766A (en) 2021-06-11

Family

ID=76230800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110370329.7A Pending CN112949766A (en) 2021-04-07 2021-04-07 Target area detection model training method, system, device and medium

Country Status (1)

Country Link
CN (1) CN112949766A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444760A (en) * 2020-02-19 2020-07-24 天津大学 Traffic sign detection and identification method based on pruning and knowledge distillation
CN111813931A (en) * 2020-06-16 2020-10-23 清华大学 Method and device for constructing event detection model, electronic equipment and storage medium
CN112200062A (en) * 2020-09-30 2021-01-08 广州云从人工智能技术有限公司 Target detection method and device based on neural network, machine readable medium and equipment
CN112529178A (en) * 2020-12-09 2021-03-19 中国科学院国家空间科学中心 Knowledge distillation method and system suitable for detection model without preselection frame

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
P. YANG 等: "A Fast Scene Text Detector Using Knowledge Distillation", 《IEEE ACCESS》 *
X. ZHOU 等: "EAST: An efficient and accurate scene text detector", 《PROCEEDINGS OF THE IEEE CONFERENCE COMPUTER VISION PATTERN RECOGNITION》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537158A (en) * 2021-09-09 2021-10-22 科大讯飞(苏州)科技有限公司 Image target detection method, device, equipment and storage medium
CN114969332A (en) * 2022-05-18 2022-08-30 北京百度网讯科技有限公司 Method and device for training text audit model
CN114973285A (en) * 2022-05-26 2022-08-30 中国平安人寿保险股份有限公司 Image processing method and apparatus, device, and medium
CN115019060A (en) * 2022-07-12 2022-09-06 北京百度网讯科技有限公司 Target recognition method, and training method and device of target recognition model
CN115082690A (en) * 2022-07-12 2022-09-20 北京百度网讯科技有限公司 Target recognition method, target recognition model training method and device
CN116385844A (en) * 2022-12-12 2023-07-04 北京数美时代科技有限公司 Feature map distillation method, system and storage medium based on multi-teacher model
CN116385844B (en) * 2022-12-12 2023-11-10 北京数美时代科技有限公司 Feature map distillation method, system and storage medium based on multi-teacher model
CN116310648A (en) * 2023-03-23 2023-06-23 北京的卢铭视科技有限公司 Model training method, face recognition method, electronic device and storage medium
CN116310648B (en) * 2023-03-23 2023-12-12 北京的卢铭视科技有限公司 Model training method, face recognition method, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 610042 No. 270, floor 2, No. 8, Jinxiu street, Wuhou District, Chengdu, Sichuan
Applicant after: Chengdu shuzhilian Technology Co.,Ltd.
Address before: No.2, floor 4, building 1, Jule road crossing, Section 1, West 1st ring road, Wuhou District, Chengdu City, Sichuan Province 610041
Applicant before: CHENGDU SHUZHILIAN TECHNOLOGY Co.,Ltd.
RJ01 Rejection of invention patent application after publication
Application publication date: 20210611