CN113793292A - Data processing method and device, electronic equipment and storage medium
- Publication number: CN113793292A
- Application number: CN202010450436.6A
- Authority: CN (China)
- Prior art keywords: target object, dimension value, target, detection model, offset
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0006: Industrial image inspection using a design-rule based approach
- G06N3/045: Neural networks; combinations of networks
- G06N3/08: Neural networks; learning methods
- G06T5/50: Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06T2207/10004: Still image; photographic image
Abstract
The embodiment of the invention provides a target detection and model training method and apparatus, an electronic device, and a computer storage medium. The data processing method includes: acquiring an image to be detected, where the image to be detected contains a target object to be detected and the target object has a first dimension value; inputting the image to be detected into a target detection model, where the target detection model generates feature maps at a plurality of levels corresponding to the image to be detected; and obtaining a detection result output by performing target object detection on the feature map of the level corresponding to the first dimension value of the target object. The embodiment of the invention thereby obtains more accurate detection results.
Description
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a data processing method and device, electronic equipment and a computer storage medium.
Background
With the development of computer vision technology, target detection, that is, detecting a target object in an image using computer image processing techniques, is applied more and more widely. Through target detection, a target object in an image can be effectively identified for corresponding processing.
However, in many detection scenarios, the target object to be detected is relatively fine, often slender in shape, or not easily distinguished from background interference in the image. In many industrial manufacturing scenarios, for example, fine flaws can have a large impact on the quality rating of a product, so high-precision, high-resolution acquisition devices are often used to capture them; these devices also capture the product's patterns and textures. In tire quality inspection, for instance, the image is about 1600 pixels wide and 3000 to 10000 pixels long, and the captured tread pattern seriously interferes with the detection of fine tire flaws. In near-infrared cell inspection in the photovoltaic industry, the image is about 1200 pixels in both length and width, and background dark stripes likewise hinder the detection of fine defects. In chemical fiber quality inspection, images about 4000 pixels wide and 3000 pixels long contain large numbers of defects less than 10 pixels wide that sit within the background pattern and are not easily detected.
The above-mentioned problem of difficulty in detecting fine flaws is also prevalent in other similar detection scenarios, especially in industrial detection scenarios.
Disclosure of Invention
Embodiments of the present invention provide a data processing scheme to at least partially solve the above problems.
According to a first aspect of the embodiments of the present invention, there is provided a data processing method, including: acquiring an image to be detected, where the image to be detected contains a target object to be detected and the target object has a first dimension value; inputting the image to be detected into a target detection model, where the target detection model generates feature maps at a plurality of levels corresponding to the image to be detected; and obtaining a detection result output by performing target object detection on the feature map of the level corresponding to the first dimension value of the target object.
According to a second aspect of the embodiments of the present invention, there is provided a data processing method, including: acquiring a sample image for training a target detection model, where the sample image contains a labeling box and a target object and the labeling box has a first dimension value; inputting the sample image into the convolution layers of the target detection model to obtain feature maps at a plurality of levels; performing bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value; and training the target detection model according to the result of the bounding box prediction and a loss function.
According to a third aspect of the embodiments of the present invention, there is provided a data processing method, including: obtaining a model training request for a target detection model; acquiring, according to the model training request, a sample image for training the target detection model, where the sample image contains a labeling box and a target object and the labeling box has a first dimension value; inputting the sample image into the convolution layers of the target detection model to obtain feature maps at a plurality of levels; performing bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value; and training the target detection model according to the result of the bounding box prediction and a loss function.
According to a fourth aspect of the embodiments of the present invention, there is provided a data processing apparatus, including: a first acquisition module configured to acquire a sample image for training a target detection model, where the sample image contains a labeling box and a target object and the labeling box has a first dimension value; a second acquisition module configured to input the sample image into the convolution layers of the target detection model to obtain feature maps at a plurality of levels; a prediction module configured to perform bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value; and a training module configured to train the target detection model according to the result of the bounding box prediction and a loss function.
According to a fifth aspect of the embodiments of the present invention, there is provided a data processing apparatus, including: a third acquisition module configured to acquire an image to be detected, where the image to be detected contains a target object to be detected and the target object has a first dimension value; an input module configured to input the image to be detected into a target detection model, where the target detection model generates feature maps at a plurality of levels corresponding to the image to be detected; and a fourth acquisition module configured to obtain a detection result output by performing target object detection on the feature map of the level corresponding to the first dimension value of the target object.
According to a sixth aspect of the embodiments of the present invention, there is provided an electronic device, including: a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another via the communication bus; the memory stores at least one executable instruction that causes the processor to perform the operations corresponding to the data processing method according to the first aspect, or the operations corresponding to the data processing method according to the second or third aspect.
According to a seventh aspect of embodiments of the present invention, there is provided a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method according to the first aspect; alternatively, a data processing method as described in the second or third aspect is implemented.
According to the data processing scheme provided by the embodiment of the invention, the target detection model and its training can be deployed on the server side. The server side obtains sample images according to a model training request from the client side; based on the labeling box in the sample image, its first dimension value, and the target object in the sample image, bounding box prediction is performed on the feature map of the level corresponding to the first dimension value, and the target detection model is then trained from the prediction result and a loss function. Training of the target detection model is thus achieved without placing any demand on the resources or performance of the client side, while the training effect and training efficiency are guaranteed.
According to another data processing scheme provided by the embodiment of the invention, when target object detection is performed on an image to be detected, the target detection model performs the detection on the feature map, among the feature maps at a plurality of levels, of the level corresponding to the first dimension value of the target object, so that an accurate detection result is output. This is because, in a target detection model using a multi-scale strategy, different target objects are detected with different effectiveness at different detection levels: at high scales the semantic features are stronger, while at low scales there are more regression points and the context features are stronger. The first dimension value of the target object can, to a certain extent, guide which detection level suits the target object, so as to obtain a more accurate detection result.
According to another data processing scheme provided by the embodiment of the invention, when training a target detection model with a multi-scale strategy, the first dimension value of the labeling box labeling the target object is fully considered in order to determine a more suitable level whose feature map can predict the bounding box of the target object; the feature map of that level is then used for detection training, achieving a better training effect.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below depict only some embodiments of the present invention; a person skilled in the art can derive other drawings from them.
FIG. 1A is a flowchart illustrating steps of a data processing method according to an embodiment of the present invention;
FIG. 1B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 1A;
FIG. 2A is a flowchart illustrating steps of a data processing method according to a second embodiment of the present invention;
FIG. 2B is a schematic diagram of an exemplary object detection model in the embodiment shown in FIG. 2A;
FIG. 3A is a flowchart illustrating steps of a data processing method according to a third embodiment of the present invention;
FIG. 3B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 3A;
FIG. 4A is a flowchart illustrating steps of a data processing method according to a fourth embodiment of the present invention;
FIG. 4B is a diagram illustrating an example of a scenario in the embodiment shown in FIG. 4A;
fig. 5 is a block diagram of a data processing apparatus according to a fifth embodiment of the present invention;
fig. 6 is a block diagram of a data processing apparatus according to a sixth embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to a seventh embodiment of the present invention.
Detailed Description
To make the technical solutions in the embodiments of the present invention better understood by those skilled in the art, they are described below clearly and completely with reference to the drawings. The described embodiments are only a part, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the scope of protection of the embodiments of the present invention.
The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.
In order to facilitate understanding of the scheme provided by the embodiment of the present invention, a data processing procedure related to training of the target detection model used in the embodiment of the present invention is first described below, and then a data processing scheme related to target detection provided by the embodiment of the present invention is described based on the trained target detection model.
Example one
Referring to fig. 1A, a flowchart illustrating steps of a data processing method according to a first embodiment of the present invention is shown.
The data processing method of the embodiment can be used for training the target detection model, and comprises the following steps:
step S102: sample images for training a target detection model are acquired.
The sample image contains a labeling box and a target object, where the labeling box has a first dimension value.
The labeling box is used to label the target object; a sample image contains at least one target object and its corresponding labeling box. In the embodiment of the present invention, the labeling box further has a first dimension value, which may be any suitable data that can help determine the feature map of the level used for detecting the target object, including but not limited to: the aspect ratio or width-to-length ratio of the labeling box, the coordinate information of the labeling box, and the like.
Step S104: inputting the sample image into the convolution layers of the target detection model to obtain feature maps at a plurality of levels.
The target detection model is a model that can detect a target region in an input image and output the category of the target object in that region. In the embodiment of the present invention, the target detection model may be any suitable model using a multi-scale strategy, in particular a one-stage target detection model, including but not limited to: the FCOS (Fully Convolutional One-Stage Object Detection) model, the YOLO (You Only Look Once) model, the SSD (Single Shot MultiBox Detector) model, the RetinaNet model, and the like.
Target detection models such as these generally use a multi-scale strategy: feature extraction and detection are performed on the target object at different scales, from different feature extraction levels. Optionally, the multi-scale strategy may use an FPN (Feature Pyramid Network) structure to produce prediction outputs at multiple scales.
After an image, such as a sample image in a training process, is input into a target detection model, feature extraction processing is usually performed to obtain a corresponding feature map. In a feasible manner, a plurality of levels of feature extraction can be performed on the sample image through the target detection model, so as to obtain a plurality of corresponding feature maps.
Step S106: performing bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value of the labeling box.
In one feasible manner, before the feature map of the level corresponding to the first dimension value is determined, the feature map of the level corresponding to the labeling box in the sample image may be determined from the plurality of feature maps as a candidate feature map to be processed, which saves time when subsequently determining the feature map of the level corresponding to the first dimension value and improves model training efficiency.
As mentioned above, the sample image has at least one labeling box labeling at least one target object. A target detection model generally detects labeling boxes of different sizes at different levels, so different labeling boxes can correspond to different feature extraction and detection levels.
In one feasible manner, the offsets between the target object and each boundary of the labeling box may be calculated first; the feature map of the level corresponding to the first dimension value is then determined according to the first dimension value of the labeling box and the offsets; and bounding box prediction is performed on the target object in the feature map of that level. By combining the first dimension value with the offsets, the feature map used for bounding box prediction can be determined more accurately.
For example, the labeling boxes may be traversed, the level corresponding to each traversed labeling box determined, and target object detection processing then performed on the feature map of that level.
Taking one labeling box as an example: the level corresponding to the labeling box is determined, and the feature map of that level is taken as the candidate feature map to be processed. The target object labeled by the labeling box is then obtained from the candidate feature map. For the target object, its offset from each boundary of the labeling box is calculated, which can be characterized as a distance; for example, the offsets are obtained by calculating the distance from each pixel of the target object to each boundary of the labeling box.
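As an illustration only (not part of the patent text), a minimal sketch of this per-pixel offset computation might look as follows, assuming pixel coordinates and a labeling box given as (x1, y1, x2, y2):

```python
import numpy as np

def boundary_offsets(points, box):
    # points: (N, 2) array of (x, y) pixel coordinates of the target object
    # box:    labeling box (x1, y1, x2, y2), with x1 < x2 and y1 < y2
    x1, y1, x2, y2 = box
    xs, ys = points[:, 0], points[:, 1]
    l = xs - x1  # distance to the left boundary
    t = ys - y1  # distance to the top boundary
    r = x2 - xs  # distance to the right boundary
    b = y2 - ys  # distance to the bottom boundary
    return np.stack([l, t, r, b], axis=1)

# Example: two pixels of a target object inside a labeling box.
pts = np.array([[4.0, 6.0], [7.0, 9.0]])
offs = boundary_offsets(pts, (2.0, 5.0, 12.0, 25.0))
# The relative offset used later in this description is offs.max().
```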
If the feature map of a certain level contains multiple labeling boxes, the offsets between the target object labeled by each labeling box and each boundary of that labeling box can be determined for every labeling box in the manner above.
In the embodiments of the present invention, unless otherwise specified, "a plurality of" and similar terms mean two or more.
After the offsets are determined, corresponding processing can be performed according to the offsets and the related information of the labeling box. In one possible approach, this step can be implemented as: calculating the offsets between the target object in the candidate feature map to be processed and each boundary of the labeling box that labels it; and determining the feature map of the level corresponding to the first dimension value according to the first dimension value of the labeling box and the offsets. In this way, the candidate feature map can be corrected using the first dimension value to determine the final feature map of the level corresponding to the first dimension value. For some elongated target objects, the level of the candidate feature map may be too high for subsequent detection to obtain sufficient detection information, so the candidate feature map is corrected based on the first dimension value. It should be understood, however, that for target objects of some other shapes the candidate feature map itself may serve as the feature map of the level corresponding to the first dimension value, with no correction needed.
Optionally, when determining the feature map of the level corresponding to the first dimension value according to the first dimension value and the offsets, allocation information of the target object is first determined from the first dimension value of the labeling box and the offsets, and the feature map of the level corresponding to the first dimension value is then determined from the allocation information. The allocation information indicates the first dimension value and its corresponding level; this correspondence can be learned during training of the target detection model.
When determining the allocation information of the target object according to the first dimension value of the labeling box and the offsets, the allocation information may be determined (1) according to the aspect ratio or width-to-length ratio of the labeling box and the offsets; or (2) according to the length-to-height ratio or height-to-length ratio of the labeling box and the offsets; or (3) according to the horizontal and vertical coordinate information of the labeling box and the offsets.
In the case of (1) above, the aspect ratio or width-to-length ratio of the labeling box characterizes the shape of the target object in both length and width dimensions, so as to facilitate subsequent guided assignment.
In the case of (2) above, the length-to-height ratio or height-to-length ratio of the labeling box characterizes the shape of the target object in the length and height dimensions, so as to facilitate the subsequent guided assignment.
In the case of (3) above, on the one hand, the shape of the target object can be evaluated from the distance between the minimum and maximum horizontal coordinates of the labeling box and the distance between its minimum and maximum vertical coordinates. On the other hand, the shape can be determined from the coordinates directly: for example, if the maximum horizontal coordinate lies within a first predetermined distance of the minimum horizontal coordinate, while the maximum vertical coordinate lies beyond a second predetermined distance of the minimum vertical coordinate, the shapes of the labeling box and the target object can be determined and the subsequent assignment guided accordingly; and vice versa. The first and second predetermined distances can be chosen by a person skilled in the art according to the actual situation; for example, a small first predetermined distance and a large second predetermined distance characterize an elongated shape, and so on.
For simplicity, the various embodiments of the present invention are explained using manner (1), that is, using the aspect ratio and width-to-length ratio. Based on the description below, however, a person skilled in the art should be able to adapt it to the other two manners, all of which fall within the scope of the embodiments of the present invention.
As mentioned above, the aspect ratio or width-to-length ratio of the labeling box characterizes the shape of the target object in the length and width dimensions: an aspect ratio of 5:1, for example, indicates a horizontally elongated object, while an aspect ratio of 1:5 indicates a vertically elongated object. In the embodiment of the present invention, the aspect ratio or width-to-length ratio of the labeling box can further guide the target object to the feature map of the corresponding level, so that detection is performed at that level.
In a multi-scale strategy, semantic features are stronger at high scales (high levels), while low scales (low levels) offer more regression points and stronger context features, so an elongated target object obtains a better detection effect at a low scale.
Therefore, by combining the aspect ratio or width-to-length ratio of the labeling box with the offsets, target objects of certain shapes, such as elongated ones, can be assigned to a lower-scale feature map for detection, yielding a more accurate detection result.
Once the allocation information of the target object is determined, the target object may be allocated to the feature map of the level matching the allocation information, and the bounding box prediction of the target object performed on that feature map.
The correspondence between the allocation information and the allocation levels can be learned through continued training of the target detection model.
Furthermore, optionally, the correspondence between the first dimension value and its level may be output; or the first dimension value, its corresponding allocation information, and the correspondence with the levels corresponding to that allocation information may be output. Those skilled in the art can evaluate and analyze the training effect from this output to further adjust or optimize the target detection model.
In one possible approach, a bounding box regression range may be set for each level, and based on the range and the offset, a bounding box prediction may be performed on the target object.
Step S108: training the target detection model according to the result of the bounding box prediction and the loss function.
For example, a loss value is calculated from the result of the bounding box prediction and a preset loss function, and the relevant parameters of the target detection model, including the allocation information and allocation levels, are trained according to the loss value.
The target detection model is trained iteratively: in each iteration, the parameters of the target detection model are updated once, until a training termination condition is met, for example, a preset number of iterations is reached or the loss value satisfies a preset threshold.
In some cases, when the sample image contains a plurality of target objects and a plurality of corresponding labeling boxes, whether to perform target detection using the feature map of the level corresponding to the first dimension value, or to use the feature map of the level corresponding to the labeling box in the conventional manner, can be chosen according to the proportion of certain dimension values among the first dimension values of the labeling boxes. In this case, the scheme of the embodiment of the present invention can be implemented as follows: acquire a sample image for training the target detection model, the sample image containing labeling boxes and target objects, each labeling box having a first dimension value; group the first dimension values corresponding to the labeling boxes, and determine the ratio of the number of first dimension values in each group to the total number of first dimension values; input the sample image into the convolution layers of the target detection model to obtain feature maps at a plurality of levels; judge whether the ratio is greater than a preset ratio; if so, perform bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value, and otherwise in the feature map of the level corresponding to the labeling box; and train the target detection model according to the result of the bounding box prediction and a loss function. This improves the flexibility of target object detection; a sketch of the gating logic is given below.
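As an illustration only (not from the patent text), the gating described above might be sketched as follows; the grouping rule for "elongated" boxes and the preset ratio are assumed values:

```python
# Gate the level-selection strategy by how common elongated labeling boxes
# (here grouped by aspect ratio) are among all labeling boxes in the sample.
def use_dimension_guided_levels(aspect_ratios, preset_ratio=0.3):
    # aspect_ratios: one max(h/w, w/h) value per labeling box
    elongated = [r for r in aspect_ratios if r >= 3.0]  # assumed grouping rule
    return len(elongated) / len(aspect_ratios) > preset_ratio

boxes_hw = [(20, 100), (15, 90), (50, 55)]  # (h, w) of the labeling boxes
ratios = [max(h / w, w / h) for h, w in boxes_hw]
if use_dimension_guided_levels(ratios):
    print("use the level corresponding to the first dimension value")
else:
    print("use the level corresponding to the labeling box (conventional)")
```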
Hereinafter, the above-described process is exemplified by a simple example, as shown in fig. 1B.
In fig. 1B, the sample image A contains two target objects, 1 and 2, which are labeled with labeling boxes 1 and 2 respectively; labeling box 2, corresponding to target object 2, is an elongated labeling box. Multi-scale feature extraction and detection are performed through five feature levels P3-P7 in the target detection model, where P3 is the lowest level and P7 the highest.
Sample image A is input into the target detection model; convolution processing is performed by the backbone network of the target detection model and feature extraction by the FPN structure, yielding the feature maps F3-F7 corresponding to the levels P3-P7. Suppose labeling box 1 corresponds to the P5 level: the offsets between target object 1, labeled by labeling box 1, and each boundary of labeling box 1 are calculated in F5. If the aspect ratio of labeling box 1 is 2:3, the allocation information of target object 1 can be determined from the offsets and the aspect ratio; for example, target object 1 is still allocated to the P5-level feature map for its bounding box prediction. A loss value is then obtained from the bounding box prediction result and a preset loss function, and the target detection model is trained according to the loss value.
If, on the other hand, labeling box 2 corresponds to the P4 level, the offsets between target object 2, labeled by labeling box 2, and each boundary of labeling box 2 are calculated in F4. If the aspect ratio of labeling box 2 is 1:5, the allocation information of target object 2 can be determined from the offsets and the aspect ratio; for example, target object 2 is allocated to the P3-level feature map, that is, a lower-scale feature map, for its bounding box prediction. A loss value is then obtained from the bounding box prediction result and the preset loss function, and the target detection model is trained according to the loss value.
Therefore, according to this embodiment, when training a target detection model with a multi-scale strategy, the first dimension value of the labeling box labeling the target object, such as its aspect ratio, is fully considered in order to determine a more suitable level whose feature map can predict the bounding box of the target object; detection training is then performed with the feature map of that level, achieving a better training effect.
The data processing method of the present embodiment may be performed by any suitable electronic device having data processing capability, including but not limited to servers, PCs, and the like.
Example two
Referring to fig. 2A, a flowchart illustrating steps of a data processing method according to a second embodiment of the present invention is shown.
In this embodiment, a specific target detection model, the FCOS model, is taken as an example to illustrate the training process of the target detection model according to the embodiment of the present invention. It should be clear to a person skilled in the art that other similar target detection models, or target detection models with multi-scale strategies, can also be applied to the solution of the embodiments of the present invention.
The FCOS model performs target detection in a per-pixel prediction manner. As shown in fig. 2B, its structure includes a Backbone network part, a feature pyramid (FPN) part, and a detection part. The Backbone part can adopt a multi-layer convolutional structure to extract image features. The FPN part implements the multi-scale strategy; as shown in the figure, it uses feature mappings at five scales, i.e., the five levels P3-P7. P3, P4, and P5 are obtained by applying 1x1 convolutions to the feature layers C3, C4, and C5 of the Backbone part; P6 is obtained from P5 by a convolution with stride 2, and P7 from P6 by a convolution with stride 2. Each level performs pixel-by-pixel regression, and different levels regress different size ranges.
A conventional FCOS defines a bounding box regression range at each level. For each level: (1) the regression targets at the current level are calculated: l, t, r, and b (the distances from the pixel to the four sides of the labeling box); (2) it is judged whether max(l, t, r, b) > mi or max(l, t, r, b) < mi-1 holds, where mi and mi-1 denote the maximum regression distances of the current and previous levels; if so, bounding box regression prediction is not performed at the current level, and otherwise it is. However, this conventional approach cannot detect target objects of certain shapes, so the embodiment of the present invention provides a training scheme that takes the first dimension value of the target object's labeling box into account.
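For orientation, a minimal sketch of this conventional per-level assignment follows; the range values are the public FCOS defaults and are assumptions here, not values stated in this patent:

```python
# Assign a location to a pyramid level by the maximum regression target
# max(l, t, r, b); outside the level's range it is treated as a negative
# sample at that level.
FCOS_RANGES = {  # level: (m_{i-1}, m_i), maximum regression distances
    "P3": (0.0, 64.0),
    "P4": (64.0, 128.0),
    "P5": (128.0, 256.0),
    "P6": (256.0, 512.0),
    "P7": (512.0, float("inf")),
}

def conventional_level(l, t, r, b):
    m = max(l, t, r, b)
    for level, (lo, hi) in FCOS_RANGES.items():
        if lo < m <= hi:
            return level
    return None  # negative sample at every level
```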
In the detection part, the FCOS model uses a center-ness strategy to suppress detected low-quality bounding boxes, as shown in fig. 2B. The center-ness strategy adds a branch, parallel to classification, at each prediction level to keep the predicted bounding box as close to the object center as possible.
In addition, in this embodiment, the preprocessing of the sample image and the initialization of the target detection model are also improved, as described in detail below.
Based on this, the data processing method of the present embodiment can be used for training the target detection model, and the data processing method includes the following steps:
step S202: and acquiring an original sample image, and preprocessing the original sample image to acquire a sample image for training a target detection model.
In the present embodiment, the preprocessing operation on one sample image is taken as an example, but it should be understood by those skilled in the art that in practical application, the preprocessing operation can be performed on all sample images.
This includes: performing offset enhancement on the original annotation data of the original sample image to obtain offset-enhanced labeling boxes; and obtaining a sample image for training the target detection model from the original sample image and the labeling boxes. The original sample image carries original labeling boxes (i.e., the original annotation data); offset enhancement produces more labeling boxes, enriching the frequency and shapes of the labeling boxes in the sample image, which can improve the hit rate of the subsequent bounding box regression prediction.
In one feasible manner, performing offset enhancement on the original annotation data of the original sample image to obtain the offset-enhanced labeling boxes can be implemented as: determining the ratio of the area formed by the original annotation data in the original sample image to the area of the training sub-region of the original sample image; determining an offset enhancement frequency from the ratio; performing perturbation enhancement on the original annotation data at that frequency; and obtaining the offset-enhanced labeling boxes from the result of the perturbation enhancement. The area of the training sub-region can be determined by a person skilled in the art in any suitable manner, including but not limited to empirical values or multiple rounds of pre-training. Training the target detection model with the perturbation-enhanced labeling boxes can improve its generalization ability.
In addition, to adapt to target object detection in industrial detection scenarios, in one feasible manner, before the offset enhancement is performed on the original annotation data of the sample image to obtain the labeling boxes, the method further includes: scaling the original sample image by resampling based on the area pixel relationship. Training the target detection model with the sample image processed in this way gives the model stronger resistance to moire interference.
An example of the above process is as follows, including:
substep 1: the original sample image is subjected to a scaling resize process.
For example, resize processing is performed on the sample image using resampling based on the area pixel relationship, e.g., using cv2.resize with the INTER_AREA interpolation flag in OpenCV.
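A minimal sketch of this resize step, assuming OpenCV is used and with an illustrative scale factor:

```python
import cv2

img = cv2.imread("sample.png")  # hypothetical input path
scale = 0.5                     # assumed downsampling factor
# INTER_AREA resamples using the pixel area relation, which reduces
# moire artifacts when downsampling.
resized = cv2.resize(img, None, fx=scale, fy=scale,
                     interpolation=cv2.INTER_AREA)
```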
Substep 2: determining a sub-region of preset size as the training region, based on the resized sample image and in combination with the original labeling boxes.
As mentioned above, the preset size may be determined according to an empirical value or a pre-training result.
Substep 3: performing offset enhancement on the original labeling boxes in the training region according to their size and position.
In this embodiment, offset perturbation enhancement is performed. For example, for each original labeling box: the ratio of the area of the original labeling box to the area of the sub-region is calculated; the reciprocal of the ratio is multiplied by a set coefficient to obtain the offset enhancement frequency; and perturbation enhancement is performed on the original labeling box at that frequency.
The set coefficient can be chosen by a person skilled in the art according to the actual situation, which the embodiment of the present invention does not limit. When performing the perturbation enhancement, two groups of random data can be computed: the offset of the original labeling box is calculated from a random number and a perturbation range, and each offset enhancement adds a new group of labeling boxes.
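As an illustration only, the perturbation enhancement described above might be sketched as follows; the coefficient and perturbation range are assumed values:

```python
import random

def offset_enhance(box, subregion_area, coeff=0.05, max_shift=8):
    # box: original labeling box (x1, y1, x2, y2)
    x1, y1, x2, y2 = box
    area_ratio = ((x2 - x1) * (y2 - y1)) / subregion_area
    # offset enhancement frequency: reciprocal of the area ratio times
    # the set coefficient
    freq = max(1, int(coeff / area_ratio))
    enhanced = []
    for _ in range(freq):
        dx = random.randint(-max_shift, max_shift)  # random number * range
        dy = random.randint(-max_shift, max_shift)
        enhanced.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return enhanced  # each enhancement adds a new group of labeling boxes
```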
It should be noted that, in practical applications, any of the above three sub-steps may be performed as described while the others are performed in other manners. For example, the offset enhancement of the labeling boxes may follow substep 3, while the resize of the original sample image in substep 1 is performed in another suitable manner, such as a different interpolation method.
Resampling based on the area pixel relationship reduces the interference introduced by downsampling during image resize, such as moire, while also reducing information loss. Downsampling the image means less input data, lower resource usage in both the training and application stages, and higher speed. To compensate, at another level, for the information loss caused by downsampling, substep 3 uses random offset enhancement to improve the robustness and recall of the algorithm.
Step S204: initializing the target detection model.
This includes: loading model parameters into the backbone network of the target detection model; initializing the network weights in the detection head structure (e.g., the "Head" of the detection part shown in fig. 2B); and so on.
In this embodiment, optionally, the target detection model is initialized, where the initialization includes at least one of: (1) initializing the backbone network of the target detection model by loading pre-trained model parameters, where the pre-trained model parameters were jointly trained on a plurality of preset subtasks; (2) unlocking parameter-locked layers in the target detection model for training, where the parameter-locked layers include at least a batch normalization layer.
To meet the requirements of industrial detection scenarios, in one feasible manner, as described in (1), parameters jointly trained on a plurality of preset subtasks are loaded into the backbone network of the target detection model (e.g., the Backbone in fig. 2B), for example subtasks for target object detection in photovoltaic, steel, and cloth scenarios, making the target detection model better suited to industrial detection.
When initializing the network weights in the detection head structure, Kaiming initialization can be used for random initialization.
In addition, for non-industrial detection scenarios, the target detection model may, like a conventional model, lock the BN (Batch Normalization) layers or the first layers of the backbone network, such as the first two layers, without training their parameters. In this embodiment, however, the BN layers and/or the locked layers in the backbone network are unlocked, so that their parameters are adjusted in every training iteration of the target detection model, better adapting it to industrial detection scenarios.
Step S206: inputting the preprocessed sample image into the initialized target detection model.
Step S208: obtaining feature maps at a plurality of levels through the convolution layers of the target detection model.
This includes: performing feature extraction at a plurality of levels on the sample image through the convolution layers of the target detection model, to obtain feature maps at the corresponding levels.
Taking the FCOS in fig. 2B as an example, after the sample image is input into the target detection model, feature extraction is performed by the Backbone part, and feature maps at multiple levels are then obtained through the FPN part. In the structure shown in fig. 2B, five feature maps corresponding to the levels P3-P7 are obtained.
Step S210: performing bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value of the labeling box.
For example, the feature map of the level corresponding to a labeling box in the sample image may first be determined from the plurality of feature maps as the candidate feature map to be processed. Then, based on the candidate feature map, the feature map of the level corresponding to the first dimension value of the labeling box is determined.
For example, in the FCOS shown in fig. 2B, if the level corresponding to a certain labeling box X in the sample image is P5, the P5-level feature map is taken as the candidate feature map. The offsets between the target object and each boundary of the labeling box that labels it are then calculated in the candidate feature map.
In one feasible manner, the offsets between every pixel of the target object and each boundary of the labeling box can be calculated in the candidate feature map, and the maximum among these taken as the offset between the target object and the boundaries of the labeling box. In this way, the degree of offset of the target object relative to the labeling box is determined more accurately.
For example, in the P5-level feature map, for the target object Y the offsets from each pixel to the four edges of labeling box X are computed pixel by pixel, denoted l, r, t, and d respectively. The relative offset of target object Y with respect to labeling box X is then O = max(l, r, t, d).
Further optionally, if the first dimension value of the labeling box is given as an aspect ratio or width-to-length ratio, the allocation information of the target object may be determined according to the maximum of the aspect ratio and width-to-length ratio of the labeling box, together with the offset.
In one feasible manner, the maximum ratio can be determined from the aspect ratio (long side over short side) and the width-to-length ratio (short side over long side) of the labeling box, and the allocation information of the target object determined from this maximum ratio of the labeling box and the offset. Since the maximum ratio is always greater than or equal to 1, it characterizes the shape of the labeling box and facilitates the subsequent calculation.
Optionally, determining the allocation information of the target object according to the maximum ratio of the labeling box and the offset may include: processing the maximum ratio of the labeling box with a preset activation function to obtain a processing result; and determining the allocation information of the target object from the ratio of the offset to the processing result. In this way, the obtained allocation information falls into a reasonable, easily allocated interval.
For example, in the P5-level candidate feature map, for target object Y the maximum ratio of the aspect ratio and width-to-length ratio of its labeling box X is computed as R = max(h/w, w/h), where h denotes the long side and w the short side.
After determining the relative offset of target object Y with respect to labeling box X as O = max(l, r, t, d) and the maximum ratio of labeling box X as R = max(h/w, w/h), the allocation information (in this example, an allocation value) can be calculated with the following formula:
S = 0.5 * O / Sigmoid(R - 1.0)
wherein Sigmoid () represents an activation function.
This is not limiting; a person skilled in the art may also use other activation functions and adjust their parameters so that the S value lies in a reasonable interval. Optionally, the S value may take any value in [0.5, 1].
Given that the maximum of the aspect ratio and width-to-length ratio of an elongated object to be detected spans a large range, combining the relative offset and the maximum ratio through the Sigmoid function allocates the elongated object to a feature map of higher resolution, improving its recall rate and its distinguishability from background interference. In addition, this multi-scale allocation training lets each scale better learn the elongated objects corresponding to it, improving the accuracy and robustness of the algorithm.
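As an illustration only, the allocation value defined by the formula above might be computed as follows, assuming the per-pixel boundary offsets and the side lengths of the labeling box are available:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def allocation_value(offsets, box_h, box_w):
    # offsets: iterable of (l, r, t, d) boundary distances, one per pixel
    O = max(max(o) for o in offsets)       # relative offset O = max(l, r, t, d)
    R = max(box_h / box_w, box_w / box_h)  # maximum ratio, always >= 1
    return 0.5 * O / sigmoid(R - 1.0)      # S = 0.5 * O / Sigmoid(R - 1.0)
```

For an elongated box, R is large, Sigmoid(R - 1.0) approaches 1, and S shrinks toward 0.5 * O, steering the object toward a lower, higher-resolution level under the allocation described below.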
After the allocation information is determined, the target object can be allocated to the feature map of the level matching the allocation information, and bounding box prediction of the target object performed on that feature map. Since the allocation information is determined from the first dimension value and the offset, this can be regarded as determining the feature map of the level corresponding to the first dimension value from the allocation information and performing the bounding box prediction of the target object on it.
Taking the FCOS shown in fig. 2B as an example: conventionally, the target object is assigned to the feature map of the corresponding level based on the aforementioned O = max(l, r, t, d), and bounding box regression prediction is performed there; as in the previous example, target object Y would still be assigned to the P5 level for bounding box prediction.
In this embodiment, the level is instead determined from the S value. If the S value corresponds to a lower-scale level, target object Y is allocated to that lower-scale level, such as P4 or P3, for bounding box regression prediction; if the S value corresponds to the current level, target object Y remains allocated to level P5. The target detection model of the embodiment of the present invention can therefore effectively detect both conventional target objects and irregular ones (such as target objects of elongated shape).
The target detection model maintains a multi-scale allocation list whose length matches the number of detection scales (levels) in the target detection model. The parameters in the list represent the range of allocation information, such as allocation values, assigned to each scale, and are learned through the training of the target detection model. According to the multi-scale allocation list and the allocation value of a labeling box, the target object labeled by that box can be allocated to the feature map of the corresponding scale (level) for bounding box prediction; a sketch is given below.
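As an illustration only, such a multi-scale allocation list might be used as follows; the range parameters are placeholder values here, whereas in the patent they are learned during training:

```python
# (level, upper bound on the allocation value S); the list length matches
# the number of detection scales in the model.
ALLOC_LIST = [("P3", 64.0), ("P4", 128.0), ("P5", 256.0),
              ("P6", 512.0), ("P7", float("inf"))]

def allocate_level(s_value):
    for level, upper in ALLOC_LIST:
        if s_value <= upper:
            return level
```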
Furthermore, detection training of the target detection model can be performed according to the predicted bounding boxes and labeling boxes of the bounding box prediction and the categories corresponding to the bounding boxes, where the multi-scale loss functions can be balanced using a linear combination.
Step S212: training the target detection model according to the result of the bounding box prediction and the loss function.
For example, a loss value may be determined from the difference between the predicted bounding box and the labeling box, their predicted categories, and a preset loss function, and the target detection model trained according to the loss value, including the parameters in the multi-scale allocation list, until a training termination condition is reached.
The scheme of this embodiment was applied to fine-defect detection of photovoltaic cells. On photovoltaic cell data, the original FCOS model achieved an mAP of 54.19% in testing after training; with the optimization of the embodiment of the present invention, the result was an mAP of 70.10%, about 16 points higher than before optimization. For the fine defect category "single crack", the original FCOS model obtained an AP of 47.55%, while the embodiment of the present invention obtained 74.06%. For the elongated defect category "black line", the original FCOS model obtained an AP of 31.31%, while the model of the present invention obtained 46.92%. The scheme of the embodiment of the present invention can therefore better detect fine flaws under background interference in high-resolution industrial images.
It can be seen that, in this embodiment, when training a target detection model with a multi-scale strategy, the length and width features of the labeling frame are fully considered. A more appropriate detection level for predicting the bounding box of the target object is determined from the maximum of the aspect ratio and the width-to-length ratio together with the offsets of the target object from each boundary of the labeling frame; the target object is then allocated to the feature map of that level, and the feature map of that level is used for detection training, achieving a better training effect.
The data processing method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, and PCs, etc.
Example Three
Referring to fig. 3A, a flowchart of steps of a data processing method according to a third embodiment of the present invention is shown.
In this embodiment, based on the solutions in the first and second embodiments, the data processing method provided in the embodiment of the present invention is described from the perspective of applying a trained target detection model.
The data processing method of the embodiment can be used for target detection, and comprises the following steps:
step S302: and acquiring an image to be detected.
The image to be detected comprises a target object to be detected, and the target detection model performs target object detection on this image; the target detection model can be a model trained by the data processing method in the first or second embodiment. The target object includes a first dimension value which, as described above, may be an aspect ratio or width-to-length ratio of the target object, a length-to-height ratio or height-to-length ratio of the target object, or horizontal coordinate information and vertical coordinate information of the target object. It should be noted that, in another possible manner, the target object in the image to be detected may not carry the first dimension value, in which case the first dimension value can be calculated by the target detection model itself.
Step S304: and inputting the image to be detected into a target detection model.
In a feasible manner, if the target detection model preprocessed the sample images during training, then in order to stay consistent with that preprocessing and to improve detection speed and efficiency, inputting the image to be detected into the target detection model may be implemented as: scaling the image to be detected through resampling based on the regional pixel relation, and inputting the scaled image into the target detection model.
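In OpenCV terms, resampling based on the regional pixel relation corresponds to area interpolation; a minimal sketch follows, with the target size assumed for illustration.

import cv2

def scale_for_detection(image, size=(256, 256)):
    # cv2.INTER_AREA resamples using the pixel-area relation, which
    # preserves fine structures well when shrinking an image.
    return cv2.resize(image, size, interpolation=cv2.INTER_AREA)

# e.g. scaled = scale_for_detection(cv2.imread("sample.jpg"))  # 512x512 -> 256x256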
In another possible way, when the image to be detected includes a plurality of target objects to be detected, this step may be implemented as: carrying out image segmentation on an image to be detected to obtain a plurality of image areas where a plurality of target objects to be detected are located; generating a plurality of corresponding sub-images according to the plurality of image areas; and respectively inputting the plurality of sub-images into the target detection model. By image segmentation, the image can be divided into several regions and the target object of interest extracted. This can reduce the load of target object detection. Optionally, after the detection of each target object is completed, the detection result may be fused into the image to be detected.
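A minimal sketch of this segment-then-detect flow is given below; the region format, the model's per-crop output format, and the fusion step are assumptions.

def detect_regions(image, regions, model):
    """Crop each region containing a candidate target object, run the model
    on the sub-image, and fuse the detections back into the coordinates of
    the image to be detected.

    regions: list of (x, y, w, h) tuples from any segmentation step.
    model:   callable returning (bx, by, bw, bh, score, label) tuples
             in sub-image coordinates (assumed output format).
    """
    fused = []
    for (x, y, w, h) in regions:
        sub = image[y:y + h, x:x + w]          # sub-image for one region
        for (bx, by, bw, bh, score, label) in model(sub):
            # Shift each detection back into full-image coordinates.
            fused.append((bx + x, by + y, bw, bh, score, label))
    return fused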
Step S306: acquiring the detection result output by performing target object detection on the feature map of the hierarchy corresponding to the first dimension value of the target object.
In one possible approach, the feature map of the hierarchy corresponding to the first dimension value of the target object is: a feature map of a hierarchy corresponding to an aspect ratio or a width-to-length ratio of the target object.
In another possible way, the feature map of the hierarchy corresponding to the first dimension value of the target object is: a feature map of a hierarchy corresponding to the length-to-height ratio or height-to-length ratio of the target object.
In yet another possible approach, the feature map of the hierarchy corresponding to the first dimension value of the target object is: a feature map of a hierarchy corresponding to a horizontal distance determined from the horizontal coordinate information of the target object and a vertical distance determined from its vertical coordinate information. For example, the detection result of the target object may be output according to the horizontal distance between the minimum and maximum horizontal coordinates of the labeling frame and the vertical distance between its minimum and maximum vertical coordinates. For another example, the shape of the target object may be determined from the coordinates: if, taking the minimum horizontal coordinate as reference, the maximum horizontal coordinate lies within a first predetermined distance of it, while, taking the minimum vertical coordinate as reference, the maximum vertical coordinate lies beyond a second predetermined distance of it, the shapes of the labeling frame and the target object can be determined; the subsequent detection-level allocation is then guided by the determined shape, and the detection result of the target object is output. The converse case is handled symmetrically. The first and second predetermined distances can be set by those skilled in the art according to the actual situation; for example, a small first predetermined distance together with a large second predetermined distance characterizes an elongated shape, and so on. A sketch of this shape test follows.
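A minimal sketch of this coordinate-based shape test, with the two predetermined distances d1 and d2 as hypothetical parameters, is:

def is_elongated(x_min, x_max, y_min, y_max, d1, d2):
    """Characterize a tall, narrow (elongated) box from its coordinates.

    d1: first predetermined distance (small), bounding the horizontal extent.
    d2: second predetermined distance (large), exceeded by the vertical extent.
    The mirrored test (wide, flat) simply swaps the two comparisons.
    """
    horizontal = x_max - x_min   # horizontal distance
    vertical = y_max - y_min     # vertical distance
    return horizontal <= d1 and vertical > d2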
As described above, the first dimension value of the target object to be detected is fully considered in the target detection model of the present embodiment, so that the target detection model with the multi-scale strategy detects the target object by using a more appropriate feature map and feature hierarchy.
In an alternative, this step may be implemented as: according to the length-width ratio or the width-length ratio of the target object, the target object is distributed to the feature map of the corresponding level to carry out the detection of the bounding box; and obtaining a detection result of the boundary frame detection output by the target detection model.
The step of assigning the target object to the feature map of the corresponding hierarchy for bounding box detection according to the aspect ratio or the width-to-length ratio of the target object may be: and according to the maximum ratio of the aspect ratio or the width-length ratio of the target object, distributing the target object to the feature map of the corresponding level for detecting the bounding box. Thus, the calculation cost of the target detection model is reduced.
In addition, optionally, the target object may be detected according to a feature map of a hierarchy corresponding to the first dimension value and a color of the target object, and a detection result may be output. That is, not only the first dimension value of the target object but also the color of the target object are considered at the same time, so as to further improve the detection accuracy of the target object.
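A minimal sketch of combining the dimension-based detection with such a color check is given below; the mean-color comparison and the tolerance are illustrative assumptions, since this embodiment only states that the color is considered.

import numpy as np

def filter_by_color(image, detections, expected_bgr, tol=40):
    """Keep detections whose mean color is close to an expected color.

    detections: (x, y, w, h, score, label) tuples from the detection step.
    expected_bgr, tol: assumed parameterization of the color criterion.
    Assumes a 3-channel image.
    """
    kept = []
    for (x, y, w, h, score, label) in detections:
        patch = image[y:y + h, x:x + w].reshape(-1, 3).astype(np.float32)
        mean = patch.mean(axis=0)              # mean color of the detected box
        if np.abs(mean - np.asarray(expected_bgr, np.float32)).max() <= tol:
            kept.append((x, y, w, h, score, label))
    return kept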
Hereinafter, the above-described process is illustrated with an example, as shown in fig. 3B.
In fig. 3B, the image A to be detected is preprocessed and scaled to an image A' of a certain size, for example from 512 × 512 to 256 × 256. The image A' is then input into the trained target detection model, which detects the target object Y. If the target object Y has an elongated shape, its detection is performed on a lower-scale feature map than in a conventional target detection model. For example, the target object Y would be detected at the P5 level in the conventional manner, but is assigned to the P3 level for detection in this embodiment. The P3 level focuses more on the context information of the target object Y and has more regression points, so a more accurate detection result can be obtained, as shown in fig. 3B.
In another application scenario, the data processing scheme provided by the embodiment of the invention may also be applied to an image generation or poster generation scenario, where the detected target object is used as one part of the image to be generated and combined with other parts. The data processing scheme in such a scenario can be implemented as follows: acquiring a first image, wherein the first image comprises a preset target object and the target object comprises a first dimension value; inputting the first image into a target detection model, wherein the target detection model generates feature maps of a plurality of levels corresponding to the first image; acquiring the detection result output by performing target object detection on the feature map of the hierarchy corresponding to the first dimension value; and synthesizing a second image according to the detection result of the target object and preset image information. The preset image information may be set by a person skilled in the art according to actual requirements and may, for example, include at least one of text information and image information. In this way, the detection result of the target object is effectively utilized, and the generation efficiency of the composite image is improved.
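A minimal sketch of such a synthesis step using PIL is given below; the canvas size, layout offsets, and caption handling are illustrative assumptions.

from PIL import Image, ImageDraw

def synthesize_poster(first_image_path, det_box, caption, canvas_size=(800, 600)):
    """Compose a second image from a detected target object and preset text.

    det_box: (left, top, right, bottom) detection result of the target object.
    caption: preset text information to combine with the object.
    """
    src = Image.open(first_image_path)
    obj = src.crop(det_box)                    # cut out the detected object
    canvas = Image.new("RGB", canvas_size, "white")
    canvas.paste(obj, (50, 50))                # place the object (assumed layout)
    draw = ImageDraw.Draw(canvas)
    draw.text((50, 50 + obj.height + 20), caption, fill="black")
    return canvas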
According to this embodiment, when target object detection is performed on an image to be detected, the target detection model performs the detection on the feature map, among the feature maps of the multiple levels, of the level corresponding to the first dimension value of the target object, so an accurate detection result can be obtained. This is because, for a target detection model using a multi-scale strategy, different target objects achieve different detection effects at different detection levels: semantic features are stronger at a high scale, while at a low scale there are more regression points and stronger context features. The first dimension value of the target object, such as the aspect ratio or width-to-length ratio, especially that of an elongated target object, can guide the choice of detection level to a certain extent and thereby yield a more accurate detection result.
The data processing method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, and PCs, etc.
Example Four
Referring to fig. 4A, a flowchart of steps of a data processing method according to a fourth embodiment of the present invention is shown.
In this embodiment, the data processing method provided by the embodiment of the invention is described by taking as an example a target detection model deployed on a server (such as a cloud server or a SaaS platform) and trained according to client requests.
The data processing method of the embodiment can train a target detection model of a server based on a request of a client, and comprises the following steps:
step S402: and obtaining a model training request of the target detection model.
The target detection model is a model for detecting a target object, such as the target detection model described in the first or second embodiment, and the model training request may be a request in any appropriate form.
Step S404: and acquiring a sample image for training the target detection model according to the model training request.
The sample image comprises an annotation frame and a target object, and the annotation frame comprises a first dimension value. As previously mentioned, the first dimension value may be: the aspect ratio or width-to-length ratio of the labeling frame, the length-to-height ratio or height-to-length ratio of the labeling frame, the horizontal coordinate information and vertical coordinate information of the labeling frame, and the like.
In a feasible manner, when the target detection model is deployed on a SaaS (Software-as-a-Service) platform and is trained upon receiving a model training request from a client through the SaaS platform, a sample image for training the target detection model can be obtained locally from the SaaS platform according to the model training request. In this case, the SaaS platform locally stores images suitable as samples and can obtain them directly, improving the speed and efficiency of training the target detection model.
In another feasible manner, when the target detection model is deployed on the SaaS platform and is trained upon receiving a model training request from a client through the SaaS platform, the SaaS platform may collect a sample image for training the target detection model from a third party according to the model training request, such as from a third-party website over a network or from a data interface provided by a third-party application. In this case, the SaaS platform obtains the sample image through a third party, no local storage is needed, and storage resources of the SaaS platform are saved.
In another feasible manner, when the target detection model is deployed on the SaaS platform and is trained upon receiving a model training request from a client through the SaaS platform, the SaaS platform may obtain a sample image for training the target detection model from the client according to the model training request. In this case, the sample image is stored at the client, and by obtaining it from the client the SaaS platform can train a target detection model that better meets the client's requirements.
Step S406: and training a target detection model by using the sample image.
The method comprises the following steps: inputting a sample image into a convolution layer of a target detection model to obtain a plurality of levels of feature maps; performing bounding box prediction on a target object in a feature map of a hierarchy corresponding to the first dimension value; and training the target detection model according to the result of the boundary box prediction and the loss function.
Wherein performing bounding box prediction on the target object in the feature map of the hierarchy corresponding to the first dimension value may include: calculating the offset of each boundary of the target object and the labeling frame; determining a feature map of a hierarchy corresponding to the first dimension value according to the first dimension value and the offset; performing bounding box prediction on the target object in a feature map of a level corresponding to the first dimension value.
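The offset calculation referenced here can be sketched as follows, using the per-pixel rule described for the calculating module in the apparatus embodiment below (per-boundary maxima over the pixels of the target object); the mask-based object representation is an assumption, and the object is assumed to contain at least one pixel.

import numpy as np

def max_boundary_offsets(mask, box):
    """Offsets of a target object from the four borders of its labeling box.

    mask: boolean array marking the pixels of the target object.
    box:  (x_min, y_min, x_max, y_max) of the labeling box.
    For every object pixel the offsets to the left, right, top, and bottom
    borders are computed, and the per-border maxima are kept.
    """
    ys, xs = np.nonzero(mask)
    x_min, y_min, x_max, y_max = box
    l = (xs - x_min).max()   # max offset to the left border
    r = (x_max - xs).max()   # max offset to the right border
    t = (ys - y_min).max()   # max offset to the top border
    d = (y_max - ys).max()   # max offset to the bottom border
    return l, r, t, d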
Optionally, the determining, according to the first dimension value and the offset, the feature map of the hierarchy corresponding to the first dimension value may be implemented as: determining distribution information of the target object according to the first dimension value and the offset; and determining a characteristic diagram of a hierarchy corresponding to the first dimension value according to the distribution information.
It should be noted that, the above process of training the target detection model can refer to the description of the data processing method in the foregoing embodiment one or two, and is not described herein again.
In the following, the above process is exemplarily described by taking an example that the target detection model is deployed on the SaaS platform, as shown in fig. 4B.
In fig. 4B, the client sends a model training request to the SaaS platform; after the SaaS platform receives the request, its processing device acquires a sample image for training the target detection model from a local storage device; the SaaS platform trains the target detection model based on the acquired sample image; and after finishing the training, the SaaS platform sends a training completion message to the client. Subsequently, if required, the client can send an image to be detected to the SaaS platform to obtain the detection result of the corresponding target object.
In the above, the target detection model is deployed on the SaaS platform as an example, but it should be understood by those skilled in the art that the solution of the present embodiment is also applicable to a case where the target detection model is deployed on a server in other forms.
It can be seen that, according to this embodiment, both the target detection model and its training are deployed at the server. The server obtains sample images according to the client's model training request and, based on the labeling frame and its first dimension value in each sample image together with the target object, performs bounding box prediction on the feature map of the hierarchy corresponding to the first dimension value; the target detection model is then trained based on the prediction result and the loss function. This realizes training of the target detection model without placing demands on client resources or performance, while ensuring the training effect and efficiency.
The data processing method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: servers, and PCs, etc.
Example Five
Referring to fig. 5, a block diagram of a data processing apparatus according to a fifth embodiment of the present invention is shown.
The data processing apparatus of the present embodiment may be used for target detection model training, and the data processing apparatus includes: a first obtaining module 502, configured to obtain a sample image for training a target detection model, where the sample image includes an annotation frame and a target object, and the annotation frame includes a first dimension value; a second obtaining module 504, configured to input the sample image into a convolution layer of the target detection model, so as to obtain feature maps of multiple levels; a prediction module 506, configured to perform bounding box prediction on the target object in a feature map of a hierarchy corresponding to the first dimension value; a training module 508, configured to train the target detection model according to the result of the bounding box prediction and the loss function.
Optionally, the prediction module 506 comprises: a calculating module 5062, configured to calculate an offset of each boundary of the target object and the label box; a determining module 5064, configured to determine, according to the first dimension value of the labeling frame and the offset, a feature map of a hierarchy corresponding to the first dimension value; an executing module 5066, configured to perform bounding box prediction on the target object in a feature map of a hierarchy corresponding to the first dimension value.
Optionally, the determining module 5064 is configured to determine, according to the first dimension value of the labeling frame and the offset, allocation information of the target object; and determining a characteristic diagram of a hierarchy corresponding to the first dimension value according to the distribution information.
Optionally, the determining module 5064 is configured to determine the allocation information of the target object according to the aspect ratio or width-to-length ratio of the label box and the offset.
Optionally, the determining module 5064 is configured to determine a maximum ratio from the aspect ratio and the width-to-length ratio of the labeling frame when determining the allocation information of the target object according to the aspect ratio or the width-to-length ratio of the labeling frame and the offset; and determining the distribution information of the target object according to the maximum ratio of the labeling frame and the offset.
Optionally, when determining the distribution information of the target object according to the maximum ratio of the labeling frame and the offset, the determining module 5064 processes the maximum ratio of the labeling frame according to a preset activation function to obtain a processing result; and determining the distribution information of the target object according to the ratio of the offset to the processing result.
Optionally, the determining module 5064 is configured to determine the allocation information of the target object according to the length-to-height ratio or the height-to-length ratio of the label box and the offset.
Optionally, the determining module 5064 is configured to determine the allocation information of the target object according to the horizontal coordinate information and the vertical coordinate information of the labeling box and the offset.
Optionally, the calculating module 5062 is configured to calculate an offset between each pixel point of the target object and each boundary of the annotation frame; and determining the maximum offset as the offset of each boundary of the target object and the labeling frame from the offsets of each pixel point and each boundary of the labeling frame.
Optionally, the first obtaining module 502 is configured to perform offset enhancement on original annotation data of an original sample image, so as to obtain an offset-enhanced annotation frame; and acquiring a sample image for training a target detection model according to the original sample image and the labeling frame.
Optionally, the first obtaining module 502 is configured to determine a ratio of an area formed by original annotation data in the original sample image to an area of a training sub-region of the original sample image; determining an offset enhancement frequency according to the ratio; according to the offset enhancement frequency, carrying out disturbance enhancement on the original marked data; and obtaining the marking frame after the offset enhancement according to the result of the disturbance enhancement.
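A minimal sketch of this offset enhancement is given below; the proportional mapping from area ratio to enhancement frequency and the jitter magnitude are assumptions, since the module description does not fix them.

import random

def offset_enhance(box, region_area, base_freq=10, jitter=0.05):
    """Perturb an original labeling box to obtain offset-enhanced boxes.

    box:         (x, y, w, h) of the original annotation.
    region_area: area of the training sub-region of the original sample image.
    The enhancement frequency is derived from the ratio of the annotation
    area to the sub-region area; smaller relative areas get more perturbed
    copies here, which is an assumed choice.
    """
    x, y, w, h = box
    ratio = (w * h) / region_area
    freq = max(1, int(base_freq * (1 - ratio)))
    boxes = []
    for _ in range(freq):
        dx = random.uniform(-jitter, jitter) * w   # disturbance in x
        dy = random.uniform(-jitter, jitter) * h   # disturbance in y
        boxes.append((x + dx, y + dy, w, h))
    return boxes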
Optionally, the first obtaining module 502 is further configured to perform scaling processing on the sample image through resampling based on a region pixel relationship before performing offset enhancement on the original labeling frame of the sample image to obtain an offset-enhanced labeling frame.
Optionally, the data processing apparatus of this embodiment further includes: an initialization module 510, configured to initialize the target detection model before the sample image is input into the convolutional layer of the target detection model; wherein the initialization comprises at least one of: initializing a backbone network of the target detection model by loading pre-trained model parameters, wherein the pre-trained model parameters are parameters subjected to joint training through a plurality of preset subtasks; and training and unlocking a parameter locking layer in the target detection model, wherein the parameter locking layer at least comprises a batch normalization layer.
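In PyTorch terms, this initialization could be sketched as follows; the backbone attribute and the checkpoint path are assumptions.

import torch
import torch.nn as nn

def initialize(model, pretrained_path):
    """Initialize the target detection model as described above.

    Loads jointly pre-trained parameters into the backbone and unlocks the
    batch normalization layers that are often frozen during detector training.
    """
    state = torch.load(pretrained_path, map_location="cpu")
    model.backbone.load_state_dict(state, strict=False)
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.train()                        # keep running statistics updating
            for p in m.parameters():
                p.requires_grad = True       # unlock the layer for training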
Optionally, the data processing apparatus of this embodiment further includes: an output module 512, configured to output a corresponding relationship between the first dimension value and a corresponding hierarchy; or outputting the first dimension value, the distribution information corresponding to the first dimension value, and the corresponding relation between the levels corresponding to the distribution information.
Optionally, the sample image includes a plurality of target objects and a plurality of corresponding labeling frames; the first obtaining module 502 is further configured to, before the sample image is input into the convolutional layer of the target detection model and a feature map of multiple hierarchies is obtained, group a plurality of first dimension values corresponding to the plurality of labeling frames, and determine a ratio of the number of the first dimension values in each group to the number of all the first dimension values; the predicting module 506 is further configured to determine whether the ratio is greater than a preset ratio after the second obtaining module 504 inputs the sample image into the convolution layer of the target detection model to obtain feature maps of multiple levels; if so, executing the operation of performing the boundary box prediction on the target object in the feature map of the hierarchy corresponding to the first dimension value; otherwise, executing the operation of carrying out the boundary box prediction on the target object in the characteristic diagram of the hierarchy corresponding to the labeling box.
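A minimal sketch of this grouping check is given below; the bin width, the use of the largest group's share, and the preset ratio are assumptions, since the embodiment does not fix them.

from collections import Counter

def use_first_dimension_strategy(ratios, bin_width=0.5, preset=0.3):
    """Group the first dimension values (e.g. maximum shape ratios) of all
    labeling boxes and decide which prediction strategy to use.

    Returns True when the share of the largest group exceeds the preset
    ratio, i.e. bounding box prediction should use the feature map of the
    level corresponding to the first dimension value; otherwise the feature
    map of the level corresponding to the labeling box is used. Assumes a
    non-empty list of ratios.
    """
    groups = Counter(int(r / bin_width) for r in ratios)
    top_share = max(groups.values()) / len(ratios)
    return top_share > preset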
The data processing apparatus of this embodiment is configured to implement the corresponding data processing method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again. In addition, the functional implementation of each module in the data processing apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not repeated here.
Example Six
Referring to fig. 6, a block diagram of a data processing apparatus according to a sixth embodiment of the present invention is shown.
The data processing apparatus of the present embodiment is applicable to target detection, and includes: a third obtaining module 602, configured to obtain an image to be detected, where the image to be detected includes a target object to be detected, and the target object includes a first dimension value; an input module 604, configured to input an image to be detected into a target detection model, where the target detection model generates a feature map of multiple levels corresponding to the image to be detected; a fourth obtaining module 606, configured to obtain a detection result that is output by performing target object detection on the hierarchical feature map corresponding to the first dimension value of the target object.
Alternatively, the target detection model may be a target detection model obtained by training through the target detection model training apparatus in the fifth embodiment.
Optionally, the feature map of the hierarchy corresponding to the first dimension value of the target object is: a feature map of a hierarchy corresponding to an aspect ratio or a width-to-length ratio of the target object.
Optionally, the feature map of the hierarchy corresponding to the first dimension value of the target object is: a feature map of a hierarchy corresponding to the length-to-height ratio or height-to-length ratio of the target object.
Optionally, the feature map of the hierarchy corresponding to the first dimension value of the target object is: a feature map of a hierarchy corresponding to a horizontal distance determined from horizontal coordinate information of the target object and a vertical distance determined from vertical coordinate information of the target object.
Optionally, the fourth obtaining module 606 is configured to, according to an aspect ratio or a width-to-length ratio of the target object, allocate the target object to a feature map of a corresponding hierarchy for bounding box detection; and obtaining a detection result of the boundary frame detection output by the target detection model.
Optionally, when the target object is allocated to the feature map of the corresponding hierarchy for bounding box detection according to the aspect ratio or the width-to-length ratio of the target object, the fourth obtaining module 606 allocates the target object to the feature map of the corresponding hierarchy for bounding box detection according to the maximum ratio of the aspect ratio or the width-to-length ratio of the target object.
Optionally, the input module 604 is configured to perform scaling processing on the image to be detected through resampling based on the region pixel relationship, and input the scaled image to be detected into the target detection model.
Optionally, the image to be detected includes a plurality of target objects to be detected; an input module 604, configured to perform image segmentation on the image to be detected, so as to obtain a plurality of image areas where the plurality of target objects to be detected are located; generating a plurality of corresponding sub-images according to the image areas; and respectively inputting the plurality of sub-images into the target detection model.
Optionally, the fourth obtaining module 606 is configured to detect the target object according to the feature map of the hierarchy corresponding to the first dimension value and the color of the target object, and output a detection result.
The data processing apparatus of this embodiment is configured to implement the corresponding data processing method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiment, which are not described herein again. In addition, the functional implementation of each module in the data processing apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not repeated here.
Example Seven
Referring to fig. 7, a schematic structural diagram of an electronic device according to a seventh embodiment of the present invention is shown, and the specific embodiment of the present invention does not limit the specific implementation of the electronic device.
As shown in fig. 7, the electronic device may include: a processor 702, a communication interface 704, a memory 706, and a communication bus 708.
Wherein:
the processor 702, communication interface 704, and memory 706 communicate with each other via a communication bus 708.
A communication interface 704 for communicating with other electronic devices or servers.
The processor 702 is configured to execute the program 710, and may specifically execute relevant steps in the above-described method embodiments.
In particular, the program 710 may include program code that includes computer operating instructions.
The processor 702 may be a central processing unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the present invention. The electronic device comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 706 stores a program 710. The memory 706 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
In a first embodiment:
the program 710 may specifically be used to cause the processor 702 to perform the following operations: acquiring a sample image for training a target detection model, wherein the sample image comprises an annotation frame and a target object, and the annotation frame comprises a first dimension value; inputting the sample image into a convolution layer of the target detection model to obtain a plurality of levels of feature maps; performing bounding box prediction on the target object in a feature map of a hierarchy corresponding to the first dimension value; and training the target detection model according to the result of the boundary box prediction and the loss function.
In an alternative embodiment, the program 710 is further configured to cause the processor 702 to, when performing the bounding box prediction on the target object in the feature map of the hierarchy corresponding to the first dimension value: calculating the offset of each boundary of the target object and the labeling frame; determining a feature map of a hierarchy corresponding to the first dimension value according to the first dimension value and the offset; performing bounding box prediction on the target object in a feature map of a level corresponding to the first dimension value.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when determining the feature map of the hierarchy corresponding to the first dimension value according to the first dimension value and the offset amount, to: determining the distribution information of the target object according to the first dimension value of the labeling frame and the offset; and determining a characteristic diagram of a hierarchy corresponding to the first dimension value according to the distribution information.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, in calculating the offsets of the target object from the respective boundaries of the callout box: calculating the offset of each pixel point of the target object and each boundary of the labeling frame; and determining the maximum offset as the offset of each boundary of the target object and the labeling frame from the offsets of each pixel point and each boundary of the labeling frame.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when determining the assignment information of the target object according to the first dimension value of the label box and the offset amount: and determining the distribution information of the target object according to the aspect ratio or the width-length ratio of the labeling frame and the offset.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when determining the allocation information of the target object according to the aspect ratio or the aspect ratio of the label box and the offset amount: determining the maximum ratio from the aspect ratio or the width-length ratio of the labeling frame; and determining the distribution information of the target object according to the maximum ratio of the labeling frame and the offset.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when determining the allocation information of the target object according to the maximum ratio of the label box and the offset amount: processing the maximum ratio of the marking frames according to a preset activation function to obtain a processing result; and determining the distribution information of the target object according to the ratio of the offset to the processing result.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when determining the assignment information of the target object according to the first dimension value of the label box and the offset amount: and determining the distribution information of the target object according to the length-height ratio or the height-length ratio of the labeling frame and the offset.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when determining the assignment information of the target object according to the first dimension value of the label box and the offset amount: and determining the distribution information of the target object according to the horizontal coordinate information and the vertical coordinate information of the labeling frame and the offset.
In an optional implementation, the program 710 is further configured to cause the processor 702, when acquiring a sample image for training the target detection model, to: perform offset enhancement on the original annotation data of the original sample image to obtain an offset-enhanced annotation frame; and acquire a sample image for training the target detection model according to the original sample image and the annotation frame.
In an alternative embodiment, the program 710 is further configured to cause the processor 702 to, when performing a shift enhancement on the original annotation data of the original sample image to obtain a shift-enhanced annotation frame: determining the ratio of the area formed by original labeling data in an original sample image to the area of a training sub-region of the original sample image; determining an offset enhancement frequency according to the ratio; according to the offset enhancement frequency, carrying out disturbance enhancement on the original marked data; and obtaining the marking frame after the offset enhancement according to the result of the disturbance enhancement.
In an alternative embodiment, the program 710 is further configured to cause the processor 702 to perform a scaling process on the original sample image by resampling based on the region pixel relationship before performing a shift enhancement on the original annotation data of the original sample image to obtain a shift-enhanced annotation frame.
In an alternative embodiment, the program 710 is further configured to cause the processor 702 to initialize the object detection model before inputting the sample image into the convolutional layer of the object detection model; wherein the initialization comprises at least one of: initializing a backbone network of the target detection model by loading pre-trained model parameters, wherein the pre-trained model parameters are parameters subjected to joint training through a plurality of preset subtasks; and training and unlocking a parameter locking layer in the target detection model, wherein the parameter locking layer at least comprises a batch normalization layer.
In an alternative embodiment, the program 710 is further configured to cause the processor 702 to output a correspondence between the first dimension value and the corresponding hierarchy level; or outputting the first dimension value, the distribution information corresponding to the first dimension value, and the corresponding relation between the levels corresponding to the distribution information.
In an optional embodiment, the sample image includes a plurality of target objects and a plurality of corresponding labeling boxes; the program 710 is further configured to enable the processor 702 to group a plurality of first dimension values corresponding to the plurality of labeling boxes before inputting the sample image into the convolutional layer of the target detection model to obtain a plurality of hierarchical feature maps, and determine a ratio of the number of first dimension values in each group to the number of all first dimension values; the program 710 is further configured to enable the processor 702 to determine whether the ratio is greater than a preset ratio after inputting the sample image into the convolution layer of the target detection model to obtain a plurality of levels of feature maps; if so, executing the operation of performing the boundary box prediction on the target object in the feature map of the hierarchy corresponding to the first dimension value; otherwise, executing the operation of carrying out the boundary box prediction on the target object in the characteristic diagram of the hierarchy corresponding to the labeling box.
In a second embodiment:
the program 710 may specifically be used to cause the processor 702 to perform the following operations: acquiring an image to be detected, wherein the image to be detected comprises a target object to be detected and the target object comprises a first dimension value; inputting the image to be detected into a target detection model, wherein the target detection model generates feature maps of a plurality of levels corresponding to the image to be detected; and acquiring the detection result output by performing target object detection on the feature map of the hierarchy corresponding to the first dimension value of the target object. The target detection model may be a target detection model obtained by training according to the first embodiment.
In an alternative embodiment, the feature map of the hierarchy corresponding to the first dimension value of the target object is: a feature map of a hierarchy corresponding to an aspect ratio or a width-to-length ratio of the target object.
In an alternative embodiment, the feature map of the hierarchy corresponding to the first dimension value of the target object is: a feature map of a hierarchy corresponding to the length-to-height ratio or height-to-length ratio of the target object.
In an alternative embodiment, the feature map of the hierarchy corresponding to the first dimension value of the target object is: a feature map of a hierarchy corresponding to a horizontal distance determined from horizontal coordinate information of the target object and a vertical distance determined from vertical coordinate information of the target object.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when acquiring the detection result output by performing target object detection on the feature map of the hierarchy corresponding to the first dimension value of the target object: according to the aspect ratio or width-to-length ratio of the target object, to allocate the target object to the feature map of the corresponding level for bounding box detection; and to obtain the detection result of the bounding box detection output by the target detection model.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when allocating the target object to the feature map of the corresponding hierarchy for bounding box detection according to the aspect ratio or aspect ratio of the target object: and according to the maximum ratio of the aspect ratio or the width-length ratio of the target object, distributing the target object to the feature map of the corresponding level for detecting the bounding box.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when inputting the image to be detected into the object detection model: and carrying out scaling processing on the image to be detected through resampling based on the regional pixel relation, and inputting the scaled image to be detected into a target detection model.
In an optional implementation manner, the image to be detected includes a plurality of target objects to be detected; the program 710 is further for causing the processor 702, when inputting the image to be detected into the object detection model: carrying out image segmentation on the image to be detected to obtain a plurality of image areas where the target objects to be detected are located; generating a plurality of corresponding sub-images according to the image areas; and respectively inputting the plurality of sub-images into the target detection model.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when acquiring the detection result output by performing target object detection on the feature map of the hierarchy corresponding to the first dimension value: to detect the target object according to the feature map of the hierarchy corresponding to the first dimension value and the color of the target object, and to output the detection result.
In a third embodiment:
the program 710 may specifically be used to cause the processor 702 to perform the following operations: obtaining a model training request of a target detection model; acquiring a sample image for training the target detection model according to the model training request, wherein the sample image comprises an annotation frame and a target object, and the annotation frame comprises a first dimension value; inputting the sample image into a convolution layer of the target detection model to obtain a plurality of levels of feature maps; performing bounding box prediction on the target object in a feature map of a hierarchy corresponding to the first dimension value; and training the target detection model according to the result of the boundary box prediction and the loss function.
In an alternative embodiment, the program 710 is further configured to cause the processor 702 to, when performing the bounding box prediction on the target object in the feature map of the hierarchy corresponding to the first dimension value: calculating the offset of each boundary of the target object and the labeling frame; determining a feature map of a hierarchy corresponding to the first dimension value according to the first dimension value and the offset; performing bounding box prediction on the target object in a feature map of a level corresponding to the first dimension value.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when determining the feature map of the hierarchy corresponding to the first dimension value according to the first dimension value and the offset amount, to: determining distribution information of the target object according to the first dimension value and the offset; and determining a characteristic diagram of a hierarchy corresponding to the first dimension value according to the distribution information.
In an alternative embodiment, the first dimension value is an aspect ratio or a width-to-length ratio of the labeling box.
In an alternative embodiment, the program 710 is further configured to cause the processor 702, when obtaining sample images for training the target detection model according to the model training request, to: according to the model training request, locally obtaining a sample image for training the target detection model from a SaaS platform; or acquiring a sample image for training the target detection model from a third party by the SaaS platform according to the model training request; or acquiring a sample image for training the target detection model from the client by the SaaS platform according to the model training request.
For specific implementation of each step in the program 710, reference may be made to corresponding steps and corresponding descriptions in units in the foregoing method embodiments, which are not described herein again. It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described devices and modules may refer to the corresponding process descriptions in the foregoing method embodiments, and are not described herein again.
It should be noted that, according to the implementation requirement, each component/step described in the embodiment of the present invention may be divided into more components/steps, and two or more components/steps or partial operations of the components/steps may also be combined into a new component/step to achieve the purpose of the embodiment of the present invention.
The above-described method according to an embodiment of the present invention may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the method described herein can be processed by software stored on a recording medium using a general-purpose computer, a dedicated processor, or programmable or dedicated hardware such as an ASIC or FPGA. It will be appreciated that the computer, processor, microprocessor, controller, or programmable hardware includes memory components (e.g., RAM, ROM, flash memory, etc.) that can store or receive software or computer code which, when accessed and executed by the computer, processor, or hardware, implements the data processing methods described herein. Further, when a general-purpose computer accesses code for implementing the data processing methods shown herein, execution of the code converts the general-purpose computer into a special-purpose computer for executing those methods.
Those of ordinary skill in the art will appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present embodiments.
The above embodiments are only for illustrating the embodiments of the present invention and not for limiting the embodiments of the present invention, and those skilled in the art can make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the scope of patent protection of the embodiments of the present invention should be defined by the claims.
Claims (34)
1. A method of data processing, comprising:
obtaining a model training request of a target detection model;
acquiring a sample image for training the target detection model according to the model training request, wherein the sample image comprises an annotation frame and a target object, and the annotation frame comprises a first dimension value;
inputting the sample image into a convolution layer of the target detection model to obtain a plurality of levels of feature maps;
performing bounding box prediction on the target object in a feature map of a hierarchy corresponding to the first dimension value;
and training the target detection model according to the result of the boundary box prediction and the loss function.
2. The method of claim 1, wherein the performing bounding box prediction on the target object in the feature map of the hierarchy corresponding to the first dimension value comprises:
calculating the offset of each boundary of the target object and the labeling frame;
determining a feature map of a hierarchy corresponding to the first dimension value according to the first dimension value and the offset;
performing bounding box prediction on the target object in a feature map of a level corresponding to the first dimension value.
3. The method of claim 2, wherein said determining a feature map for a level corresponding to the first dimension value from the first dimension value and the offset comprises:
determining distribution information of the target object according to the first dimension value and the offset;
and determining a characteristic diagram of a hierarchy corresponding to the first dimension value according to the distribution information.
4. The method of any of claims 1-3, wherein the first dimension value is an aspect ratio or a width-to-length ratio of the annotation box.
5. The method of any of claims 1-3, wherein the obtaining sample images for training the target detection model according to the model training request comprises:
according to the model training request, locally obtaining a sample image for training the target detection model from a SaaS platform;
or,
according to the model training request, a SaaS platform acquires a sample image for training the target detection model from a third party;
or,
and according to the model training request, obtaining a sample image for training the target detection model from the client by the SaaS platform.
6. A method of data processing, comprising:
acquiring an image to be detected, wherein the image to be detected comprises a target object to be detected, and the target object comprises a first dimension value;
inputting the image to be detected into a target detection model, wherein the target detection model generates a plurality of levels of feature maps corresponding to the image to be detected;
and acquiring a detection result output by performing target object detection on the feature map of the hierarchy corresponding to the first dimension value.
7. The method of claim 6, wherein,
the feature map of the level corresponding to the first dimension value is: a feature map of a hierarchy corresponding to an aspect ratio or a width-to-length ratio of the target object.
8. The method of claim 6, wherein,
the feature map of the level corresponding to the first dimension value is: a feature map of a hierarchy corresponding to the length-to-height ratio or height-to-length ratio of the target object.
9. The method of claim 6, wherein,
the feature map of the level corresponding to the first dimension value is: a feature map of a hierarchy corresponding to a horizontal distance determined from horizontal coordinate information of the target object and a vertical distance determined from vertical coordinate information of the target object.
10. The method according to claim 7, wherein the acquiring of the detection result output by performing target object detection on the feature map of the hierarchy corresponding to the first dimension value comprises:
according to the aspect ratio or the width-length ratio of the target object, distributing the target object to the feature map of the corresponding level for detecting the bounding box;
and obtaining a detection result of the boundary frame detection output by the target detection model.
11. The method according to claim 10, wherein the assigning the target object into the feature map of the corresponding hierarchy for bounding box detection according to the aspect ratio or width-to-length ratio of the target object comprises:
and according to the maximum ratio of the aspect ratio or the width-length ratio of the target object, distributing the target object to the feature map of the corresponding level for detecting the bounding box.
12. The method according to any one of claims 6-11, wherein said inputting said image to be detected into an object detection model comprises:
and carrying out scaling processing on the image to be detected through resampling based on the regional pixel relation, and inputting the scaled image to be detected into a target detection model.
13. The method according to claim 6, wherein the image to be detected comprises a plurality of target objects to be detected;
the inputting of the image to be detected into the target detection model comprises the following steps: carrying out image segmentation on the image to be detected to obtain a plurality of image areas where the target objects to be detected are located; generating a plurality of corresponding sub-images according to the image areas; and respectively inputting the plurality of sub-images into the target detection model.
14. The method according to claim 6, wherein the acquiring of the detection result output by performing target object detection on the feature map of the hierarchy corresponding to the first dimension value comprises:
and detecting the target object according to the feature map of the hierarchy corresponding to the first dimension value and the color of the target object, and outputting a detection result.
15. A method of data processing, comprising:
acquiring a first image, wherein the first image comprises a preset target object, and the target object comprises a first dimension value;
inputting the first image into an object detection model, wherein the object detection model generates a plurality of levels of feature maps corresponding to the first image;
acquiring a detection result output by performing target object detection on the feature map of the hierarchy corresponding to the first dimension value;
and synthesizing a second image according to the detection result of the target object and preset image information.
16. A method of data processing, comprising:
acquiring a sample image for training a target detection model, wherein the sample image comprises a labeling box and a target object, and the labeling box comprises a first dimension value;
inputting the sample image into the convolution layers of the target detection model to obtain feature maps of a plurality of levels;
performing bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value;
and training the target detection model according to the result of the bounding box prediction and a loss function.
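A minimal PyTorch-flavoured sketch of one training step, under stated assumptions: `model` returns one prediction tensor per pyramid level, `level` comes from a shape-based assignment such as `assign_level_by_ratio` above, and smooth-L1 stands in for the loss function the claim leaves unspecified:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, image, gt_box, level):
    # Forward pass: assumed to yield a list of per-level box predictions.
    preds_per_level = model(image.unsqueeze(0))
    # Supervise only the level selected from the labeling box's first
    # dimension value (e.g. its maximum aspect ratio).
    pred_box = preds_per_level[level]
    loss = F.smooth_l1_loss(pred_box, gt_box)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```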
17. The method of claim 16, wherein said performing bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value comprises:
calculating the offset between the target object and each boundary of the labeling box;
determining the feature map of the level corresponding to the first dimension value according to the first dimension value and the offset;
and performing bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value.
18. The method of claim 17, wherein said determining the feature map of the level corresponding to the first dimension value according to the first dimension value and the offset comprises:
determining assignment information of the target object according to the first dimension value of the labeling box and the offset;
and determining the feature map of the level corresponding to the first dimension value according to the assignment information.
19. The method of claim 18, wherein said determining the assignment information of the target object according to the first dimension value of the labeling box and the offset comprises:
determining the assignment information of the target object according to the width-to-height ratio or the height-to-width ratio of the labeling box and the offset.
20. The method of claim 19, wherein said determining the assignment information of the target object according to the width-to-height ratio or the height-to-width ratio of the labeling box and the offset comprises:
determining the maximum ratio, i.e. the larger of the width-to-height ratio and the height-to-width ratio of the labeling box;
and determining the assignment information of the target object according to the maximum ratio of the labeling box and the offset.
21. The method of claim 20, wherein said determining the assignment information of the target object according to the maximum ratio of the labeling box and the offset comprises:
processing the maximum ratio of the labeling box with a preset activation function to obtain a processing result;
and determining the assignment information of the target object according to the ratio of the offset to the processing result.
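Read together, claims 17-21 describe an FCOS-style assignment in which the boundary offset that normally selects a level is first divided by an activation of the box's maximum aspect ratio, so elongated boxes can land on a different level than their raw size alone would dictate. In the sketch below, the activation function (1 + log) and the per-level ranges are assumptions; the patent names neither:

```python
import math

# Illustrative per-level ranges for the adjusted offset, in pixels
# (FCOS-like defaults, not values taken from the patent).
LEVEL_RANGES = [(0, 64), (64, 128), (128, 256), (256, 512), (512, math.inf)]

def assign_level(max_offset, w, h):
    # Claim 20: the "maximum ratio" is the larger of w/h and h/w.
    max_ratio = max(w / h, h / w)
    # Claim 21: process the maximum ratio with a preset activation
    # function, then take the ratio of the offset to that result.
    processed = 1.0 + math.log(max_ratio)  # assumed activation function
    score = max_offset / processed
    for level, (lo, hi) in enumerate(LEVEL_RANGES):
        if lo <= score < hi:
            return level
    return len(LEVEL_RANGES) - 1
```

For a square labeling box the processed ratio is 1 and the rule reduces to a plain offset-range assignment; the more elongated the box, the smaller the adjusted score and the more its assignment shifts relative to a size-only rule.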
22. The method of claim 18, wherein said determining the assignment information of the target object according to the first dimension value of the labeling box and the offset comprises:
determining the assignment information of the target object according to the length-to-height ratio or the height-to-length ratio of the labeling box and the offset.
23. The method of claim 18, wherein said determining the assignment information of the target object according to the first dimension value of the labeling box and the offset comprises:
determining the assignment information of the target object according to the horizontal coordinate information and the vertical coordinate information of the labeling box and the offset.
24. The method of claim 17, wherein said calculating the offset between the target object and each boundary of the labeling box comprises:
calculating the offset between each pixel point of the target object and each boundary of the labeling box;
and, among the offsets between the pixel points and each boundary of the labeling box, determining the maximum offset as the offset between the target object and that boundary.
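A sketch of that per-pixel reduction. How the object's pixels are identified is outside the claim, so `mask_points` (e.g. taken from a segmentation mask) is an assumed input:

```python
import numpy as np

def boundary_offsets(mask_points, box):
    """mask_points: (N, 2) array of (x, y) pixels belonging to the object.
    box: labeling box as (x0, y0, x1, y1).
    Returns (left, top, right, bottom) offsets, each the maximum over
    all object pixels, per the claim 24 reading."""
    x, y = mask_points[:, 0], mask_points[:, 1]
    x0, y0, x1, y1 = box
    return (
        float((x - x0).max()),  # farthest pixel from the left boundary
        float((y - y0).max()),  # farthest pixel from the top boundary
        float((x1 - x).max()),  # farthest pixel from the right boundary
        float((y1 - y).max()),  # farthest pixel from the bottom boundary
    )
```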
25. The method of claim 16, wherein said acquiring a sample image for training a target detection model comprises:
performing offset enhancement on the original annotation data of an original sample image to obtain an offset-enhanced labeling box;
and obtaining the sample image for training the target detection model from the original sample image and the labeling box.
26. The method of claim 25, wherein said performing offset enhancement on the original annotation data of the original sample image to obtain the offset-enhanced labeling box comprises:
determining the ratio of the area enclosed by the original annotation data in the original sample image to the area of a training sub-region of the original sample image;
determining an offset enhancement frequency according to the ratio;
performing perturbation enhancement on the original annotation data according to the offset enhancement frequency;
and obtaining the offset-enhanced labeling box from the result of the perturbation enhancement.
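A sketch of one plausible reading of claim 26: boxes covering a larger share of the training sub-region are perturbed more often. The frequency rule, the jitter magnitude (`max_shift_frac`), and the uniform noise model are all assumptions:

```python
import random

def offset_enhance(box, region_area, max_shift_frac=0.05):
    """box: labeling box (x0, y0, x1, y1); region_area: area of the
    training sub-region. Returns jittered copies of the box, their
    count scaled by the box's share of the region (an assumed reading
    of the 'offset enhancement frequency')."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    frequency = max(1, round(10 * (w * h) / region_area))  # assumed rule
    jittered = []
    for _ in range(frequency):
        dx = random.uniform(-max_shift_frac, max_shift_frac) * w
        dy = random.uniform(-max_shift_frac, max_shift_frac) * h
        jittered.append((x0 + dx, y0 + dy, x1 + dx, y1 + dy))
    return jittered
```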
27. The method of claim 25, wherein before said performing offset enhancement on the original annotation data of the original sample image to obtain the offset-enhanced labeling box, the method further comprises:
scaling the original sample image by resampling based on the regional pixel relation.
28. The method of claim 16, wherein before said inputting the sample image into the convolution layers of the target detection model, the method further comprises:
initializing the target detection model;
wherein the initialization comprises at least one of:
initializing the backbone network of the target detection model by loading pre-trained model parameters, the pre-trained model parameters being parameters obtained by joint training on a plurality of preset subtasks;
and unlocking, for training, a parameter-locked layer of the target detection model, the parameter-locked layer comprising at least a batch normalization layer.
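A PyTorch-flavoured sketch of both initialization options. The checkpoint path, the `backbone` attribute, and the restriction to `BatchNorm2d` are assumptions; the claim only requires that batch normalization layers, at minimum, be unlocked:

```python
import torch
import torch.nn as nn

def initialize_detector(model, pretrained_path="backbone_weights.pth"):
    # Option 1: initialize the backbone from parameters jointly trained
    # on several subtasks (path and key layout are placeholders).
    state = torch.load(pretrained_path, map_location="cpu")
    model.backbone.load_state_dict(state, strict=False)

    # Option 2: unlock parameter-locked layers -- at minimum the batch
    # normalization layers -- so they are updated during training.
    for module in model.modules():
        if isinstance(module, nn.BatchNorm2d):
            module.train()  # update running statistics again
            for p in module.parameters():
                p.requires_grad = True
    return model
```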
29. The method of claim 18, wherein the method further comprises:
outputting the correspondence between the first dimension value and its corresponding level;
or,
outputting the correspondence among the first dimension value, the assignment information corresponding to the first dimension value, and the level corresponding to the assignment information.
30. The method of claim 16, wherein the sample image comprises a plurality of target objects and a corresponding plurality of labeling boxes;
before said inputting the sample image into the convolution layers of the target detection model to obtain feature maps of a plurality of levels, the method further comprises: grouping the plurality of first dimension values corresponding to the plurality of labeling boxes, and determining the ratio of the number of first dimension values in each group to the number of all first dimension values;
and after said inputting the sample image into the convolution layers of the target detection model to obtain feature maps of a plurality of levels, the method further comprises: judging whether the ratio is greater than a preset ratio; if so, performing the bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value; otherwise, performing the bounding box prediction on the target object in the feature map of the level corresponding to the labeling box.
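A sketch of the grouping test: shape-based assignment is applied only to groups of first dimension values common enough to be worth it, while rare shapes fall back to the labeling box's default level. The bin width and the preset ratio are assumed values:

```python
from collections import Counter

def groups_using_ratio_assignment(first_dim_values, bin_width=0.5,
                                  preset_ratio=0.1):
    # Group the first dimension values (e.g. max aspect ratios) into
    # bins, then keep the bins whose share of all values exceeds the
    # preset ratio; boxes outside these bins use the default assignment.
    bins = Counter(int(v / bin_width) for v in first_dim_values)
    total = len(first_dim_values)
    return {b for b, n in bins.items() if n / total > preset_ratio}
```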
31. A data processing apparatus, comprising:
a first acquisition module, configured to acquire a sample image for training a target detection model, wherein the sample image comprises a labeling box and a target object, and the labeling box comprises a first dimension value;
a second acquisition module, configured to input the sample image into the convolution layers of the target detection model to obtain feature maps of a plurality of levels;
a prediction module, configured to perform bounding box prediction on the target object in the feature map of the level corresponding to the first dimension value;
and a training module, configured to train the target detection model according to the result of the bounding box prediction and a loss function.
32. A data processing apparatus, comprising:
a third acquisition module, configured to acquire an image to be detected, wherein the image to be detected comprises a target object to be detected, and the target object comprises a first dimension value;
an input module, configured to input the image to be detected into a target detection model, wherein the target detection model generates feature maps of a plurality of levels corresponding to the image to be detected;
and a fourth acquisition module, configured to obtain the detection result output by detecting the target object on the feature map of the level corresponding to the first dimension value.
33. An electronic device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another via the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform the operations corresponding to the data processing method of any one of claims 1-5; or the operations corresponding to the data processing method of any one of claims 6-14; or the operations corresponding to the data processing method of claim 15; or the operations corresponding to the data processing method of any one of claims 16-30.
34. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the data processing method of any one of claims 1-5; or the data processing method of any one of claims 6-14; or the data processing method of claim 15; or the data processing method of any one of claims 16-30.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202010450436.6A | 2020-05-25 | 2020-05-25 | Data processing method and device, electronic equipment and storage medium |
Publications (1)

| Publication Number | Publication Date |
| --- | --- |
| CN113793292A | 2021-12-14 |
Family
ID=79180988
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
| --- | --- | --- | --- |
| CN202010450436.6A (pending) | Data processing method and device, electronic equipment and storage medium | 2020-05-25 | 2020-05-25 |
Country Status (1)

| Country | Link |
| --- | --- |
| CN | CN113793292A |
Cited By (2)

| Publication number | Priority date | Publication date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN114120349A | 2022-01-10 | 2022-03-01 | 深圳市菁优智慧教育股份有限公司 | Test paper identification method and system based on deep learning |
| CN114120349B | 2022-01-10 | 2022-05-03 | 深圳市菁优智慧教育股份有限公司 | Test paper identification method and system based on deep learning |
Legal Events

| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |