CN113222874A - Data enhancement method, device and equipment applied to target detection and storage medium - Google Patents

Data enhancement method, device and equipment applied to target detection and storage medium

Info

Publication number
CN113222874A
Authority
CN
China
Prior art keywords
image
mask
target
images
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110610137.9A
Other languages
Chinese (zh)
Other versions
CN113222874B (en)
Inventor
韦嘉楠
周超勇
刘玉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110610137.9A
Publication of CN113222874A
Application granted
Publication of CN113222874B
Legal status: Active (Current)
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/50: Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 5/94
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20212: Image combination
    • G06T 2207/20221: Image fusion; Image merging

Abstract

The invention discloses a data enhancement method, device, equipment and storage medium applied to target detection. The method comprises the following steps: arranging four images in a final image frame, each image containing labeled target bounding boxes; for each vertical overlapping area, randomly generating a corresponding vertical dividing line in the final image frame that passes through it; multiplying the mask matrix of each image on either side of the vertical dividing line element-wise by that image's image matrix and superposing the results to form a one-side fused image E and an other-side fused image F; randomly generating a horizontal dividing line in the final image frame that passes through at least one horizontal overlapping area, and multiplying the mask matrices of the fused images E and F element-wise by their corresponding image matrices and superposing the results to form the final image; and taking the product of each target's classification vector and the mean of the mask values inside its bounding box as that target's classification vector in the final image. The invention reduces the loss of target bounding boxes while accomplishing data enhancement.

Description

Data enhancement method, device and equipment applied to target detection and storage medium
Technical Field
The present invention relates to the field of data enhancement technologies, and in particular, to a data enhancement method, apparatus, device, and storage medium for target detection.
Background
Machine learning is a multidisciplinary field and a fundamental approach to making computers intelligent; it is applied throughout the various domains of artificial intelligence. Machine learning algorithms attempt to mine the rules underlying large amounts of data and use them for prediction or classification. It should be noted that the goal of machine learning is for the learned model to adapt well to "new samples", not merely to perform well on the training samples; this ability to adapt to new samples is called generalization. Obtaining a model with good generalization ability through machine learning requires a large amount of training data.
The task of target detection is to find all targets of interest in an image and determine their positions and sizes; it is one of the core problems in the field of machine vision. The training data required by a target detection model must be labeled with the classification of each target and the coordinates of its target bounding box.
Data enhancement is currently widely used in computer vision: it expands the scale of the training data set and improves the generalization ability of the model by applying a series of random changes to training images to produce similar but different training samples. A commonly applied data enhancement method is to apply operations such as flipping, cropping and color gamut change to several pictures separately and then splice them together; in the splicing process the overlapping area is cropped so that only the picture content of the uppermost layer is kept while the picture content covered beneath it is cut away, and the combination of the pictures is used to generate new training data, thereby providing a large amount of training data.
However, this data enhancement method also creates a new problem. Because the spliced pictures have already been flipped, cropped and color-shifted, cropping the overlapping area again easily causes excessive loss of picture content and bounding boxes, so part of a target image may be lost. Existing remedies include directly deleting a bounding box that has been partially cut off, using the cropped region as the new bounding box boundary, or keeping the original bounding box size unchanged and recalculating its coordinates in the new picture. However, these operations may change the accuracy of the bounding boxes and thereby affect the training effect.
Disclosure of Invention
The invention provides a data enhancement method and device applied to target detection, an electronic device and a storage medium, with the main aim of reducing the loss of target bounding boxes while achieving data enhancement.
In order to achieve the above object, the present invention provides a data enhancement method applied to target detection, including:
arranging four images, partially overlapping one another, at the four corners of a final image frame, wherein each image contains target bounding boxes labeling targets, and each target bounding box is labeled with the classification of its target and the center coordinates of the target bounding box;
for each vertical overlapping area, randomly generating a corresponding vertical dividing line in the final image frame that passes through it, wherein a vertical overlapping area is the overlapping area formed between the two images on the same side of the final image frame in the vertical direction;
multiplying the mask matrix of each image on either side of each vertical dividing line element-wise by that image's image matrix and superposing the results to form a one-side fused image E and an other-side fused image F;
randomly generating, in the final image frame, a horizontal dividing line that crosses the final image frame and passes through at least one horizontal overlapping area, and multiplying the mask matrices of the one-side fused image E and the other-side fused image F element-wise by their corresponding image matrices and superposing the results to form a final image, wherein a horizontal overlapping area is the overlapping area formed between the two images on the same side of the final image frame in the left-right direction;
and updating the center coordinates of each target bounding box in the final image frame, and taking the product of each target's classification vector and the mean of the mask values inside its bounding box as that target's classification vector in the final image.
Optionally, the mask matrix comprises an own-image-area mask and a non-image-area mask. The own-image-area mask is set so that, within the overlapping area, the mask value decreases gradually from a first set value at the edge of one image to a second set value at the dividing line, and from the second set value at the dividing line to 0 at the edge of the other image, while in the non-overlapping area the mask value is set to the first set value.
The non-image-area mask fills the value 0 everywhere outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line; the own-image-area mask and the non-image-area mask together form the mask matrix of the image.
The image matrix comprises own-image-area elements and non-image-area elements: the own-image-area elements are the element values of the image, and the non-image-area elements are 0 values filled outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line.
Optionally, before the images are arranged at the four corners of the final image frame, the images are preprocessed by at least one of flipping, scaling and tone transformation.
Optionally, at least one corner of each of the four images is aligned with a corresponding corner of the final image frame.
Optionally, the classification vector is represented by a one-hot vector.
Optionally, the image matrix A_e of the one-side fused image E is obtained by the formula:
A_e = M_a ⊙ A_a + M_b ⊙ A_b
where M_a and A_a are respectively the mask matrix and the image matrix of one image in the one-side fused image E,
and M_b and A_b are the mask matrix and the image matrix of the other image in the one-side fused image E. The image matrix A_f of the other-side fused image F is obtained by the formula:
A_f = M_c ⊙ A_c + M_d ⊙ A_d
where M_c and A_c are respectively the mask matrix and the image matrix of one image in the other-side fused image F,
and M_d and A_d are the mask matrix and the image matrix of the other image in the other-side fused image F;
the image matrix A_mix of the final image is obtained by the formula:
A_mix = M_e ⊙ A_e + M_f ⊙ A_f
where M_e is the mask matrix of the one-side fused image E
and M_f is the mask matrix of the other-side fused image F.
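By way of non-limiting illustration (the following NumPy sketch and its names are assumptions of this edition, not part of the original disclosure), the element-wise fusion expressed by the three formulas above may be implemented as follows:

import numpy as np

def hadamard_fuse(mask_a, img_a, mask_b, img_b):
    # Masks have shape (H, W); images have shape (H, W, C). The masks are
    # expanded over the channel axis so that each pixel is the weighted sum
    # M_a ⊙ A_a + M_b ⊙ A_b, as in the formulas above.
    return mask_a[..., None] * img_a + mask_b[..., None] * img_b

# A_e   = hadamard_fuse(M_a, A_a, M_b, A_b)   # one-side fused image E
# A_f   = hadamard_fuse(M_c, A_c, M_d, A_d)   # other-side fused image F
# A_mix = hadamard_fuse(M_e, A_e, M_f, A_f)   # final image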
Optionally, the first set value is 1, and the second set value is 0.5.
The invention also provides a data enhancement device applied to target detection, which comprises:
an image arrangement module, configured to arrange four images, partially overlapping one another, at the four corners of a final image frame, wherein each image contains target bounding boxes labeling targets, and each target bounding box is labeled with the classification of its target and the center coordinates of the target bounding box;
a dividing line generation module, configured to randomly generate, for each vertical overlapping area, a corresponding vertical dividing line in the final image frame that passes through it, wherein a vertical overlapping area is the overlapping area formed between the two images on the same side of the final image frame in the vertical direction, and to randomly generate, in the final image frame, a horizontal dividing line that crosses the final image frame and passes through at least one horizontal overlapping area, wherein a horizontal overlapping area is the overlapping area formed between the two images on the same side of the final image frame in the left-right direction;
a first image fusion module, configured to multiply the mask matrix of each image on either side of each vertical dividing line element-wise by that image's image matrix and superpose the results to form a one-side fused image E and an other-side fused image F;
a second image fusion module, configured to multiply the mask matrices of the one-side fused image E and the other-side fused image F element-wise by their corresponding image matrices and superpose the results to form a final image;
and a target updating module, configured to update the center coordinates of each target bounding box in the final image frame, and to take the product of each target's classification vector and the mean of the mask values inside its bounding box as that target's classification vector in the final image.
The present invention also provides an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data enhancement method as described above for target detection.
The present invention also provides a computer-readable storage medium, storing a computer program, characterized in that the computer program, when executed by a processor, implements a data enhancement method as described above for object detection.
According to the invention, the arranged images are re-divided by vertical and horizontal dividing lines, and the overlapping regions are then fused with different weights. Fusing and splicing multiple images in the overlapping areas with gradually varying weights effectively reduces the loss of target bounding boxes and achieves data enhancement of images used for training a target detection model.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a data enhancement method applied to target detection according to the present invention;
FIG. 2a is a schematic diagram of the arrangement of images A, B, C, D of a first embodiment provided by the present invention;
FIG. 2b is a schematic diagram of the fusion of images A and B into the one-side fused image E according to the first embodiment provided by the present invention;
FIG. 2c is a schematic diagram of the fusion of images C and D of the first embodiment into the other-side fused image F provided by the present invention;
FIG. 3a is a schematic diagram of the arrangement of images A, B, C, D of a second embodiment provided by the present invention;
FIG. 3b is a schematic view of a parting line of a second embodiment of the present invention;
FIG. 4a is a schematic diagram of the placement of an image A, B, C, D according to a third embodiment of the present invention;
FIG. 4b is a schematic view of a parting line of a third embodiment of the present invention;
FIG. 5a is a schematic diagram of the arrangement of an image A, B, C, D of a fourth embodiment provided by the present invention;
FIG. 5b is a schematic view of a parting line of a fourth embodiment of the present invention;
FIG. 6 is a schematic layout of an image A, B, C, D of a fifth embodiment provided by the present invention;
FIG. 7 is a block diagram of an embodiment of a data enhancement apparatus for target detection according to the present invention;
FIG. 8 is a schematic structural diagram of an embodiment of an electronic device implementing a data enhancement method applied to target detection according to the present invention;
the implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is a schematic flow chart of a data enhancement method applied to target detection according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
First embodiment
In this embodiment, the data enhancement method applied to target detection includes:
and S10, arranging four images at four corners in a final image frame in a partially overlapped mode, wherein each image is provided with a target boundary frame marked with a target, and the target boundary frames are marked with the belonged classification of the target and the center coordinates of the target boundary frame.
As shown in fig. 2a, the final image frame is rectangular, and has a width W and a height h. The four corners refer to positions of the four preprocessed images, namely, upper left, upper right, lower left and lower right, respectively, in the final image frame, and preferably, one corner of the image is aligned with a corresponding corner of the final image frame, so that the final image formed in the final image frame is finally fused and spliced. Of course, the present invention does not exclude that an image may have a plurality of corners aligned with the corresponding corners of the final image frame, and the calculation method is not different, and only one corner alignment will be described as an example. In fig. 2a, the first image is shown laid on top left, the second image on top right, the third image on bottom left, and the fourth image on bottom right. For the sake of distinction, the four images are respectively identified herein as A, B, C, D, and the width and height of the image area correspond to w in turna,wb,wc,wd,ha,hb,hc,hd. For clarity, fig. 2a is a drawing on the left side shown by a chain line, and a drawing on the right side shown by a chain double-dashed line. It can be seen that the right side of image a overlaps the left side of image B, the underside of image a overlaps both images C and D, the underside of image B does not overlap images C and D, and the upper side edges of images C and D are flush.
Targets are marked in each image, each target has a classification and a target boundary box to which the target belongs, and the center coordinates of each target boundary box are marked in the image. The difference between object detection and image classification is that image classification only needs to put images into categories, and object detection not only needs to classify objects, but also needs to determine the location of objects, i.e. to obtain the position information of objects in the images, and to frame the objects with an object bounding box, and to give the center coordinates of the object bounding box. Data enhancement of images requires re-presenting the position information of the target bounding box of each target in each image.
And step S20, penetrating through each vertical overlapping area, and randomly generating a corresponding vertical dividing line in the final image frame. The vertical overlapping area refers to an overlapping area formed between the left and right images.
If the image a and the image B in fig. 2a form a vertical overlapping region, and the image C and the image D also form a vertical overlapping region, a common vertical dividing line may be set in the two vertical overlapping regions through the two vertical overlapping regions, and the vertical dotted line in fig. 2a is a vertical dividing line with coordinates Wmixed. Of course, a vertical dividing line can also be provided in each case, as will be explained further below.
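By way of non-limiting illustration (the helper below and its arguments are assumptions, not taken from the original disclosure), the column coordinate W_mixed of a common vertical dividing line may be sampled from the columns shared by both vertical overlapping regions; if the regions share no column, a separate line is sampled per region:

import random

def sample_common_vertical_split(overlap_ab, overlap_cd):
    # overlap_ab, overlap_cd: (left, right) column ranges of the vertical
    # overlapping regions formed by images A/B and C/D respectively.
    left = max(overlap_ab[0], overlap_cd[0])
    right = min(overlap_ab[1], overlap_cd[1])
    if left > right:
        return None  # no common column: use one dividing line per region
    return random.randint(left, right)  # W_mixed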
Step S30: multiply the mask matrix of each image on either side of each vertical dividing line element-wise by that image's image matrix and superpose the results to form the one-side fused image E and the other-side fused image F.
The mask matrix is set so that, within the overlapping area, the mask value decreases gradually from a first set value at the edge of one image to a second set value at the dividing line, and from the second set value at the dividing line to 0 at the edge of the other image, while in the non-overlapping area the mask value is set to the first set value. The first set value is preferably 1 and the second set value is preferably 0.5.
The non-image-area mask fills the value 0 everywhere outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line; the own-image-area mask and the non-image-area mask together form the mask matrix of the image.
The image matrix comprises own-image-area elements and non-image-area elements: the own-image-area elements are the element values of the image, and the non-image-area elements are 0 values filled outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line. The elements of the image are its pixels.
Taking the left-right fusion of the two images A and B as an example (image A on the left, image B on the right), the fusion effect to be achieved is:
in the non-overlapping areas of image A and image B, the corresponding areas are preserved, i.e. the mask value at the corresponding positions is set to 1;
in the overlapping area of image A and image B, on the vertical dividing line at column W_mixed, images A and B are each weighted 50%, i.e. the mask values of A and B are both 0.5.
For image A, the region from its left edge to the left edge of image B, together with the part of image A lying below the vertical overlapping area, belongs to the non-overlapping region, and the mask value of the own-image-area mask there is 1. Within the overlapping area, from the left edge of image B to column W_mixed the mask value decreases gradually from 1 to 0.5, and from column W_mixed to the right edge of image A it decreases gradually from 0.5 to 0.
The own-image-area mask of image A thus obtained is as follows:
(matrix omitted; shown as image BDA0003095575300000071 in the original publication)
By further filling a mask value of 0 in the region of the common minimum envelope rectangle of images A and B that lies outside the own image area, the mask matrix M_a, whose size matches the common minimum envelope rectangle of images A and B, is obtained.
For image B, the non-overlapping region runs from the right edge of image A to the right edge of image B, and the mask value there is 1. From the right edge of image A to column W_mixed the mask value decreases gradually from 1 to 0.5, and from column W_mixed to the left edge of image B it decreases gradually from 0.5 to 0.
The own-image-area mask of image B is thus obtained as follows:
(matrix omitted; shown as image BDA0003095575300000072 in the original publication)
Filling a mask value of 0 in the common minimum envelope rectangle of images A and B outside the own image area yields the mask matrix M_b of image B.
For image A, the own-image-area elements are the pixels covered by the image and take their corresponding pixel values, and all other areas of the common minimum envelope rectangle of images A and B are set to 0, giving the image matrix A_a. Likewise for image B: its own-image-area elements take the corresponding pixel values, and the rest of the common minimum envelope rectangle of images A and B is set to 0, giving its image matrix A_b.
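By way of non-limiting illustration (the linear ramp, the helper names and the simplification that images A and B have equal height are assumptions, not taken from the original disclosure), the per-column weights and the zero-padded mask matrix M_a of image A within the common minimum envelope rectangle of A and B could be built as follows; M_b is built symmetrically, and the image matrices A_a and A_b are obtained by zero-padding the pixel arrays to the same rectangle:

import numpy as np

def column_weights_for_a(env_w, b_left, w_mixed, a_right):
    # Per-column weight of image A: 1 in the non-overlapping part, ramping
    # from 1 to 0.5 up to the dividing line W_mixed, then from 0.5 to 0 up
    # to the right edge of A, and 0 in the columns A does not cover.
    w = np.zeros(env_w, dtype=np.float32)
    w[:b_left] = 1.0
    w[b_left:w_mixed] = np.linspace(1.0, 0.5, w_mixed - b_left, endpoint=False)
    w[w_mixed:a_right] = np.linspace(0.5, 0.0, a_right - w_mixed, endpoint=False)
    return w

def mask_matrix_for_a(env_h, env_w, a_height, b_left, w_mixed, a_right):
    # Broadcast the column weights over the rows covered by image A (assumed
    # to start at the top of the envelope) and fill 0 everywhere else.
    m = np.zeros((env_h, env_w), dtype=np.float32)
    m[:a_height, :] = column_weights_for_a(env_w, b_left, w_mixed, a_right)
    return m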
The mask matrices M_a, M_b and the image matrices of images A and B are all matrices of the same size, so element-wise multiplication can be applied. The one-side fused image E obtained after the gradual fusion of image A and image B is therefore given by:
A_e = M_a ⊙ A_a + M_b ⊙ A_b
where "⊙" denotes element-wise (Hadamard) multiplication, the image matrices of images A and B are A_a and A_b respectively, and the image matrix of image E is A_e.
Thus, the one-side fused image E is obtained by fusing and splicing the two images A and B; its size matches the common minimum envelope rectangle of images A and B, and its form is shown in fig. 2b.
The same method is used to obtain the mask matrices and image matrices of images C and D. The mask matrix of image C is as follows:
(matrix omitted; shown as image BDA0003095575300000081 in the original publication)
The mask matrix of image D is as follows:
(matrix omitted; shown as image BDA0003095575300000082 in the original publication)
Fusing and splicing images C and D gives the other-side fused image F, whose size matches the common minimum envelope rectangle of images C and D and whose form is shown in fig. 2c; its image matrix is A_f.
Step S40: crossing the horizontal overlapping area, randomly generate a horizontal dividing line that runs through the final image frame, with row coordinate h_mixed; then multiply the mask matrices of the one-side fused image E and the other-side fused image F element-wise by their corresponding image matrices and superpose the results to form the final image. A horizontal overlapping area is an overlapping area formed between upper and lower images.
In the horizontal overlapping area, the mask value decreases gradually from the first set value at the edge of the one-side fused image E to the second set value at the horizontal dividing line, and from the second set value at the horizontal dividing line to 0 at the edge of the other-side fused image F; in the non-overlapping areas the mask value is set to the first set value. This gives the own-image-area mask. The first set value is preferably 1 and the second set value is preferably 0.5.
Filling a mask value of 0 outside the own image area within the common minimum envelope rectangle of images E and F (the final image frame) gives the non-image-area mask; the own-image-area mask and the non-image-area mask together form the mask matrix of the image, yielding the mask matrix M_e of the one-side fused image E.
Likewise, the mask matrix M_f of the other-side fused image F is obtained.
After the mask matrices M_e and M_f are obtained, they are multiplied element-wise by the corresponding image matrices A_e and A_f and the results are superposed to obtain the final image matrix A_mix = M_e ⊙ A_e + M_f ⊙ A_f.
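Continuing the non-limiting illustration (again an assumption, not part of the original disclosure), the mask matrix M_e of the one-side fused image E can be built in the same way along the rows, with the weights running from 1 down to 0.5 at the horizontal dividing line h_mixed and on to 0 at the lower edge of E; M_f is symmetric, and the final image is then A_mix = M_e ⊙ A_e + M_f ⊙ A_f:

def row_mask_for_e(env_h, env_w, f_top, h_mixed, e_bottom):
    # Row-wise analogue of the column mask: 1 above the horizontal overlap,
    # 1 -> 0.5 down to row h_mixed, 0.5 -> 0 down to the lower edge of E.
    w = np.zeros(env_h, dtype=np.float32)
    w[:f_top] = 1.0
    w[f_top:h_mixed] = np.linspace(1.0, 0.5, h_mixed - f_top, endpoint=False)
    w[h_mixed:e_bottom] = np.linspace(0.5, 0.0, e_bottom - h_mixed, endpoint=False)
    return np.repeat(w[:, None], env_w, axis=1)

# A_mix = M_e[..., None] * A_e + M_f[..., None] * A_f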
Step S50: update the coordinates of each image's target bounding boxes in the final image frame, and take the product of each target's classification vector and the mean of the mask values inside its bounding box as that target's classification vector in the final image.
Specifically, each of the images A, B, C, D has target bounding boxes for target detection, but each bounding box is expressed in the coordinates of its original image. After the images are stitched together, the actual coordinates change, so once the final image is formed the bounding boxes must be converted to coordinates in the final image.
The position of a target bounding box is usually expressed as (P_xi, P_yi, P_wi, P_hi), where P_xi is the X-axis coordinate of the center point of the bounding box in the original image, P_yi is its Y-axis coordinate in the original image, P_wi is the width of the bounding box and P_hi is its height. For example, taking the upper-left corner of the final image as the origin, if the position of a bounding box in original image B is (P_xi-B, P_yi-B, P_wi-B, P_hi-B), then after fusion and splicing its position in the final image is [(W - w_b + P_xi-B), P_yi-B, P_wi-B, P_hi-B]. New coordinates for every target bounding box can be obtained by the same kind of calculation; since this is only a simple coordinate conversion, it is not described in further detail here.
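By way of non-limiting illustration (the function and argument names are assumptions), this coordinate conversion for a bounding box coming from image B, whose right edge coincides with the right edge of the final frame, may be written as:

def bbox_b_to_final(px, py, pw, ph, W, w_b):
    # Image B is placed against the right edge of the final frame of width W,
    # so its left edge sits at column W - w_b; the center X coordinate is
    # shifted by that offset while width, height and Y stay unchanged.
    return (W - w_b + px, py, pw, ph)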
Further, before each image is arranged at its position in the final image frame, it undergoes at least one of the following preprocessing operations: flipping, scaling and tone transformation. Flipping means performing at least one of a horizontal flip and a vertical flip on the image; with the image regarded as upright, a horizontal flip mirrors the image about its vertical axis and a vertical flip mirrors it about its horizontal axis.
Specifically, the scaling refers to performing a scaling up or down operation on the image.
In addition, the tone transformation includes brightness, saturation, and other operations, which all may cause the image to change, and will not be described in detail herein.
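By way of non-limiting illustration (the probabilities and the brightness range below are arbitrary assumptions; resizing, which would normally use an image library, is omitted), such preprocessing may be sketched as:

import random
import numpy as np

def preprocess(img):
    # Random horizontal and/or vertical flip.
    if random.random() < 0.5:
        img = img[:, ::-1]          # horizontal flip
    if random.random() < 0.5:
        img = img[::-1, :]          # vertical flip
    # Simple tone change: scale brightness by a random factor.
    img = np.clip(img.astype(np.float32) * random.uniform(0.7, 1.3), 0, 255)
    return img.astype(np.uint8)

In practice the bounding-box annotations of each image would have to be transformed together with these operations.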
Further, the classification vector may be represented as a one-hot vector. For example, if the average of the mask matrix over the region inside a certain target bounding box in image A is 0.8 and the classification vector of that bounding box is the one-hot vector (0, 1, 0, 0), then the vector after fusion and splicing is (0, 0.8, 0, 0).
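By way of non-limiting illustration (NumPy arrays and the (x0, y0, x1, y1) box convention are assumptions), the classification-vector update may be written as:

def soften_one_hot(one_hot, mask, box):
    # Scale the one-hot vector by the mean mask value inside the target
    # bounding box, e.g. (0, 1, 0, 0) with a mean mask of 0.8 -> (0, 0.8, 0, 0).
    x0, y0, x1, y1 = box
    return one_hot * float(mask[y0:y1, x0:x1].mean())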
It should be noted that in steps S30 and S40 the images on the two sides of each vertical dividing line are fused first and the fused images on the two sides of the horizontal dividing line are fused afterwards; this order is only exemplary. The horizontal dividing line may instead be generated first, the two images on each of its sides fused, and then the vertical dividing line generated and the images on its two sides fused.
Second embodiment
In addition, figs. 2a to 2c show the case in which a single vertical dividing line can serve both vertical overlapping areas; the vertical overlapping areas may instead be arranged as shown in figs. 3a to 3b. Fig. 3a shows the arrangement of images A, B, C, D, and the hatching in fig. 3b shows the vertical overlapping areas and the horizontal overlapping area formed; it can be seen that there are two vertical overlapping areas and one horizontal overlapping area. Since one vertical dividing line cannot pass through both vertical overlapping areas, two vertical dividing lines can be set. The images on the two sides of each vertical dividing line are fused, a horizontal dividing line is then generated, and the fused images on its two sides are fused. The specific fusion manner is the same as in the first embodiment and is not described again here.
Third embodiment
Fig. 4a is another arrangement of images A, B, C, D; the vertical overlapping areas and horizontal overlapping areas are shown hatched in fig. 4b. There are two vertical overlapping areas and two horizontal overlapping areas, and no horizontal dividing line can pass through both horizontal overlapping areas at the same time; in this case the horizontal dividing line passes through only one horizontal overlapping area. The specific image fusion method is the same as in the first embodiment and is not described again here.
Fourth embodiment
Fig. 5a is another arrangement of images A, B, C, D; the vertical overlapping areas and horizontal overlapping areas are shown hatched in fig. 5b. There are two vertical overlapping areas and two horizontal overlapping areas, and the horizontal dividing line may pass through both horizontal overlapping areas at the same time or through only one of them. The specific image fusion method is the same as in the first embodiment and is not described again here.
Fifth embodiment
In addition, the above description assumes that the corners of images A, B, C, D are aligned with the corners of the final image frame, but the present application is not limited to this arrangement; the corners may also be left unaligned. For example, in fig. 6 the corner of image C is not aligned with the lower-left corner of the final image frame. As before, image A and image B are fused into the one-side fused image E and image C and image D into the other-side fused image F according to the method described above. The specific image fusion method is the same as in the first embodiment and is not described again here.
Fig. 7 is a schematic diagram of functional modules of an embodiment of a data enhancement apparatus for object detection according to the present invention.
The data enhancement apparatus 100 applied to object detection of the present invention may be installed in an electronic device. According to the implemented functions, the data enhancement device 100 applied to object detection may include an image arrangement module 101, a dividing line generation module 102, a first image fusion module 103, a second image fusion module 104, and an object update module 105, which are a series of computer program segments capable of being executed by a processor of an electronic device and performing fixed functions, and are stored in a memory of the electronic device.
In the present embodiment, the functions of the modules are as follows:
the image arrangement module 101 is configured to arrange four images at four corners of a final image frame in a partially overlapping manner, where each image has a target bounding box labeled with a target therein, and the target bounding boxes are labeled with a category to which the target belongs and a center coordinate of the target bounding box.
As shown in fig. 2a, the final image frame is rectangular, with width W and height h. The four corners refer to the upper-left, upper-right, lower-left and lower-right positions of the four preprocessed images in the final image frame; preferably, the corners of the images are aligned with the corresponding corners of the final image frame, so that the final image is formed in the final image frame by fusion and splicing. In fig. 2a, the first image is laid at the upper left, the second at the upper right, the third at the lower left and the fourth at the lower right. For ease of distinction, the four images are identified here as A, B, C, D, and the widths and heights of their image areas are respectively w_a, w_b, w_c, w_d and h_a, h_b, h_c, h_d. For clarity, in fig. 2a the images on the left are drawn with chain lines and the images on the right with double-chain-dashed lines. It can be seen that the right side of image A overlaps the left side of image B, the underside of image A overlaps both images C and D, the underside of image B does not overlap images C and D, and the upper edges of images C and D are flush.
Targets are marked in each image; each target has a classification to which it belongs and a target bounding box, and the center coordinates of each target bounding box are marked in the image. The difference between target detection and image classification is that image classification only needs to assign the image to a category, whereas target detection must not only classify the targets but also determine their locations, i.e. obtain the position information of the targets in the image, frame each target with a target bounding box and give the center coordinates of that bounding box. Data enhancement of the images requires re-expressing the position information of the target bounding box of each target in each image.
The dividing line generation module 102 is configured to randomly generate, for each vertical overlapping area, a corresponding vertical dividing line in the final image frame that passes through it, and to generate a horizontal dividing line across at least one horizontal overlapping area. A vertical overlapping area is an overlapping area formed between left and right images in the final image frame; a horizontal overlapping area is an overlapping area formed between images arranged one above the other in the final image frame.
In fig. 2a, image A and image B form one vertical overlapping region and image C and image D form another. Since a single vertical line can pass through both vertical overlapping regions, a common vertical dividing line may be set for them; the vertical dotted line in fig. 2a is such a vertical dividing line, with column coordinate W_mixed. Of course, a separate vertical dividing line can also be provided for each region, as will be explained further below.
The first image fusion module 103 is configured to multiply the mask matrix of each image on either side of each vertical dividing line element-wise by that image's image matrix and superpose the results to form the one-side fused image E and the other-side fused image F.
The mask matrix is set so that, within the overlapping area, the mask value decreases gradually from a first set value at the edge of one image to a second set value at the dividing line, and from the second set value at the dividing line to 0 at the edge of the other image, while in the non-overlapping area the mask value is set to the first set value. The first set value is preferably 1 and the second set value is preferably 0.5.
The non-image-area mask fills the value 0 everywhere outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line; the own-image-area mask and the non-image-area mask together form the mask matrix of the image.
The image matrix comprises own-image-area elements and non-image-area elements: the own-image-area elements are the element values of the image, and the non-image-area elements are 0 values filled outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line. The elements of the image are its pixels.
Taking the left-right fusion of the two images A and B as an example (image A on the left, image B on the right), the fusion effect to be achieved is:
in the non-overlapping areas of image A and image B, the corresponding areas are preserved, i.e. the mask value at the corresponding positions is set to 1;
in the overlapping area of image A and image B, on the vertical dividing line at column W_mixed, images A and B are each weighted 50%, i.e. the mask values of A and B are both 0.5.
For image A, the region from its left edge to the left edge of image B, together with the part of image A lying below the vertical overlapping area, belongs to the non-overlapping region, and the mask value of the own-image-area mask there is 1. Within the overlapping area, from the left edge of image B to column W_mixed the mask value decreases gradually from 1 to 0.5, and from column W_mixed to the right edge of image A it decreases gradually from 0.5 to 0.
The own-image-area mask of image A thus obtained is as follows:
(matrix omitted; shown as image BDA0003095575300000131 in the original publication)
By further filling a mask value of 0 in the region of the common minimum envelope rectangle of images A and B that lies outside the own image area, the mask matrix M_a, whose size matches the common minimum envelope rectangle of images A and B, is obtained.
For image B, the non-overlapping region runs from the right edge of image A to the right edge of image B, and the mask value there is 1. From the right edge of image A to column W_mixed the mask value decreases gradually from 1 to 0.5, and from column W_mixed to the left edge of image B it decreases gradually from 0.5 to 0.
The own-image-area mask of image B is thus obtained as follows:
(matrix omitted; shown as image BDA0003095575300000132 in the original publication)
Filling a mask value of 0 in the common minimum envelope rectangle of images A and B outside the own image area yields the mask matrix M_b of image B.
For image A, the own-image-area elements are the pixels covered by the image and take their corresponding pixel values, and all other areas of the common minimum envelope rectangle of images A and B are set to 0, giving the image matrix A_a. Likewise for image B: its own-image-area elements take the corresponding pixel values, and the rest of the common minimum envelope rectangle of images A and B is set to 0, giving its image matrix A_b.
The mask matrices M_a, M_b and the image matrices of images A and B are all matrices of the same size, so element-wise multiplication can be applied. The one-side fused image E obtained after the gradual fusion of image A and image B is therefore given by:
A_e = M_a ⊙ A_a + M_b ⊙ A_b
where "⊙" denotes element-wise (Hadamard) multiplication, the image matrices of images A and B are A_a and A_b respectively, and the image matrix of image E is A_e.
Thus, the one-side fused image E is obtained by fusing and splicing the two images A and B; its size matches the common minimum envelope rectangle of images A and B, and its form is shown in fig. 2b.
The same method is used to obtain the mask matrices and image matrices of images C and D. The mask matrix of image C is as follows:
(matrix omitted; shown as image BDA0003095575300000141 in the original publication)
The mask matrix of image D is as follows:
(matrix omitted; shown as image BDA0003095575300000142 in the original publication)
Fusing and splicing images C and D gives the other-side fused image F, whose size matches the common minimum envelope rectangle of images C and D and whose form is shown in fig. 2c; its image matrix is A_f.
The second image fusion module 104 is configured to multiply the mask matrices of the one-side fused image E and the other-side fused image F element-wise by their corresponding image matrices and superpose the results to form the final image. A horizontal overlapping area is an overlapping area formed between upper and lower images.
In the horizontal overlapping area, the mask value decreases gradually from the first set value at the edge of the one-side fused image E to the second set value at the horizontal dividing line, and from the second set value at the horizontal dividing line to 0 at the edge of the other-side fused image F; in the non-overlapping areas the mask value is set to the first set value. This gives the own-image-area mask. The first set value is preferably 1 and the second set value is preferably 0.5.
Filling a mask value of 0 outside the own image area within the common minimum envelope rectangle of images E and F (the final image frame) gives the non-image-area mask; the own-image-area mask and the non-image-area mask together form the mask matrix of the image, yielding the mask matrix M_e of the one-side fused image E.
Likewise, the mask matrix M_f of the other-side fused image F is obtained.
After the mask matrices M_e and M_f are obtained, they are multiplied element-wise by the corresponding image matrices A_e and A_f and the results are superposed to obtain the final image matrix A_mix = M_e ⊙ A_e + M_f ⊙ A_f.
The target updating module 105 is configured to update the coordinates of each image's target bounding boxes in the final image frame, and to take the product of each target's classification vector and the mean of the mask values inside its bounding box as that target's classification vector in the final image.
Specifically, each of the images A, B, C, D has target bounding boxes for target detection, but each bounding box is expressed in the coordinates of its original image. After the images are stitched together, the actual coordinates change, so once the final image is formed the bounding boxes must be converted to coordinates in the final image.
The position of a target bounding box is usually expressed as (P_xi, P_yi, P_wi, P_hi), where P_xi is the X-axis coordinate of the center point of the bounding box in the original image, P_yi is its Y-axis coordinate in the original image, P_wi is the width of the bounding box and P_hi is its height. For example, taking the upper-left corner of the final image as the origin, if the position of a bounding box in original image B is (P_xi-B, P_yi-B, P_wi-B, P_hi-B), then after fusion and splicing its position in the final image is [(W - w_b + P_xi-B), P_yi-B, P_wi-B, P_hi-B]. New coordinates for every target bounding box can be obtained by the same kind of calculation; since this is only a simple coordinate conversion, it is not described in further detail here.
Further, before each image is arranged at its position in the final image frame, it undergoes at least one of the following preprocessing operations: flipping, scaling and tone transformation. Flipping means performing at least one of a horizontal flip and a vertical flip on the image; with the image regarded as upright, a horizontal flip mirrors the image about its vertical axis and a vertical flip mirrors it about its horizontal axis.
Specifically, the scaling refers to performing a scaling up or down operation on the image.
In addition, the tone transformation includes brightness, saturation, and other operations, which all may cause the image to change, and will not be described in detail herein.
Further, the classification vector may be represented as a one-hot vector. For example, if the average of the mask matrix over the region inside a certain target bounding box in image A is 0.8 and the classification vector of that bounding box is the one-hot vector (0, 1, 0, 0), then the vector after fusion and splicing is (0, 0.8, 0, 0).
It should be noted that generating the vertical dividing lines first, fusing the images on their two sides, and only then generating the horizontal dividing line and fusing the fused images on its two sides (as in steps S30 and S40) is only exemplary: the horizontal dividing line may instead be generated first, the images on each of its sides fused, and then the vertical dividing line generated and the images on its two sides fused.
Fig. 8 is a schematic structural diagram of an embodiment of an electronic device for implementing the data enhancement method applied to target detection according to the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as a data enhancement program 12 applied to object detection.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of a data enhancement program applied to object detection, etc., but also to temporarily store data that has been output or is to be output.
The processor 10 may in some embodiments be composed of an integrated circuit, for example a single packaged integrated circuit, or of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors and combinations of various control chips. The processor 10 is the control unit of the electronic device: it connects the various components of the electronic device via various interfaces and lines, and executes the various functions of the electronic device 1 and processes its data by running or executing the programs or modules stored in the memory 11 (e.g. the data enhancement program applied to target detection) and calling the data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 8 only shows an electronic device with components, and it will be understood by those skilled in the art that the structure shown in fig. 8 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and optionally, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may also include a network interface; optionally, the network interface may include a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.), which are generally used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further include a user interface, which may be a display or an input unit (such as a keyboard); optionally, the user interface may also be a standard wired or wireless interface. Optionally, in some embodiments the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device or the like. The display, which may also be referred to as a display screen or display unit, is used to display the information processed in the electronic device 1 and to display a visualized user interface.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The memory 11 in the electronic device 1 stores a data enhancement program 12 for object detection, which is a combination of instructions that, when executed in the processor 10, enable:
arranging four images, partially overlapping one another, at the four corners of a final image frame, wherein each image contains target bounding boxes labeling targets, and each target bounding box is labeled with the classification of its target and the center coordinates of the target bounding box;
for each vertical overlapping area, randomly generating a corresponding vertical dividing line in the final image frame that passes through it, wherein a vertical overlapping area is the overlapping area formed between the two images on the same side of the final image frame in the vertical direction;
multiplying the mask matrix of each image on either side of each vertical dividing line element-wise by that image's image matrix and superposing the results to form a one-side fused image E and an other-side fused image F;
randomly generating, in the final image frame, a horizontal dividing line that crosses the final image frame and passes through at least one horizontal overlapping area, and multiplying the mask matrices of the one-side fused image E and the other-side fused image F element-wise by their corresponding image matrices and superposing the results to form a final image, wherein a horizontal overlapping area is the overlapping area formed between the two images on the same side of the final image frame in the left-right direction;
and updating the center coordinates of each target bounding box in the final image frame, and taking the product of each target's classification vector and the mean of the mask values inside its bounding box as that target's classification vector in the final image.
The specific operation flow is as shown in fig. 1, and may specifically refer to the description of the data enhancement method applied to target detection in fig. 1, which is not described herein again.
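By way of illustration only, the following Python/NumPy sketch walks through the fusion flow described above for four equally sized 256x256 images placed at the corners of a 416x416 frame with a 96-pixel overlap band on each axis. The frame size, offsets, ramp endpoints (1.0 and 0.5) and helper names are assumptions made for the example, not values taken from the embodiment; the bounding-box and class-vector updates are illustrated in later sketches.

```python
# Illustrative sketch only: layout constants and helper names are invented.
import numpy as np

FRAME, IMG, OFF = 416, 256, 160          # frame size, image size, corner offset (assumed)
OV = (OFF, IMG)                          # overlap span [160, 256) along either axis

def axis_profile(span, overlap, split, low_side, first=1.0, second=0.5):
    """1-D mask profile for one image: `first` over its non-overlapping part,
    `first` -> `second` towards the dividing line, `second` -> 0 beyond it,
    and 0 outside the image."""
    a, b = span
    o1, o2 = overlap
    p = np.zeros(FRAME, np.float32)
    if low_side:                                   # image sits on the low-coordinate side
        p[a:o1] = first
        p[o1:split] = np.linspace(first, second, split - o1, endpoint=False)
        p[split:o2] = np.linspace(second, 0.0, o2 - split, endpoint=False)
    else:                                          # image sits on the high-coordinate side
        p[o2:b] = first
        p[split:o2] = np.linspace(second, first, o2 - split, endpoint=False)
        p[o1:split] = np.linspace(0.0, second, split - o1, endpoint=False)
    return p

def place(img, top, left):
    """Zero-pad one image into the final frame (non-image elements are 0)."""
    canvas = np.zeros((FRAME, FRAME, 3), np.float32)
    canvas[top:top + IMG, left:left + IMG] = img
    return canvas

def rows(top):
    """Row indicator: 1 on the rows the image occupies, 0 elsewhere."""
    r = np.zeros(FRAME, np.float32)
    r[top:top + IMG] = 1.0
    return r

rng = np.random.default_rng(0)
imgs = {k: rng.random((IMG, IMG, 3), dtype=np.float32) for k in ("tl", "tr", "bl", "br")}
sx_top, sx_bot, sy = rng.integers(OV[0], OV[1], size=3)    # random dividing lines

def fuse_pair(img_left, img_right, top, split):
    """Fuse a left/right pair across its vertical dividing line: Ae = Ma (.) Aa + Mb (.) Ab."""
    m_l = axis_profile((0, IMG), OV, split, low_side=True)[None, :] * rows(top)[:, None]
    m_r = axis_profile((OFF, FRAME), OV, split, low_side=False)[None, :] * rows(top)[:, None]
    a_l, a_r = place(img_left, top, 0), place(img_right, top, OFF)
    return m_l[..., None] * a_l + m_r[..., None] * a_r

E = fuse_pair(imgs["tl"], imgs["tr"], 0, sx_top)     # one-side fused image E
F = fuse_pair(imgs["bl"], imgs["br"], OFF, sx_bot)   # other-side fused image F

# Fuse E and F across the transverse dividing line: A_mix = Me (.) Ae + Mf (.) Af.
m_e = axis_profile((0, IMG), OV, sy, low_side=True)[:, None, None]
m_f = axis_profile((OFF, FRAME), OV, sy, low_side=False)[:, None, None]
final = m_e * E + m_f * F                            # final image, shape (416, 416, 3)
```

In this sketch the two ramps on either side of a dividing line sum to 1 across the seam, so the fusion behaves as a cross-fade rather than a hard cut, which is consistent with the stated goal of reducing the loss of target bounding boxes.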
Further, if the integrated modules of the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. The computer-readable medium may include: a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), or any other entity or device capable of carrying the computer program code.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A data enhancement method applied to target detection, comprising:
arranging four images at the four corners of a final image frame in a partially overlapping manner, wherein each image is provided with a target bounding box marking a target, and each target bounding box is annotated with the classification to which the target belongs and the center coordinates of the target bounding box;
randomly generating, in the final image frame, a corresponding vertical dividing line penetrating through each vertical overlapping area, wherein a vertical overlapping area is an overlapping area formed between two images on the same side in the vertical direction in the final image frame;
multiplying the mask matrix of each image on the two sides of each vertical dividing line with the image matrix of that image, and superposing the products to form a fused image E on one side and a fused image F on the other side;
randomly generating, in the final image frame, a transverse dividing line that penetrates through the final image frame and passes through at least one transverse overlapping area, and multiplying the mask matrices of the one-side fused image E and the other-side fused image F with their corresponding image matrices and superposing the products to form a final image, wherein a transverse overlapping area is an overlapping area formed between two images on the same side in the left-right direction in the final image frame;
and updating the center coordinates of each target bounding box in the final image frame, and taking the product of the classification vector of each target and the average value of the mask values within its target bounding box as the classification vector of that target in the final image.
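As a hypothetical illustration of the center-coordinate update in the last step, a box annotated in a source image can simply be shifted by that image's placement offset inside the final frame; any scaling applied during preprocessing would also have to be folded into the coordinates, which is omitted here.

```python
# Hypothetical centre update: shift the annotated centre by the image's placement offset.
def update_center(cx, cy, offset_x, offset_y):
    return cx + offset_x, cy + offset_y

# e.g. a target centred at (120, 80) in an image placed at offset (160, 160):
new_cx, new_cy = update_center(120, 80, 160, 160)   # -> (280, 240)
```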
2. The data enhancement method applied to object detection of claim 1,
the mask matrix includes a present-image-area mask and a non-present-image-area mask; in the present-image-area mask, the mask value within the overlapping area gradually decreases from a first set value to a second set value from the edge of one image to the dividing line, and from the second set value to 0 from the dividing line to the edge of the other image, while the mask value in the non-overlapping area is set to the first set value;
the non-present-image-area mask is filled with the value 0 outside the present image area within the common minimum enveloping rectangle of the images on the two sides of the dividing line, and the present-image-area mask and the non-present-image-area mask together form the mask matrix of the image; and
the image matrix comprises image-area elements and non-image-area elements, the image-area elements being the element values of the image, and the non-image-area elements being values of 0 filled outside the image area within the common minimum enveloping rectangle of the images on the two sides of the dividing line.
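As a purely numerical illustration of this mask profile (using the claim-7 values of 1 for the first set value and 0.5 for the second set value), the following sketch builds the 1-D profiles of two overlapping "images" along one axis; the lengths, overlap indices and dividing-line position are invented only to keep the numbers readable.

```python
import numpy as np

length, first, second = 9, 1.0, 0.5
o1, o2, split = 3, 6, 4                       # overlap [3, 6), dividing line at index 4

mask_a = np.zeros(length)                     # image A occupies indices [0, 6)
mask_a[0:o1] = first                          # non-overlapping part of A
mask_a[o1:split] = np.linspace(first, second, split - o1, endpoint=False)
mask_a[split:o2] = np.linspace(second, 0.0, o2 - split, endpoint=False)

mask_b = np.zeros(length)                     # image B occupies indices [3, 9)
mask_b[o2:length] = first                     # non-overlapping part of B
mask_b[split:o2] = np.linspace(second, first, o2 - split, endpoint=False)
mask_b[o1:split] = np.linspace(0.0, second, split - o1, endpoint=False)

print(mask_a)            # values: 1, 1, 1, 1, 0.5, 0.25, 0, 0, 0
print(mask_b)            # values: 0, 0, 0, 0, 0.5, 0.75, 1, 1, 1
print(mask_a + mask_b)   # the two profiles sum to 1 everywhere, so the seam is a cross-fade
```

Because the two profiles sum to 1 at every position of the overlap, the fused pixel values stay within the original intensity range.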
3. The data enhancement method applied to object detection of claim 1,
before each image is arranged at the four corners of the final image frame, preprocessing of at least one of flipping, scaling and tone conversion is performed on each image.
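A minimal sketch of such preprocessing, assuming OpenCV is available and a uint8 BGR input; the probabilities and ranges are arbitrary illustrative choices, and the target bounding boxes would have to be flipped and scaled together with the image, which is omitted here.

```python
import random
import cv2
import numpy as np

def preprocess(img: np.ndarray) -> np.ndarray:
    """Randomly flip, scale, and jitter the tone of a uint8 BGR image (boxes not handled here)."""
    if random.random() < 0.5:                       # random horizontal flip
        img = cv2.flip(img, 1)
    scale = random.uniform(0.8, 1.2)                # random scaling
    img = cv2.resize(img, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    if random.random() < 0.5:                       # simple tone (hue/brightness) jitter
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.int16)
        hsv[..., 0] = (hsv[..., 0] + random.randint(-10, 10)) % 180
        hsv[..., 2] = np.clip(hsv[..., 2] + random.randint(-30, 30), 0, 255)
        img = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
    return img
```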
4. The data enhancement method applied to object detection according to claim 2,
at least one corner of each of the four images is aligned with a corresponding corner of the final image frame.
5. The data enhancement method applied to object detection according to claim 2,
the classification vector is represented by a one-hot vector.
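A toy example combining the one-hot representation of this claim with the class-vector scaling of claim 1; the class count, mask values and box coordinates are invented for the illustration.

```python
import numpy as np

num_classes = 4
one_hot = np.eye(num_classes)[2]          # target belongs to class index 2 -> [0, 0, 1, 0]

mask = np.full((100, 100), 1.0)           # effective mask of the source image in the frame
mask[:, 60:] = 0.5                        # pretend the box straddles a blended seam
x1, y1, x2, y2 = 40, 20, 80, 60           # target bounding box in frame coordinates
mean_mask = mask[y1:y2, x1:x2].mean()     # average mask value inside the box (= 0.75 here)

class_vector = one_hot * mean_mask        # -> [0, 0, 0.75, 0]
```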
6. The data enhancement method applied to object detection of claim 1,
the formula for obtaining the image matrix A_e of the one-side fused image E is as follows:
A_e = M_a ⊙ A_a + M_b ⊙ A_b
where M_a and A_a are respectively the mask matrix and the image matrix of one image in the one-side fused image E, and M_b and A_b are respectively the mask matrix and the image matrix of the other image in the one-side fused image E;
the formula for obtaining the image matrix A_f of the other-side fused image F is as follows:
A_f = M_c ⊙ A_c + M_d ⊙ A_d
where M_c and A_c are respectively the mask matrix and the image matrix of one image in the other-side fused image F, and M_d and A_d are respectively the mask matrix and the image matrix of the other image in the other-side fused image F;
and the formula for obtaining the image matrix A_mix of the final image is as follows:
A_mix = M_e ⊙ A_e + M_f ⊙ A_f
where M_e is the mask matrix of the one-side fused image E and M_f is the mask matrix of the other-side fused image F.
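Reading ⊙ as the element-wise (Hadamard) product, the three formulas reduce to plain array arithmetic; the following NumPy check uses arbitrary 2x2 toy matrices chosen only to show the computation.

```python
import numpy as np

Ma = np.array([[1.0, 0.5], [1.0, 0.5]]); Aa = np.array([[10.0, 20.0], [30.0, 40.0]])
Mb = np.array([[0.0, 0.5], [0.0, 0.5]]); Ab = np.array([[50.0, 60.0], [70.0, 80.0]])
Ae = Ma * Aa + Mb * Ab                   # A_e = M_a ⊙ A_a + M_b ⊙ A_b -> [[10, 40], [30, 60]]

Mc = np.array([[1.0, 0.5], [1.0, 0.5]]); Ac = np.array([[1.0, 2.0], [3.0, 4.0]])
Md = np.array([[0.0, 0.5], [0.0, 0.5]]); Ad = np.array([[5.0, 6.0], [7.0, 8.0]])
Af = Mc * Ac + Md * Ad                   # A_f = M_c ⊙ A_c + M_d ⊙ A_d -> [[1, 4], [3, 6]]

Me = np.full((2, 2), 0.5); Mf = np.full((2, 2), 0.5)
Amix = Me * Ae + Mf * Af                 # A_mix = M_e ⊙ A_e + M_f ⊙ A_f -> [[5.5, 22], [16.5, 33]]
```

In NumPy, `*` on arrays of equal shape is exactly this element-wise product.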
7. The data enhancement method applied to object detection according to claim 2,
the first set value is 1, and the second set value is 0.5.
8. A data enhancement apparatus for object detection, comprising:
the image arrangement module is used for arranging four images at the four corners of a final image frame in a partially overlapping manner, wherein each image is provided with a target bounding box marking a target, and each target bounding box is annotated with the classification to which the target belongs and the center coordinates of the target bounding box;
the dividing line generating module is used for randomly generating, in the final image frame, a corresponding vertical dividing line penetrating through each vertical overlapping area, wherein a vertical overlapping area is an overlapping area formed between two images on the same side in the vertical direction in the final image frame, and is further used for randomly generating, in the final image frame, a transverse dividing line that penetrates through the final image frame and passes through at least one transverse overlapping area, wherein a transverse overlapping area is an overlapping area formed between two images on the same side in the left-right direction in the final image frame;
the first image fusion module is used for multiplying the mask matrix of each image on the two sides of each vertical dividing line with the image matrix of that image, and superposing the products to form a fused image E on one side and a fused image F on the other side;
the second image fusion module is used for multiplying the mask matrices of the one-side fused image E and the other-side fused image F with their corresponding image matrices and superposing the products to form a final image;
and the target updating module is used for updating the center coordinates of each target bounding box in the final image frame, and taking the product of the classification vector of each target and the average value of the mask values within its target bounding box as the classification vector of that target in the final image.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data enhancement method for object detection as claimed in any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, implements a data enhancement method applied to object detection as claimed in any one of claims 1 to 7.
CN202110610137.9A 2021-06-01 2021-06-01 Data enhancement method, device, equipment and storage medium applied to target detection Active CN113222874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110610137.9A CN113222874B (en) 2021-06-01 2021-06-01 Data enhancement method, device, equipment and storage medium applied to target detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110610137.9A CN113222874B (en) 2021-06-01 2021-06-01 Data enhancement method, device, equipment and storage medium applied to target detection

Publications (2)

Publication Number Publication Date
CN113222874A true CN113222874A (en) 2021-08-06
CN113222874B CN113222874B (en) 2024-02-02

Family

ID=77082418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110610137.9A Active CN113222874B (en) 2021-06-01 2021-06-01 Data enhancement method, device, equipment and storage medium applied to target detection

Country Status (1)

Country Link
CN (1) CN113222874B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464230A (en) * 2017-08-23 2017-12-12 京东方科技集团股份有限公司 Image processing method and device
CN109359684A (en) * 2018-10-17 2019-02-19 苏州大学 Fine granularity model recognizing method based on Weakly supervised positioning and subclass similarity measurement
KR102225024B1 (en) * 2019-10-24 2021-03-08 연세대학교 산학협력단 Apparatus and method for image inpainting
CN111563516A (en) * 2020-07-16 2020-08-21 浙江大华技术股份有限公司 Method, terminal and storage medium for fusion display of pedestrian mask and three-dimensional scene
CN111709951A (en) * 2020-08-20 2020-09-25 成都数之联科技有限公司 Target detection network training method and system, network, device and medium
CN112348828A (en) * 2020-10-27 2021-02-09 浙江大华技术股份有限公司 Example segmentation method and device based on neural network and storage medium
CN112330574A (en) * 2020-11-30 2021-02-05 深圳市慧鲤科技有限公司 Portrait restoration method and device, electronic equipment and computer storage medium
CN112766221A (en) * 2021-02-01 2021-05-07 福州大学 Ship direction and position multitask-based SAR image ship target detection method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744161A (en) * 2021-09-16 2021-12-03 北京顺势兄弟科技有限公司 Enhanced data acquisition method and device, data enhancement method and electronic equipment
CN113744161B (en) * 2021-09-16 2024-03-29 北京顺势兄弟科技有限公司 Enhanced data acquisition method and device, data enhancement method and electronic equipment
CN115187950A (en) * 2022-09-13 2022-10-14 安徽中科星驰自动驾驶技术有限责任公司 Novel balance mask secondary sampling method for deep learning training data enhancement
CN115187950B (en) * 2022-09-13 2022-11-22 安徽中科星驰自动驾驶技术有限责任公司 Novel balance mask secondary sampling method for deep learning image data enhancement

Also Published As

Publication number Publication date
CN113222874B (en) 2024-02-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant