Disclosure of Invention
To solve, or at least partially solve, the above technical problem, the present invention provides a sample generation method, a sample generation device, an electronic device, and a computer-readable storage medium.
In a first aspect, the present invention provides a sample generation method, including:
acquiring an initial sample image containing a pig body and a plurality of background images;
extracting a pig body mask image from the initial sample image;
performing foreground enhancement on the pig body mask image to obtain a plurality of enhanced pig body mask images;
and respectively fusing each enhanced pig body mask image with the plurality of background images to obtain a plurality of training sample images and labeling information.
Optionally, the fusing each enhanced pig body mask image with the plurality of background images respectively to obtain a plurality of training sample images and labeling information includes:
randomly generating, when each enhanced pig body mask image is fused with any background image, an adding position of the enhanced pig body mask image in the background image;
covering the enhanced pig body mask image on the background image according to the adding position;
carrying out image fusion on the enhanced pig body mask image and the background image to obtain a training sample image;
and generating the labeling information according to the adding position.
Optionally, the covering the enhanced pig body mask image on the background image according to the adding position includes:
acquiring the image size of the enhanced pig body mask image;
calculating an adding area according to the adding position and the image size of the enhanced pig body mask image;
adding the enhanced pig body mask image into the adding region.
Optionally, the performing image fusion on the enhanced pig body mask image and the background image to obtain a training sample image includes:
calculating a first gradient field of the enhanced pig body mask image according to the pixel value of each pixel point of the enhanced pig body mask image;
calculating a second gradient field of the regional image except the adding region in the background image according to the pixel points except the adding region in the background image;
calculating a gradient field of an image to be reconstructed according to the first gradient field and the second gradient field;
calculating divergence according to the gradient field of the image to be reconstructed;
and calculating the RGB value of each pixel point of the training sample image according to the divergence, a preset coefficient matrix and a Poisson equation.
Optionally, generating the labeling information according to the adding position includes:
acquiring the image size of the enhanced pig body mask image;
determining an annotation range according to the adding position and the image size;
and generating the labeling information according to the annotation range.
Optionally, the foreground enhancement mode includes: one or more of translation, rotation, scaling, horizontal flipping, vertical flipping, blur processing, and adding salt and pepper noise.
In a second aspect, the present invention provides a sample generation device comprising:
the acquisition module is used for acquiring an initial sample image containing a pig body and a plurality of background images;
the extraction module is used for extracting a pig body mask image from the initial sample image;
the foreground enhancement module is used for carrying out foreground enhancement on the pig body mask image to obtain a plurality of enhanced pig body mask images;
and the fusion processing module is used for respectively fusing each enhanced pig body mask image with a plurality of background images to obtain a plurality of training sample images and labeling information.
Optionally, the fusion processing module includes:
the position generating unit is used for randomly generating, when each enhanced pig body mask image is fused with any background image, an adding position of the enhanced pig body mask image in the background image;
the covering unit is used for covering the enhanced pig body mask image on the background image according to the adding position;
the fusion unit is used for carrying out image fusion on the enhanced pig body mask image and the background image to obtain a training sample image;
and the generating unit is used for generating the labeling information according to the adding position.
Optionally, the covering unit includes:
the first acquisition subunit is used for acquiring the image size of the enhanced pig body mask image;
the first calculating subunit is used for calculating an adding area according to the adding position and the image size of the enhanced pig body mask image;
and the adding subunit is used for adding the enhanced pig body mask image into the adding area.
Optionally, the fusion unit includes:
the second calculation subunit is used for calculating a first gradient field of the enhanced pig body mask image according to the pixel value of each pixel point of the enhanced pig body mask image;
the third calculation subunit is used for calculating a second gradient field of the regional image except the adding region in the background image according to each pixel point except the adding region in the background image;
the fourth calculation subunit is used for calculating a gradient field of the image to be reconstructed according to the first gradient field and the second gradient field;
the fifth calculation subunit is used for calculating the divergence according to the gradient field of the image to be reconstructed;
and the sixth calculation subunit is used for calculating the RGB value of each pixel point of the training sample image according to the divergence, the preset coefficient matrix and the Poisson equation.
Optionally, the generating unit includes:
the second acquiring subunit is used for acquiring the image size of the enhanced pig body mask image;
the determining subunit is used for determining an annotation range according to the adding position and the image size;
and the generating subunit is used for generating the labeling information according to the annotation range.
Optionally, the foreground enhancement mode includes: one or more of translation, rotation, scaling, horizontal flipping, vertical flipping, blur processing, and adding salt and pepper noise.
In a third aspect, the present invention provides an electronic device, including a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus;
a memory for storing a computer program;
a processor for implementing the method of generating a sample according to any one of the first aspect when executing a program stored in the memory.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a program of a sample generation method which, when executed by a processor, implements the steps of the sample generation method of any one of the first aspect.
Compared with the prior art, the technical scheme provided by the embodiment of the invention has the following advantages:
The method obtains an initial sample image containing a pig body and a plurality of background images; extracts a pig body mask image from the initial sample image; performs foreground enhancement on the pig body mask image to obtain a plurality of enhanced pig body mask images; and finally fuses each enhanced pig body mask image with the plurality of background images respectively to obtain a plurality of training sample images and labeling information.
In the embodiment of the invention, because foreground enhancement is performed only on the pig body mask image rather than as data enhancement on the whole image, the loss of foreground information can be reduced. The enhanced pig body mask images are fused with a large number of different background images to obtain a plurality of training sample images, so a model trained on these background-rich training sample images is not easily affected by background changes in subsequent use. This avoids the over-fitting problem that arises when a model is trained on samples with a single background, in which case images to be classified that are later acquired in other scenes, with backgrounds differing from the training samples, cannot be identified. Furthermore, while a large number of training sample images are obtained, the labeling information corresponding to each training sample image is obtained directly, so the samples can be input into the model for training without manual labeling, saving labor cost.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The inventors have found that existing data enhancement is mainly based on basic transformations of the whole image, so foreground information may be lost; and because the scene is fixed, or the background complexity changes only within narrow limits, over-fitting to the scene easily occurs. To this end, an embodiment of the present invention provides a sample generation method, an apparatus, an electronic device, and a computer-readable storage medium. As shown in fig. 1, the sample generation method may include the following steps:
step S101, obtaining an initial sample image containing a pig body and a plurality of background images;
In the embodiment of the present invention, the initial sample image may contain the whole or part of a pig body. The background images are images acquired against different backgrounds, used as backgrounds in subsequent image fusion, for example: images collected at any angle in a pigsty, images collected at any angle indoors and outdoors, and the like.
Step S102, extracting a pig body mask image from the initial sample image;
In this step, a pig body mask image may be extracted from the initial sample image by a semantic segmentation technique.
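As a minimal illustration of the extraction step, the sketch below applies a binary segmentation mask to a grayscale image to cut out the foreground. The function name and the tiny list-of-lists image format are hypothetical, chosen only for this example; a real implementation would operate on the output of a segmentation model.

```python
def apply_mask(image, mask):
    """Cut out the foreground: keep pixels where the segmentation mask
    is 1 (pig body) and zero out everything else."""
    return [[px if m else 0 for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]

# tiny 3x3 grayscale example with a mask marking only the centre pixel
image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
mask = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]
print(apply_mask(image, mask))  # only the centre value 50 survives
```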
Step S103, performing foreground enhancement on the pig body mask image to obtain a plurality of enhanced pig body mask images;
in the embodiment of the present invention, the foreground enhancement mode includes: one or more of translation, rotation, scaling, horizontal flipping, vertical flipping, blur processing, and adding salt and pepper noise.
Through multiple foreground enhancement, a plurality of enhanced pig body mask images can be obtained, and preparation is made for generating a large number of training sample images.
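A few of the listed foreground enhancement modes can be sketched in pure Python on a small grayscale grid. The function names here are illustrative, not part of the embodiment, and a practical pipeline would use an image library rather than nested lists:

```python
import random

def hflip(img):
    # horizontal flip: reverse each row
    return [row[::-1] for row in img]

def vflip(img):
    # vertical flip: reverse the row order
    return img[::-1]

def salt_pepper(img, prob, rng):
    # with probability `prob`, replace a pixel by 0 (pepper) or 255 (salt)
    out = []
    for row in img:
        new_row = []
        for px in row:
            if rng.random() < prob:
                new_row.append(rng.choice((0, 255)))
            else:
                new_row.append(px)
        out.append(new_row)
    return out

mask_img = [[1, 2], [3, 4]]
print(hflip(mask_img))   # [[2, 1], [4, 3]]
print(vflip(mask_img))   # [[3, 4], [1, 2]]
print(salt_pepper(mask_img, 0.5, random.Random(0)))
```

Applying several such transforms, alone or combined, to one mask image yields the plurality of enhanced mask images mentioned above.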
And S104, respectively fusing each enhanced pig body mask image with a plurality of background images to obtain a plurality of training sample images and labeling information.
As shown in fig. 2, step S104 may include the steps of:
step S201, when each enhanced pig body mask image is fused with any background image, randomly generating an adding position of the enhanced pig body mask image in the background image;
step S202, covering the enhanced pig body mask image on the background image according to the adding position;
In this step, the image size of the enhanced pig body mask image may be acquired, an adding region may be calculated according to the adding position and the image size, and the enhanced pig body mask image may be added to the adding region.
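The adding-region computation in this step can be sketched as follows. The clipping to the background bounds is an assumption added for this illustration (the embodiment does not specify how out-of-bounds positions are handled), and the coordinate convention is hypothetical:

```python
def adding_region(position, size, bg_size):
    """Derive the rectangular adding region from the randomly generated
    top-left adding position and the mask image size, clipped to the
    background so the pasted foreground never runs off the edge."""
    top, left = position
    h, w = size
    bg_h, bg_w = bg_size
    top = max(0, min(top, bg_h - h))
    left = max(0, min(left, bg_w - w))
    return (top, left, top + h, left + w)  # (y0, x0, y1, x1)

print(adding_region((30, 40), (100, 60), (480, 640)))  # (30, 40, 130, 100)
```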
Step S203, carrying out image fusion on the enhanced pig body mask image and the background image to obtain a training sample image;
as shown in fig. 3, step S203 may include the steps of:
step S301, calculating a first gradient field of the enhanced pig body mask image according to the pixel value of each pixel point of the enhanced pig body mask image;
step S302, calculating a second gradient field of the regional image except the adding region in the background image according to each pixel point except the adding region in the background image;
step S303, calculating a gradient field of an image to be reconstructed according to the first gradient field and the second gradient field;
step S304, calculating divergence according to the gradient field of the image to be reconstructed;
and S305, calculating the RGB value of each pixel point of the training sample image according to the divergence, the preset coefficient matrix and the Poisson equation.
In the embodiment of the present invention, the Poisson equation may be Ax = b, where b is the divergence, A is the preset coefficient matrix, and x is the RGB value of each pixel point of the training sample image.
In this step, the fused result can be quickly calculated by using the seamlessClone () method in OpenCV.
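The gradient-field and divergence computation of steps S301 to S305 can be sketched in miniature. The pure-Python Gauss-Seidel iteration below is a simplified single-channel illustration under stated assumptions (tiny list-of-lists images, a fixed iteration count), not the actual implementation: it pastes a foreground patch into a background by solving the Poisson equation, with the divergence of the foreground gradient field on the right-hand side and the background fixed on the region border.

```python
def poisson_fuse(fg, bg, top, left):
    """Blend fg into bg at (top, left): solve lap(x) = div(grad fg)
    on the interior of the adding region, with bg values on its border."""
    h, w = len(fg), len(fg[0])
    out = [row[:] for row in bg]

    def lap(img, i, j):
        # discrete Laplacian = divergence of the image's gradient field
        return (img[i - 1][j] + img[i + 1][j]
                + img[i][j - 1] + img[i][j + 1] - 4 * img[i][j])

    # Gauss-Seidel sweeps over the interior unknowns of A x = b
    for _ in range(500):
        for i in range(1, h - 1):
            for j in range(1, w - 1):
                b = lap(fg, i, j)  # divergence of the foreground gradients
                out[top + i][left + j] = (
                    out[top + i - 1][left + j] + out[top + i + 1][left + j]
                    + out[top + i][left + j - 1] + out[top + i][left + j + 1]
                    - b) / 4.0
    return out

bg = [[10.0] * 6 for _ in range(6)]  # flat background
fg = [[0.0] * 4 for _ in range(4)]   # flat foreground: zero gradient field
fused = poisson_fuse(fg, bg, 1, 1)
# a zero-gradient foreground takes on the background's appearance entirely
```

In practice, OpenCV's `cv2.seamlessClone(src, dst, mask, center, cv2.NORMAL_CLONE)` performs this solve efficiently for each RGB channel, as noted above.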
According to the embodiment of the invention, by using the Poisson fusion method from computer vision, the enhanced pig body mask image is naturally fused into different background images; the edges blend closely with the background, and the result is highly realistic.
Step S204, generating the labeling information according to the adding position;
In this step, the image size of the enhanced pig body mask image may be acquired, an annotation range may be determined according to the adding position and the image size, and the labeling information may be generated according to the annotation range.
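Because the adding position and the mask size are known by construction, the labeling information falls out for free. A possible sketch is below; the dictionary format, the class name "pig", and the (x_min, y_min, x_max, y_max) bounding-box convention are assumptions for this illustration, not a format prescribed by the embodiment:

```python
def make_label(position, size, class_name="pig"):
    """Build labeling information directly from the adding position and
    the mask image size, so no manual annotation is needed."""
    top, left = position
    h, w = size
    return {"class": class_name,
            "bbox": (left, top, left + w, top + h)}  # (x_min, y_min, x_max, y_max)

print(make_label((30, 40), (100, 60)))
```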
The method obtains an initial sample image containing a pig body and a plurality of background images; extracts a pig body mask image from the initial sample image; performs foreground enhancement on the pig body mask image to obtain a plurality of enhanced pig body mask images; and finally fuses each enhanced pig body mask image with the plurality of background images respectively to obtain a plurality of training sample images and labeling information.
In the embodiment of the invention, because foreground enhancement is performed only on the pig body mask image rather than as data enhancement on the whole image, the loss of foreground information can be reduced. The enhanced pig body mask images are fused with a large number of different background images to obtain a plurality of training sample images, so a model trained on these background-rich training sample images is not easily affected by background changes in subsequent use. This avoids the over-fitting problem that arises when a model is trained on samples with a single background, in which case images to be classified that are later acquired in other scenes, with backgrounds differing from the training samples, cannot be identified. Furthermore, while a large number of training sample images are obtained, the labeling information corresponding to each training sample image is obtained directly, so the samples can be input into the model for training without manual labeling, saving labor cost.
In still another embodiment of the present invention, there is also provided a sample generation apparatus, as shown in fig. 4, including:
the acquisition module 11 is used for acquiring an initial sample image containing a pig body and a plurality of background images;
the extraction module 12 is used for extracting a pig body mask image from the initial sample image;
the foreground enhancement module 13 is used for performing foreground enhancement on the pig body mask image to obtain a plurality of enhanced pig body mask images;
and the fusion processing module 14 is configured to perform fusion processing on each enhanced pig body mask image and the plurality of background images respectively to obtain a plurality of training sample images and labeling information.
The device obtains an initial sample image containing a pig body and a plurality of background images; extracts a pig body mask image from the initial sample image; performs foreground enhancement on the pig body mask image to obtain a plurality of enhanced pig body mask images; and finally fuses each enhanced pig body mask image with the plurality of background images respectively to obtain a plurality of training sample images and labeling information.
In the embodiment of the invention, because foreground enhancement is performed only on the pig body mask image rather than as data enhancement on the whole image, the loss of foreground information can be reduced. The enhanced pig body mask images are fused with a large number of different background images to obtain a plurality of training sample images, so a model trained on these background-rich training sample images is not easily affected by background changes in subsequent use. This avoids the over-fitting problem that arises when a model is trained on samples with a single background, in which case images to be classified that are later acquired in other scenes, with backgrounds differing from the training samples, cannot be identified. Furthermore, while a large number of training sample images are obtained, the labeling information corresponding to each training sample image is obtained directly, so the samples can be input into the model for training without manual labeling, saving labor cost.
In another embodiment of the present invention, the fusion processing module 14 includes:
the position generating unit is used for randomly generating, when each enhanced pig body mask image is fused with any background image, an adding position of the enhanced pig body mask image in the background image;
the covering unit is used for covering the enhanced pig body mask image on the background image according to the adding position;
the fusion unit is used for carrying out image fusion on the enhanced pig body mask image and the background image to obtain a training sample image;
the generating unit is used for generating the labeling information according to the adding position.
in yet another embodiment of the present invention, the covering unit includes:
the first acquisition subunit is used for acquiring the image size of the enhanced pig body mask image;
the first calculating subunit is used for calculating an adding area according to the adding position and the image size of the enhanced pig body mask image;
and the adding subunit is used for adding the enhanced pig body mask image into the adding area.
In another embodiment of the present invention, the fusion unit includes:
the second calculation subunit is used for calculating a first gradient field of the enhanced pig body mask image according to the pixel value of each pixel point of the enhanced pig body mask image;
the third calculation subunit is used for calculating a second gradient field of the regional image except the adding region in the background image according to each pixel point except the adding region in the background image;
the fourth calculation subunit is used for calculating a gradient field of the image to be reconstructed according to the first gradient field and the second gradient field;
the fifth calculation subunit is used for calculating the divergence according to the gradient field of the image to be reconstructed;
and the sixth calculation subunit is used for calculating the RGB value of each pixel point of the training sample image according to the divergence, the preset coefficient matrix and the Poisson equation.
In another embodiment of the present invention, the generating unit includes:
the second acquiring subunit is used for acquiring the image size of the enhanced pig body mask image;
the determining subunit is used for determining an annotation range according to the adding position and the image size;
and the generating subunit is used for generating the labeling information according to the annotation range.
In yet another embodiment of the present invention, the manner of foreground enhancement includes: one or more of translation, rotation, scaling, horizontal flipping, vertical flipping, blur processing, and adding salt and pepper noise.
In another embodiment of the present invention, an electronic device is further provided, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing the sample generation method of the embodiment of the method when executing the program stored in the memory.
According to the electronic device provided by the embodiment of the invention, by executing the program stored in the memory, the processor obtains an initial sample image containing a pig body and a plurality of background images; extracts a pig body mask image from the initial sample image; performs foreground enhancement on the pig body mask image to obtain a plurality of enhanced pig body mask images; and fuses each enhanced pig body mask image with the plurality of background images respectively to obtain a plurality of training sample images and labeling information. Because foreground enhancement is performed only on the pig body mask image rather than on the whole image, the loss of foreground information can be reduced. The enhanced pig body mask images are fused with a large number of different background images, so a model trained on the resulting background-rich training sample images is not easily affected by background changes in subsequent use; this avoids the over-fitting problem that arises when a model trained on samples with a single background later fails to identify images acquired in other scenes with different backgrounds. Furthermore, the labeling information corresponding to each training sample image is obtained directly, so the samples can be input into the model for training without manual labeling, saving labor cost.
The communication bus 1140 mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 1140 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The communication interface 1120 is used for communication between the electronic device and other devices.
The memory 1130 may include a Random Access Memory (RAM), and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor 1110 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a program of a sample generation method, which when executed by a processor, implements the steps of the sample generation method described in the aforementioned method embodiments.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.